Specializing in product development, growth strategies, and AI workflow automation, delivering tailored solutions to bring your vision to life - @grahamfleming
Founders, scaling doesn't have to mean chaos.
Athos integrates AI-driven tools to streamline your ops, predict growth hurdles, and automate the grunt work so you can focus on innovation.
Don't let the complexity overwhelm you. Let's turn potential into progress.
Athos isn’t just another dev shop. It’s an AI-native engineering studio built for founders and scaling teams.
From strategy to full-stack builds, we help you move faster and smarter.
Building a startup is a sprint. Scaling it is a marathon.
Athos helps you do both with AI-powered engineering, strategic advising, and hands-on MVP development.
Start smart. Scale fast.
The @karpathy interview
0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self driving took so long
1:57:08 – Future of education
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!
18 months ago, @karpathy set a challenge: "Can you take my 2h13m tokenizer video and translate [into] a book chapter".
We've done it! It includes prose, code & key images. It's a great way to learn this key piece of how LLMs work.
https://t.co/aSgsZz0VxO
nanochat d32, i.e. the depth 32 version that I specced for $1000, up from $100 has finished training after ~33 hours, and looks good. All the metrics go up quite a bit across pretraining, SFT and RL. CORE score of 0.31 is now well above GPT-2 at ~0.26. GSM8K went ~8% -> ~20%, etc. So that's encouraging.
The model is pretty fun to talk to, but judging from some early interactions I think people have a little bit too much expectation for these micro models. There is a reason that frontier LLM labs raise billions to train their models. nanochat models cost $100 - $1000 to train from scratch. The $100 nanochat is 1/1000th the size of GPT-3 in parameters, which came out 5 years ago. So I urge some perspective. Talking to micro models you have to imagine you're talking to a kindergarten child. They say cute things, wrong things, they are a bit confused, a bit naive, sometimes a little non-sensical, they hallucinate a ton (but it's amusing), etc.
Full detail/report on this run is here:
https://t.co/nWbfKOZLIg
And I pushed the new script run1000.sh to the nanochat repo if anyone would like to reproduce. Totally understand if you'd like to spend $1000 on something else :D
If you like, I am currently hosting the model so you can talk to it on a webchat as you'd talk to ChatGPT. I'm not going to post the URL here because I'm afraid it will get crushed. You'll have to look for it if you care enough. I'm also attaching a few funny conversations I had with the model earlier into the image, just to give a sense.
Next up, I am going to do one pass of tuning and optimizing the training throughput, then maybe return to scaling and maybe training the next tier of a bigger model.
🚀 Founders: Ready to scale smarter, not harder?
Athos Engineering blends AI strategy with hands-on development to help you build, grow, and lead with confidence.
From MVPs to team leadership, we’re your engineering co-pilot.
Document-to-Markdown converter for LLM pipelines – MarkItDown from @Microsoft
This Python tool converts dozens of file types to clean Markdown, keeping headings, lists, tables, links, and metadata.
Supports:
- PDF, Word, Excel, PowerPoint
- HTML, CSV, JSON, XML
- Images (OCR + EXIF), audio (transcription + metadata)
- ZIP files, YouTube URLs, EPubs, and more
As Markdown is LLMs' "native language," it's perfect for preprocessing documents before feeding them into models.
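To make the preprocessing idea concrete, here is a minimal stdlib-only sketch that turns CSV text into a Markdown table. It is an illustration of the concept, not MarkItDown's own code or API; the function name `csv_to_markdown` and the sample data are made up for this example.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table (first row is treated as the header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells):
        # Escape pipes so cell content cannot break the table layout.
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


# Hypothetical sample data, only for illustration.
print(csv_to_markdown("name,score\nalice,31\nbob,26\n"))
```

The resulting table can be pasted straight into a prompt. A real converter also has to handle ragged rows, encodings, and many other file types, which this sketch ignores.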
glad to see "deep agents" gaining more steam! great writeup by Philipp
code for deepagents on top of langgraph: https://t.co/lhz629tffW
LangChain Academy course on deep agents: https://t.co/dY2Dhe9hze
🤖 Advanced Stock Research Agent
A multi-agent system built with LangChain's DeepAgents that delivers professional stock analysis by combining real-time data, technical indicators, and financial examination.
Check it out here 🔍
https://t.co/d8B5TJIASx
Excited to release new repo: nanochat!
(it's among the most unhinged I've written).
Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.
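The post doesn't spell out how the "GRPO" step works, so the following is only a rough sketch of the general idea and not nanochat's code: sample several answers to the same question, score each one, and use "reward minus the group's mean reward" as the advantage. Published GRPO formulations also divide by the group's standard deviation, which is left out here for simplicity. The function name and the reward values are hypothetical.

```python
from statistics import mean


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled answer = its reward minus the group mean."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Four sampled answers to one GSM8K-style question; reward is 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # [0.5, -0.5, -0.5, 0.5]
```

Answers that beat the group average get a positive advantage and are reinforced; the rest are pushed down. If every answer in a group gets the same reward, all advantages are zero and that question contributes no learning signal.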
Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. Training for ~12 hours surpasses GPT-2 on the CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.
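The dollar and hour figures above imply a node price that the post never states directly. A quick back-of-the-envelope check, using only the numbers quoted above (the helper name `implied_hourly_rate` is made up for this sketch):

```python
# (cost in USD, hours on one 8XH100 node), as quoted in the post
runs = {"~$100 tier": (100, 4.0), "~$1000 tier": (1000, 41.6)}


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Node price per hour implied by a total cost and a training duration."""
    return cost_usd / hours


for name, (cost, hours) in runs.items():
    rate = implied_hourly_rate(cost, hours)
    # Divide by 8 because the node has eight GPUs.
    print(f"{name}: ~${rate:.2f} per node-hour, ~${rate / 8:.2f} per GPU-hour")
```

Both tiers work out to roughly $24 to $25 per node-hour, or about $3 per GPU-hour. At that implied rate, the ~12 hour run would cost somewhere around $290 to $300, though that figure is an inference and not something the post states.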
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.
Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
You don’t need a full dev team to launch your vision.
You need Athos.
✅ MVP development
✅ Architecture setup
✅ AI-enhanced workflows
Let’s build something brilliant together.
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own.
Let's go open-source AI!
How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) are:
- small (each line of code costs energy)
- modular (organized into groups of swappable operons)
- self-contained (easily "copy paste-able" via horizontal gene transfer)
If chunks of code are small, modular, self-contained and trivial to copy-and-paste, the community can thrive via horizontal gene transfer. For any function (gene) or class (operon) that you write: can you imagine someone going "yoink" without knowing the rest of your code or having to import anything new, to gain a benefit? Could your code be a trending GitHub gist?
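As one hypothetical example of a "yoink"-able gene: a small function with no imports, no globals, and no knowledge of the codebase around it. The name `chunked` and its behavior are invented for illustration.

```python
def chunked(items, size):
    """Split a sequence into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    # Slicing works on lists, tuples, and strings, so no imports are needed.
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunked([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

Someone can copy these few lines into any project and get the benefit right away, which is the property the bacterial style optimizes for.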
This coding style guide has allowed bacteria to colonize every ecological nook from cold to hot to acidic or alkaline in the depths of the Earth and the vacuum of space, along with an insane diversity of carbon anabolism, energy metabolism, etc. It excels at rapid prototyping but... it can't build complex life. By comparison, the eukaryotic genome is a significantly larger, more complex, organized and coupled monorepo. Significantly less inventive but necessary for complex life - for building entire organs and coordinating their activity. With our advantage of intelligent design, it should be possible to take advantage of both. Build a eukaryotic monorepo backbone if you have to, but maximize bacterial DNA.