Claude Code fully dissected!
Researchers from UCL reverse-engineered the leaked Claude Code source. What they found changes how you should think about agent design.
Only 1.6% of the codebase is AI decision logic.
The other 98.4% is operational infrastructure. Permission gates, tool routing, context compaction, recovery logic, session persistence. The model reasons. The harness does everything else.
This is the opposite of what most agent frameworks do today.
LangGraph routes model outputs through explicit state machines. Devin bolts heavy planners onto operational scaffolding. Claude Code gives the model maximum decision latitude inside a rich deterministic harness, and invests all its engineering effort in that harness.
The core loop is a simple while-true. Call model, run tools, repeat.
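To make the loop concrete, here is a minimal Python sketch of a while-true agent loop. It is not Claude Code's implementation. `call_model`, `ModelReply`, the tool dictionary and the `max_turns` guard are all hypothetical. The permission gates, compaction and recovery logic the paper describes are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ModelReply:
    """Hypothetical model response: text plus any tool calls it requests."""
    text: str
    tool_calls: list = field(default_factory=list)  # (tool_name, kwargs) pairs


def agent_loop(call_model, tools, user_prompt, max_turns=20):
    """Call the model, run the tools it asks for, repeat until it stops asking."""
    messages = [("user", user_prompt)]
    turns = 0
    while True:
        turns += 1
        if turns > max_turns:  # hypothetical safety guard, not from the paper
            raise RuntimeError("turn limit reached")
        reply = call_model(messages)
        messages.append(("assistant", reply.text))
        if not reply.tool_calls:  # no tool requested: the model is done
            return reply.text, messages
        for name, kwargs in reply.tool_calls:
            result = tools[name](**kwargs)  # the harness, not the model, executes tools
            messages.append(("tool", f"{name} -> {result}"))


# Usage with a scripted stand-in for the model.
script = iter([
    ModelReply("Let me read the file.", [("read_file", {"path": "README.md"})]),
    ModelReply("The README says hello."),
])
tools = {"read_file": lambda path: f"contents of {path}: hello"}
answer, transcript = agent_loop(lambda msgs: next(script), tools, "What does the README say?")
print(answer)  # The README says hello.
```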
But the systems around that loop are where the real design lives:
A permission system with 7 modes and an ML classifier. Users approve 93% of permission prompts anyway, so the architecture compensates with automated layers rather than adding more warnings.
A 5-layer context compaction pipeline. Each layer runs only when cheaper ones fail. Budget reduction, snip, microcompact, context collapse, auto-compact.
Four extension mechanisms ordered by context cost. Hooks (zero), skills (low), plugins (medium), MCP (high). Each answers a different integration problem.
Subagents return only summary text to the parent. Their full transcripts live in sidechain files. Agent teams still cost roughly 7x the tokens of a standard session.
Resume does not restore session-scoped permissions. Trust is re-established every session. That friction is the point.
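The "each layer runs only when cheaper ones fail" rule is a cascade. The sketch below shows that control flow under loud assumptions. Tokens are approximated by whitespace-separated words. The three toy layers (`trim_long_messages`, `drop_oldest`, `summarize_all`) are hypothetical stand-ins. They are not the paper's budget reduction, snip, microcompact, context collapse and auto-compact stages.

```python
def count_tokens(messages):
    # Assumption: whitespace-separated words stand in for real tokenizer tokens.
    return sum(len(m.split()) for m in messages)


def trim_long_messages(messages, max_words=50):
    """Cheapest toy layer: cut any single oversized message (e.g. a huge tool output)."""
    out = []
    for m in messages:
        words = m.split()
        out.append(m if len(words) <= max_words else " ".join(words[:max_words]) + " [trimmed]")
    return out


def drop_oldest(messages, keep_last=4):
    """Toy layer: keep the first message (the task) plus the most recent ones."""
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[-keep_last:]


def summarize_all(messages):
    """Most expensive toy layer; a real harness would call the model to summarize here."""
    return [messages[0], f"[summary of {len(messages) - 1} earlier messages]"]


LAYERS = [trim_long_messages, drop_oldest, summarize_all]  # ordered cheapest first


def compact(messages, budget, layers=LAYERS):
    """Run layers in order, stopping as soon as the context fits the budget."""
    applied = []
    for layer in layers:
        if count_tokens(messages) <= budget:
            break  # a cheaper layer already did enough
        messages = layer(messages)
        applied.append(layer.__name__)
    return messages, applied  # may still be over budget if every layer ran


history = ["fix the failing test", " ".join(["log"] * 200)]
compacted, applied = compact(history, budget=100)
print(applied, count_tokens(compacted))  # ['trim_long_messages'] 55
```

Here one cheap trim brings the history under budget, so the more expensive layers never run.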
The bet behind all of this is simple. As frontier models converge on raw coding ability, the quality of the harness becomes the differentiator, not the model.
Paper: Dive into Claude Code (arXiv:2604.14228)
We've shared an article on Agent Harness and what every big company is building.
Read it below.
Kimi 2.7 ranked 2nd, behind Fable 5 and ahead of GPT-5 xhigh
We re-ran our ErdosBench smoke test (14 problems) with Kimi 2.7, Qwen 3.7 Max, and Grok 4.3, and compared them with the top performers from previous runs.
Kimi 2.7 is amazingly good. More below.
How do you give a code LLM knowledge of an entire repository without paying for it at every single query?
We introduce Code2LoRA: a hypernetwork that turns a repository into its own LoRA adapter. Repo knowledge baked into weights → zero inference-time token overhead.
Code2LoRA seems like an incredibly interesting idea.
Qwen2.5-Coder-1.5B is not the most powerful LLM around, but it's enough to validate the concept.
Instead of stuffing repository context into the prompt at every query, distill it into a LoRA adapter. One forward pass over the repo snapshot, one adapter, zero extra inference tokens.
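A generic way to "turn a repository into a LoRA adapter" is to let a hypernetwork map a repository embedding to the two low-rank factors. This dependency-free sketch assumes a single linear hypernetwork and a precomputed repo embedding. Code2LoRA's actual encoder, hypernetwork architecture and scaling are not described in the post, and every name here is hypothetical. Folding the adapter into the base weight (W + B·A) is what keeps the inference-time token overhead at zero.

```python
def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def reshape(flat, rows, cols):
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


def generate_lora(repo_emb, H_A, H_B, d_in, d_out, rank):
    """Hypothetical linear hypernetwork: repo embedding -> A (rank x d_in), B (d_out x rank)."""
    A = reshape(matvec(H_A, repo_emb), rank, d_in)
    B = reshape(matvec(H_B, repo_emb), d_out, rank)
    return A, B


def merge_lora(W, A, B, scale=1.0):
    """Fold the adapter into the base weight: W' = W + scale * (B @ A)."""
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


# Toy example: 2x2 base weight, rank-1 adapter, 1-dimensional repo embedding.
W = [[1.0, 0.0], [0.0, 1.0]]
A, B = generate_lora([1.0], H_A=[[1.0], [0.0]], H_B=[[0.0], [2.0]], d_in=2, d_out=2, rank=1)
W_repo = merge_lora(W, A, B)
print(W_repo)                       # [[1.0, 0.0], [2.0, 1.0]]
print(matvec(W_repo, [3.0, 4.0]))   # [3.0, 10.0]
```

After merging, a query runs through `W_repo` at the same cost as through `W`. The repo knowledge lives in the weights, not in the prompt.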
For evolving codebases, a single-layer GRU tracks commit history on top of that snapshot. Each git diff updates the hidden state in <10ms.
You get a fresh adapter at every commit without the need for full retraining.
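The commit-tracking idea maps onto a standard GRU cell. Each diff, encoded as a vector, updates a hidden state that starts from the repo snapshot. This sketch makes several assumptions:

- Diffs are already embedded.
- Bias terms are dropped for brevity.
- All names are hypothetical.

The post does not say how diffs are encoded or how the hidden state becomes adapter weights. Feeding each state to a hypernetwork is only one plausible option.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def gru_step(h, x, p):
    """One GRU update (biases omitted): h is the repo state, x an embedded git diff."""
    z = sigmoid([a + b for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))])  # update gate
    r = sigmoid([a + b for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))])  # reset gate
    n = [math.tanh(a + ri * b)                                                      # candidate state
         for a, ri, b in zip(matvec(p["W_n"], x), r, matvec(p["U_n"], h))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def track_commits(h_snapshot, diff_embeddings, p):
    """Fold each commit's diff into the state; each result could seed a fresh adapter."""
    states, h = [], h_snapshot
    for x in diff_embeddings:
        h = gru_step(h, x, p)
        states.append(h)
    return states


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# With all weights at zero both gates sit at 0.5 and the candidate is 0, so each step halves h.
params = {k: zeros(2, 2) for k in ("W_z", "U_z", "W_r", "U_r", "W_n", "U_n")}
print(track_commits([1.0, -2.0], [[0.0, 0.0]] * 3, params)[-1])  # [0.125, -0.25]
```

Each update is a handful of small matrix-vector products, which is why a per-commit update can be so cheap compared with retraining.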
Great job Liliana! I bet this will lead to something cool in the near future 🙌
Train your own LLM from scratch!
A step-by-step repo that walks you through building and training a transformer model from scratch using PyTorch. From downloading training data all the way to generating text.
The architecture is built from the ground up following the original "Attention Is All You Need" paper: MLP, single-head attention, multi-head attention, transformer blocks, and the full transformer model, all coded and explained with detailed diagrams at each step.
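For reference, the core of single-head attention from the paper is softmax(Q Kᵀ / sqrt(d_k)) V. Below is a dependency-free Python sketch; the repo itself uses PyTorch. The causal mask is an assumption suited to a next-token language model, and the function names are ours.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=True):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:  # position i may only attend to positions 0..i
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


Q = K = [[0.0, 0.0], [0.0, 0.0]]  # equal scores, so weights are uniform over visible positions
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))                # [[1.0, 2.0], [2.0, 3.0]]
print(attention(Q, K, V, causal=False))  # [[2.0, 3.0], [2.0, 3.0]]
```

Multi-head attention runs several of these in parallel on learned projections of the input and concatenates the results.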
Training data comes from The Pile - a diverse 825GB open-source dataset covering books, articles, code, websites, and more. The repo includes scripts to download it, preprocess and tokenize it using tiktoken, store it in HDF5 format, and feed it into training batches.
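Once the corpus is tokenized, next-token training batches are just windows of the token stream, each paired with the same window shifted by one. This sketch assumes the token IDs have already been read from HDF5 into a Python list, and it samples windows at random. `get_batch` is a hypothetical name, and the repo's own loader may batch differently.

```python
import random


def get_batch(tokens, context_len, batch_size, rng):
    """Sample random windows; each target is its input shifted one token to the left."""
    starts = [rng.randrange(len(tokens) - context_len) for _ in range(batch_size)]
    inputs = [tokens[s:s + context_len] for s in starts]
    targets = [tokens[s + 1:s + context_len + 1] for s in starts]
    return inputs, targets


# Stand-in for token IDs read from the HDF5 file (real IDs would come from tiktoken).
tokens = list(range(1000))
xb, yb = get_batch(tokens, context_len=8, batch_size=4, rng=random.Random(0))
print(xb[0], yb[0])  # the target window is the input window shifted by one position
```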
You can train a 13M-parameter model on a single Colab T4 GPU. At 13M parameters, the model starts generating proper grammar and coherent short sentences. For billion-parameter training, you need at least an A100 or an RTX 4090. The repo includes a full GPU compatibility table so you know exactly what's possible on your hardware.
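To get a feel for what "13M parameters" means, here is a back-of-envelope parameter counter for a GPT-style decoder. The layout and the example config are assumptions; the post does not give the repo's 13M configuration. The assumed layout is:

- biases everywhere and a 4x-wide MLP;
- two LayerNorms per block;
- tied input/output embeddings;
- optional learned positions.

The 50,257-token vocabulary matches tiktoken's r50k_base.

```python
def count_params(vocab_size, d_model, n_layers, context_len,
                 learned_positions=True, tied_output=True):
    """Approximate parameter count for a GPT-style decoder (assumed layout)."""
    d = d_model
    embeddings = vocab_size * d                      # token embedding table
    if learned_positions:                            # sinusoidal encodings would add nothing here
        embeddings += context_len * d
    attn = 4 * (d * d + d)                           # Q, K, V and output projections, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)      # d -> 4d -> d feed-forward, with biases
    norms = 2 * (2 * d)                              # two LayerNorms (scale + shift) per block
    final_norm = 2 * d
    head = 0 if tied_output else vocab_size * d      # untied output projection (no bias)
    return embeddings + n_layers * (attn + mlp + norms) + final_norm + head


total = count_params(vocab_size=50257, d_model=256, n_layers=4, context_len=256)
print(f"{total:,}")  # 16,090,880
```

With this hypothetical config, roughly 80% of the 16.1M parameters sit in the embedding tables. That is typical of very small models with a 50k vocabulary.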
Includes a complete SFT and RLHF guide as a separate notebook for taking your trained model further.
Key capabilities:
• End-to-end pipeline: data download → preprocessing → training → text generation
• Full transformer implementation from scratch with PyTorch
• Trains models from 13M to 2B+ parameters on a single GPU
• Training data from The Pile (825GB, 22 diverse datasets)
• Tokenization via tiktoken (r50k_base)
• SFT and RLHF guide included
100% open source.
I've shared the link in the replies!
Here's a teaser of our Mac-1 model.
> 6.6B model
> runs locally (on any Mac)
> requires 7GB RAM (12GB ideal)
> can use 487 native macOS tools
> can perform chained multi-tool tasks
> reasoning: ON
> output: ~65 tok/s
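A quick sanity check on fitting 6.6B parameters in 7GB: weight memory is roughly parameters × bits per weight ÷ 8. The post does not say how Mac-1 is quantized, so the bit widths below are illustrative. The estimate also ignores the KV cache, activations and runtime overhead.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal GB (weights only, no KV cache or overhead)."""
    return n_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(6.6e9, bits):.1f} GB")
# 16-bit: 13.2 GB
#  8-bit: 6.6 GB
#  4-bit: 3.3 GB
```

At 16-bit the weights alone would not fit in 7GB. The stated requirement is therefore consistent with roughly 8-bit or lower precision, though that is an inference, not something the post states.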
We built a robust application layer around the model to make the UI/UX macOS-native. The "model-focused" SaaS era is here.
Stay tuned for more.
Shoutout to the open source projects behind this:
• Serve-sim powers the streaming simulator by @Baconbrix
https://t.co/Yx52DuSGcZ
• SnapshotPreviews extracts SwiftUI previews by @sentry
https://t.co/EaeTrNksfZ
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, released under the Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
INSTEAD OF WATCHING AN HOUR OF NETFLIX TONIGHT.
This 1-hour Stanford lecture by Joel Peterson will teach you more about negotiation and getting what you want than most people learn in years.
Bookmark it and give it an hour, no matter what.
Fine-tuning in 2026 has never been easier
You can make an open-source model dramatically better at your own tasks
And thanks to Unsloth Studio, creating custom datasets takes just a few minutes.
Here is the full course:
The Anthropic Claude for Finance lecture is the best free hour in quant AI right now.
Bookmark it and watch it today, then read the article below.