Solo dev reverse-engineered Google's billion-dollar algorithm in 7 days
Google published the paper that crashed memory stocks worldwide. Then shipped zero code.
Tom Turney read the math, opened his terminal, and built the whole thing with Claude - then made it faster than Google promised.
Days 1-3: Core algorithms, 141 tests, Python prototype
Days 3-5: C port into llama.cpp, Metal GPU kernels
Days 5-7: Speed optimization from 739 to 2747 tok/s
That's a 3.7x speedup through pure engineering:
> fp32 → fp16 WHT (Walsh-Hadamard transform)
> half4 vectorized butterfly ops
> graph-side rotation
> block-32 storage layout
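For anyone who hasn't seen the butterfly structure those bullets refer to, here is a minimal sketch, reading WHT as the Walsh-Hadamard transform. It is plain scalar Python, not the fp16 Metal kernel from the project: the function name `fwht` and the orthonormal scaling are assumptions made for illustration, and the block length of 32 in the usage line is only borrowed from the "block-32" bullet.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # One butterfly stage: combine every pair of entries that sit h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    # Scaling by 1/sqrt(n) makes the transform its own inverse.
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


block = [float(i) for i in range(32)]   # one hypothetical 32-element block
rotated = fwht(block)
restored = fwht(rotated)                # applying it twice undoes the rotation
```

Each stage only adds and subtracts pairs, which is why the inner loop is a natural target for vectorized ops such as the half4 version mentioned above.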
Then he added his own research on top:
> Sparse V: skip 90% of value decompressions at long context
> Asymmetric K/V: keep keys precise, compress values harder
> Temporal decay: old tokens get lower precision automatically
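A rough sketch of how those three ideas could fit together, under loud assumptions: plain uniform quantization stands in for the real quantizer, and every name, bit-width, window size, and threshold below is hypothetical rather than taken from the project.

```python
KEY_BITS = 8  # asymmetric K/V: keys keep more bits than values (hypothetical width)


def quantize(vec, bits):
    """Uniform symmetric quantization (a stand-in quantizer); bits must be >= 2."""
    levels = (1 << (bits - 1)) - 1
    peak = max(abs(x) for x in vec) or 1.0
    scale = peak / levels
    return [round(x / scale) for x in vec], scale


def value_bits(age, recent_window=4, recent_bits=4, old_bits=2):
    """Temporal decay: values older than the window get fewer bits (hypothetical schedule)."""
    return recent_bits if age < recent_window else old_bits


def sparse_weighted_sum(weights, packed_values, threshold=0.01):
    """Sparse V: only dequantize values whose attention weight clears the threshold."""
    dim = len(packed_values[0][0])
    out = [0.0] * dim
    skipped = 0
    for w, (codes, scale) in zip(weights, packed_values):
        if w < threshold:
            skipped += 1  # negligible weight, so this value is never decompressed
            continue
        for j, code in enumerate(codes):
            out[j] += w * code * scale
    return out, skipped


values = [[1.0, -1.0], [2.0, 0.5], [0.5, 0.25], [-1.0, 1.0]]
ages = [9, 8, 1, 0]                       # token age, newest last
packed = [quantize(v, value_bits(a)) for v, a in zip(values, ages)]
weights = [0.001, 0.002, 0.497, 0.5]      # softmax attention weights for one query
context, skipped = sparse_weighted_sum(weights, packed)
```

In this toy run the two oldest tokens carry almost no attention weight, so their values are skipped entirely. How many can be skipped in practice depends on the attention distribution at long context.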
Result: 35B model running on a MacBook with 4.6x compressed cache.
613 GitHub stars in a week. Google still hasn't released their own code.
Harness engineering is as important as model capability scaling.
AI Agents are 50% a harness story.
"Natural-Language Agent Harnesses" proposes moving harness logic out of code and into the native language of LLMs:
Natural Language.
This turns the agent harness into a portable artifact that you can experiment with, improve, and execute in a shared runtime.
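A toy sketch of what "harness as a portable artifact" could look like, not the paper's actual design: the `Harness` class, the `run` function, and the stub model are all hypothetical names invented here for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Harness:
    """A harness stored as plain text, so it can be versioned and swapped like any file."""
    name: str
    instructions: str


def run(harness: Harness, task: str, model: Callable[[str], str]) -> str:
    """Shared runtime: the same code path executes whichever harness it is handed."""
    prompt = f"{harness.instructions}\n\nTask: {task}"
    return model(prompt)


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call; it just returns the first line of the prompt.
    return prompt.splitlines()[0]


planner = Harness("planner", "Write a plan before taking any action.")
reviewer = Harness("reviewer", "Check every step against the task before answering.")
result_a = run(planner, "fix the failing test", echo_model)
result_b = run(reviewer, "fix the failing test", echo_model)
```

The point of the sketch is that swapping harnesses changes only the text handed to the runtime, not the runtime code itself.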
Agent harnesses will mature into a first-class citizen of the AI Agent ecosystem.
We will have people who are Agent Harness engineers and experts.