MIT just made every AI company's billion dollar bet look embarrassing.
They solved AI memory. Not by building a bigger brain. By teaching it how to read.
The paper dropped on December 31, 2025. Three MIT CSAIL researchers. One idea so obvious it hurts. And a result that makes five years of context window arms racing look like the wrong war entirely.
Here is the problem nobody solved.
Every AI model on the planet has a hard ceiling. A context window. The maximum amount of text it can hold in working memory at once. Cross that line and something ugly happens — something researchers have a clinical name for.
Context rot.
The more you pack into an AI's context, the worse it performs on everything already inside it. Facts blur. Information buried in the middle vanishes. The model does not become more capable as you feed it more. It becomes more confused. You give it your entire codebase and it forgets what it read three files ago. You hand it a 500-page legal document and it loses the clause from page 12 by the time it reaches page 400.
So the industry built a workaround. RAG. Retrieval Augmented Generation. Chop the document into chunks. Store them in a database. Retrieve the relevant ones when needed.
It was always a compromise dressed up as a solution.
The retriever guesses which chunks matter before the AI has read anything. If it guesses wrong — and it does, constantly — the AI never sees the information it needed. The act of chunking destroys every relationship between distant paragraphs. The full picture gets shredded into fragments that the AI then tries to reassemble blindfolded.
Two bad options. One broken industry. Three MIT researchers and a deadline of December 31st.
Here is what they built.
Stop putting the document in the AI's memory at all.
That is the entire idea. That is the breakthrough. Store the document as a Python variable outside the AI's context window entirely. Tell the AI the variable exists and how big it is. Then get out of the way.
When you ask a question, the AI does not try to remember anything. It behaves like a human expert dropped into a library with a computer. It writes code. It searches the document with regular expressions. It slices to the exact section it needs. It scans the structure. It navigates. It finds precisely what is relevant and pulls only that into its active window.
Then it does something that makes this recursive.
When the AI finds relevant material, it spawns smaller sub-AI instances to read and analyze those sections in parallel. Each one focused. Each one fast. Each one reporting back. The root AI synthesizes everything and produces an answer.
No summarization. No deletion. No information loss. No decay. Every byte of the original document remains intact, accessible, and queryable for as long as you need it.
Now here are the numbers.
Standard frontier models on the hardest long-context reasoning benchmarks: scores near zero. Complete collapse. GPT-5 on a benchmark requiring it to track complex code history beyond 75,000 tokens — could not solve even 10% of problems.
RLMs on the same benchmarks: solved them. Dramatically. Double-digit percentage gains over every alternative approach. Successfully handling inputs up to 10 million tokens — 100 times beyond a model's native context window.
Cost per query: comparable to or cheaper than standard massive context calls.
Read that again. One hundred times the context. Better answers. Same price.
The timeline of the arms race makes this sting harder. GPT-3 in 2020: 4,000 tokens. GPT-4: 32,000. Claude 3: 200,000. Gemini: 1 million. Gemini 2: 2 million. Every generation, every company, billions of dollars spent, all betting on the same assumption.
More context equals better performance.
MIT just proved that assumption was wrong the entire time.
Not slightly wrong. Fundamentally wrong. The entire premise of the last five years of context window research — that the solution to AI memory was a bigger window — was the wrong answer to the wrong question.
The right question was never how much can you force an AI to hold in its head.
It was whether you could teach an AI to know where to look.
A human expert handed a 10,000-page archive does not read all 10,000 pages before answering your question. They navigate. They search. They find the relevant section, read it deeply, and synthesize the answer.
RLMs are the first AI architecture that works the same way.
The code is open source. On GitHub right now. Free. No license fees. No API costs. Drop it in as a replacement for your existing LLM API calls and your application does not even notice the difference — except that it suddenly works on inputs it used to fail on entirely.
Prime Intellect — one of the leading AI research labs in the space — has already called RLMs a major research focus and described what comes next: teaching models to manage their own context through reinforcement learning, enabling agents to solve tasks spanning not hours, but weeks and months.
The context window wars are over.
MIT won them by walking away from the battlefield.
Source: Zhang, Kraska, Khattab · MIT CSAIL · arXiv:2512.24601
Paper: https://t.co/ngovOSNrCQ
GitHub: https://t.co/gT0ootCNoa
Fractal is interesting because it is not trying to be your main coding agent.
It is trying to become the agent your coding agent calls when context gets ugly.
Most agent workflows still look like:
stuff everything into context
ask model to reason
hope it does not drown
Fractal takes a different path.
It is built around Recursive Language Models: the model treats large context as something to inspect, search, slice, run code over, delegate into sub-tasks, and fold back up.
That matters for tasks where normal agents start failing:
read the whole repo
audit 300 files
trace a dependency chain
compare long docs
investigate messy logs
understand deep project context
The clever part is not “bigger context window.”
The clever part is turning long context into a computation problem.
Claude Code, Cursor or Codex can handle the normal coding loop.
But when the task becomes too broad or too nested, you want a specialist runtime that can recursively inspect the world instead of pretending it remembers everything.
Caveat: still early, costs can vary, and human judgment does not disappear.
But I like the direction.
The future of agents is probably not one giant assistant doing everything.
It is a hierarchy of agents, where the main agent knows when to delegate.
Stop asking the model to remember the whole world.
Let it learn how to inspect it.
Introducing Fractal — the recursive language model CLI agent.
Fractal is a terminal agent that can answer questions, plan tasks, read and write files, run commands, inspect codebases, and work through problems turn by turn.
But at its core, Fractal is not just another CLI harness around an LLM.
It is powered by predict-rlm, our self-harnessed Recursive Language Model runtime. In practice, that means Fractal can recursively reason through context, interact with its own REPL environment, and take action through code as it works.
We built it to make RLMs easy to try and actually useful on real tasks.
If you've been hearing about RLMs but could never come around to using one, this is the best way to do so.
And if you already know what RLMs are capable of, Fractal is the easiest way to use one on your own tasks.
Install it today:
→ curl -LsSf https://t.co/j0Ye0JSpR3 | sh
Drop us a ⭐️:
→ https://t.co/NAbNAseVsT
Or check out the website:
→ https://t.co/w6XzXx8tN3
@vkhosla@Stanford@Google@sundarpichai In my limited time on earth and the few readings I've done, I've come to believe that old men berating the educated youth are generally wrong.
Still, I'm curious: who are the 3B people. And why and how will they benefit from AI.
@Haggis@GabLesperance it's not. I have a nightly agent checking all traces from previous day. suggesting PRs before I even wake up and discussing system changes.
Management and oversight is still required.
RLMs are so resilient.
Multiple times I've run into bugs in our setups. What's interesting is that those bugs only became apparent after careful trace reviews, because the RLM actually found a way forward despite some broken state.
Truly mind-boggling.
RLMs are so resilient.
Multiple times I've run into bugs in our setups. What's interesting is that those bugs only became apparent after careful trace reviews, because the RLM actually found a way forward despite some broken state.
Truly mind-boggling.
@GabLesperance casually beating the AppWorld leaderboard before optimizing anything, then pushing to 0.911 SGC with RLM-GEPA.
The thesis in one line: put the intelligence in the model, not the harness. Less scaffolding, better results.
Go read it 👇
@BrennanWoodruff@willchen500 You operate under the assumption that building AI is hard.
It's not. And way less hard than putting it into domain-specific production rails, working *reliably* alongside humans.
They don't need their own models. They need their own scaffolds. They need to rethink their SOPs.
@CodeClashIG@claudeai Every technology that came did that.
When writing came, oral traditions mourned the loss of a way to think and remember the world.
We'll eventually probably have more brain olympiads and stuff like that maybe :)
@BoringBiz_ It seems that we live in a world where demand for things is actually limited. Or, at least, lagging behind the unlimited supply that now exists.
Having fun with Recursive LMs (Predict-RLM)
I created an agent to analyze long contracts, page by page, with contextually highlighted text and negociation tool.
https://t.co/EcSSkPEgVY
a bit later than planned, but as promised: we’ve now released the data, traces, plots, and code behind our SpreadsheetBench / RLM-GEPA experiment.
everything is in the predict-rlm repo now.
https://t.co/ebTDlzCsZj (see examples/spreadbench)