I just read a paper that completely broke my brain.
It describes a system that solved an AI task with over 1,000,000 sequential steps... with ZERO errors.
Using AI models that are known to be flaky and make mistakes.
How is that even possible? 🤯
We all know LLMs have an error rate. Even 99.9% accuracy is a death sentence for long tasks.
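Imagine you need 1,000 correct steps in a row. With a 99.9% success rate per step, your chance of finishing the whole thing is only ~37% (that's 0.999^1000, assuming each step fails independently).
At a million steps? Forget it. The odds land somewhere around 10^-435, which is zero for any practical purpose.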
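If you want to check that arithmetic yourself, here's a minimal sketch. It assumes every step succeeds or fails independently with the same accuracy, which is a simplification, and the function names are just made up for illustration. The million-step case is computed as a power of ten because the raw probability is too small for a normal float to hold.

```python
import math


def chance_of_clean_run(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step_accuracy ** steps


def log10_chance_of_clean_run(per_step_accuracy: float, steps: int) -> float:
    """Same probability, returned as a base-10 exponent.

    Working in logs avoids floating-point underflow when the
    probability is far smaller than a double can represent.
    """
    return steps * math.log10(per_step_accuracy)


if __name__ == "__main__":
    # 1,000 steps at 99.9% per step: a bit under 37%.
    print(f"1,000 steps: {chance_of_clean_run(0.999, 1_000):.1%}")

    # 1,000,000 steps at 99.9% per step: report the exponent only.
    exponent = log10_chance_of_clean_run(0.999, 1_000_000)
    print(f"1,000,000 steps: about 10^{exponent:.1f}")
```

The takeaway is the same as above: per-step accuracy compounds multiplicatively, so tiny error rates become fatal once the chain gets long.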
So for years, the race has been to build bigger, "smarter" models to get that per-step error rate closer to zero. We're trying to build a perfect genius.
But this paper ("Solving a Million-Step LLM Task with Zero Errors") does the complete opposite. It's a total paradigm shift.
Here's the "holy shit" moment:
Stop trying to make the AI perfect. Instead, build a system that's immune to its imperfections.
How?
Smash the problem into the tiniest possible pieces. (They call it Maximal Agentic Decomposition).
Have a team of simple, cheap AIs vote on the answer for each tiny piece.
It's less like hiring one world-class chef and praying they don't have an off day, and more like designing the McDonald's kitchen.
The system guarantees the burger is the same every time, even if any individual worker could mess up.
The reliability comes from the process, not the person.
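To make the voting idea concrete, here's a rough sketch. Big caveats: the "agent" is simulated with a random number generator instead of a real LLM call, the 90% accuracy is an arbitrary number picked for the demo, and plain majority voting is swapped in as a stand-in because this thread doesn't spell out the paper's exact voting rule. All names here are hypothetical.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(ask: Callable[[], str], voters: int) -> str:
    """Ask the same tiny question `voters` times and return the most common answer."""
    answers = [ask() for _ in range(voters)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def make_flaky_agent(correct: str, wrong: str, accuracy: float,
                     rng: random.Random) -> Callable[[], str]:
    """Simulated agent (an assumption, not a real model): right with probability `accuracy`."""
    def ask() -> str:
        return correct if rng.random() < accuracy else wrong
    return ask


def error_rate(voters: int, trials: int = 5_000,
               accuracy: float = 0.9, seed: int = 0) -> float:
    """Fraction of trials where the voted answer is wrong."""
    rng = random.Random(seed)
    ask = make_flaky_agent("A->C", "A->B", accuracy, rng)
    misses = sum(majority_vote(ask, voters) != "A->C" for _ in range(trials))
    return misses / trials


if __name__ == "__main__":
    # Use odd voter counts so a two-way vote can never tie.
    for voters in (1, 5, 11):
        print(f"{voters:>2} voters -> error rate {error_rate(voters):.4f}")
```

Running it should show the error rate for a single step falling quickly as voters are added, which is the whole point: the step gets more reliable without the individual agent getting any smarter.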
They tested this on the Towers of Hanoi puzzle—a classic benchmark where AIs fail spectacularly as the task gets longer.
They set it up for 20 disks. That requires 1,048,575 perfect moves in a row.
(seriously, over a million steps)
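Where does 1,048,575 come from? The shortest solution for n disks takes 2^n - 1 moves. Here's a small sketch of the classic recursive solution that shows it. This is the standard textbook algorithm, not the paper's agent setup, and the peg labels "A", "B", "C" are just an illustrative choice.

```python
from typing import Iterator, Tuple

Move = Tuple[int, str, str]  # (disk number, from peg, to peg)


def min_moves(disks: int) -> int:
    """Length of the shortest Towers of Hanoi solution for `disks` disks."""
    return 2 ** disks - 1


def hanoi_moves(disks: int, source: str = "A", spare: str = "B",
                target: str = "C") -> Iterator[Move]:
    """Yield the optimal move sequence one move at a time."""
    if disks == 0:
        return
    # Park the smaller disks on the spare peg.
    yield from hanoi_moves(disks - 1, source, target, spare)
    # Move the largest remaining disk straight to the target.
    yield (disks, source, target)
    # Bring the smaller disks back on top of it.
    yield from hanoi_moves(disks - 1, spare, source, target)


if __name__ == "__main__":
    for move in hanoi_moves(3):
        print(move)
    print("moves needed for 20 disks:", min_moves(20))
```

Every single move is trivial on its own. The hard part for an LLM is getting more than a million of them right in a row.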
A single AI trying this would be a comedy of errors.
But their system of "micro-agents" voting on every single move... nailed it. Flawlessly.
And the plot twist? The most expensive, "state-of-the-art" models weren't even the best for the job. A smaller, cheaper model (gpt-4.1-mini) was more cost-effective because the tasks were so simple.
This is a huge deal for AI safety, too.
A single, god-like AI is a black box. It's unpredictable.
But a system of a million simple agents? You can inspect it. You can audit each step. The agents have no grand "worldview"—their entire existence is to solve one tiny puzzle and then disappear. It's controllable.
So next time you're building something with an LLM, maybe stop asking "how can I prompt the model to be smarter?"
And start asking: "How can I design a system where it's okay for the model to be dumb?"
The real power isn't just in the model. It's in the architecture you build around it.
This isn't just about AI. It's a fundamental lesson in engineering and problem-solving.
You don't always need perfect components to build a perfect machine. You just need a damn good design.
...which makes you wonder what else we're trying to solve by chasing individual perfection instead of building better systems.
Introducing https://t.co/jRUWOa5aix, the world's first platform that lets businesses lease audience data from Instagram & Facebook creators.
With Flawlesss, businesses can target with precision from day one, and creators can make more money.
*Meta ads will never be the same
knowing that everything you build will eventually dissolve, that the universe is fundamentally indifferent and then building anyway is the coolest act ever
OpenAI o1 is now out of preview in ChatGPT.
What’s changed since the preview? A faster, more powerful reasoning model that’s better at coding, math & writing.
o1 now also supports image uploads, allowing it to apply reasoning to visuals for more detailed & useful responses.
@drgurner A concept I thought of, though it's not completely my own.
@19keys_ & @mrgrateful pondered whether curricula and school systems could be used to create specialized education for each child, specific to their learning patterns and personality type.
My full conversation with Mark Zuckerberg on the breaking Meta AI announcements, fighting in the UFC, metaverse, Ray-Ban Metas, and future technologies.
But what was really touching were his thoughts on legacy and fatherhood.
Timestamps:
00:00 Intro
00:37 Meta AI announcements
02:58 How brands & creators can use Meta AI
04:32 How Mark uses AI
05:57 The most excitement Mark has had as a builder?
07:36 The future of the Ray-Ban Meta smart glasses
12:51 10 years of Reality Labs: what's next?
16:49 Jensen Huang jersey swap
18:11 How does Mark want to be remembered?
19:23 Is it Mark actually posting on social media?
20:55 Mark's choice for best MMA fighter
21:38 Mark fighting in the UFC?
22:33 Mark's fatherhood advice
@rowancheung Adobe is doing what they do best... No one expected that. Sora, Runway, and Pika all part of one piece of software, that's massive. A major highlight from the video was that Sora can generate multiple versions of a video from one prompt in a single API call.