This is the biggest news from today’s GPT-5.2 launch.
Forget the benchmark charts OpenAI showed. Forget the 100% AIME score and the SWE-Bench Pro numbers. The real story is buried in a single data point from ARC Prize: 90.5% accuracy at $11.64 per task.
A year ago, hitting 88% on ARC-AGI-1 cost an estimated $4,500 per task. Today, 90.5% costs $11.64. That’s 390X cheaper in 12 months.
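A quick sanity check on the "390X" figure, using only the two cost estimates quoted above. The constant and function names are illustrative, not from any ARC Prize tooling. The exact ratio comes out just under 387X, which the post rounds to roughly 390X.

```python
# Cost estimates quoted in the post (USD per ARC-AGI-1 task).
OLD_COST_PER_TASK = 4_500.00  # ~88% accuracy, a year ago (estimate)
NEW_COST_PER_TASK = 11.64     # GPT-5.2 Pro, 90.5% accuracy


def cost_reduction_factor(old: float, new: float) -> float:
    """Return how many times cheaper `new` is than `old` (hypothetical helper)."""
    if new <= 0:
        raise ValueError("new cost must be positive")
    return old / new


factor = cost_reduction_factor(OLD_COST_PER_TASK, NEW_COST_PER_TASK)
print(f"{factor:.1f}x cheaper")  # 386.6x cheaper
```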
Look at that leaderboard chart. The efficiency frontier is getting redrawn every few weeks. GPT-5.2 Pro, Grok 4, Gemini 3 Deep Think, Claude Opus 4.5, all stacking on top of each other in a diagonal line from bottom-left to top-right, each one obsoleting the economics of what came before it.
Here’s what most people don’t understand about this benchmark.
François Chollet designed ARC-AGI in 2019 specifically to resist brute-force scaling. The whole thesis was that LLMs just pattern-match training data and would fail catastrophically on novel abstract reasoning. Each puzzle is unique, never seen online, requiring genuine generalization from minimal examples. Humans solve 95% of them easily. For years, the best AI systems couldn’t crack 5%.
The 2020 Kaggle competition topped out at 20%. By 2023, still only 33%. GPT-3 scored literally 0% via direct prompting. The AI research community largely accepted ARC-AGI as proof that scaling alone wouldn’t reach general intelligence. Chollet himself said reaching human-level would “take many years.”
Then December 2024 happened. OpenAI’s o3-preview hit 87.5% in high-compute mode. First time any AI system crossed the human threshold of 85%. The model needed 1,024 attempts per task, writing roughly 137 pages of reasoning per attempt. Cost estimates ranged from $3,000 to $30,000 per task.
Eleven months later, GPT-5.2 Pro hits 90.5% at $11.64 per task.
The math on that cost collapse tells you everything. At $30,000 per task, you'd need to pay a human $6,000/hour to match the economics. At $11.64, frontier AI reasoning costs only about twice what a Mechanical Turk worker charges at $5/task. We have all but reached human-cost parity in the last few months, and most people missed it.
Now zoom out to the competitive dynamics.
Three weeks ago, Google dropped Gemini 3. Topped the LMArena leaderboard at 1501 Elo. Set records on Humanity’s Last Exam. Sam Altman publicly praised it. OpenAI declared “code red” internally, shelved projects like ad integrations, and fast-tracked GPT-5.2’s release from later this month to today.
This is the first model launch in OpenAI’s history that was explicitly a response to a competitor. The Verge reported employees asked to delay the release for more polish. Leadership overruled them. The directive was to reclaim the performance lead now.
And on ARC-AGI, they did. GPT-5.2 Pro at 90.5% edges out everything else on the board. But the real competition isn’t on accuracy anymore. Look at the cost-per-task column. The battle has shifted from “who can solve it” to “who can solve it cheaply.”
The efficiency gains aren’t slowing down. They’re compounding. Every major lab is now competing on the same benchmark, which means the collective R&D spend attacking this problem is in the billions. The 2025 ARC Prize Grand Prize ($700,000 for 85% on the private eval with efficiency constraints) is almost certainly getting claimed.
What happens after ARC-AGI-1 falls completely?
Chollet already released ARC-AGI-2 in March 2025, specifically designed to be harder for reasoning systems. Humans still hit nearly 100%. Current frontier models manage 10-45%. The gap between human and AI performance on even the harder benchmark is now a cost optimization problem, not a fundamental capability barrier.
If you’re building products in 2025 and assuming AI reasoning is expensive, you’re building for a world that no longer exists.
The benchmark that was supposed to prove AI couldn’t generalize just became another line item on a pricing page. 390X efficiency improvement in one year.
Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask:
"What do you think about xyz?"
There is no "you". Next time try:
"What would be a good group of people to explore xyz? What would they say?"
The LLM can channel/simulate many perspectives, but it hasn't "thought about" xyz for a while and, over time, formed its own opinions in the way we're used to. If you force it via the use of "you", it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulating that. It's fine to do, but there is a lot less mystique to it than I find people naively attribute to "asking an AI".
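For illustration, the two prompting styles above can be captured as simple templates. This is a minimal sketch: the helper names are hypothetical, the wording is taken from the examples in the post, and the resulting string would be passed to whatever LLM client you already use.

```python
def entity_prompt(topic: str) -> str:
    """The 'you'-framed prompt the post suggests avoiding (hypothetical helper)."""
    return f"What do you think about {topic}?"


def simulator_prompt(topic: str) -> str:
    """Ask the model to simulate a panel of perspectives instead of
    adopting a single implied persona (hypothetical helper)."""
    return (
        f"What would be a good group of people to explore {topic}? "
        "What would they say?"
    )


print(simulator_prompt("carbon pricing"))
```

The simulator-style template never addresses the model as "you", so it asks the model to channel several viewpoints rather than to report an opinion it does not hold.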
I would like to clarify a few things.
First, the obvious one: we do not have or want government guarantees for OpenAI datacenters. We believe that governments should not pick winners or losers, and that taxpayers should not bail out companies that make bad business decisions or otherwise lose in the market. If one company fails, other companies will do good work.
What we do think might make sense is governments building (and owning) their own AI infrastructure, but then the upside of that should flow to the government as well. We can imagine a world where governments decide to offtake a lot of computing power and get to decide how to use it, and it may make sense to provide lower cost of capital to do so. Building a strategic national reserve of computing power makes a lot of sense. But this should be for the government’s benefit, not the benefit of private companies.
The one area where we have discussed loan guarantees is as part of supporting the buildout of semiconductor fabs in the US, where we and other companies have responded to the government’s call and where we would be happy to help (though we did not formally apply). The basic idea there has been ensuring that the sourcing of the chip supply chain is as American as possible in order to bring jobs and industrialization back to the US, and to enhance the strategic position of the US with an independent supply chain, for the benefit of all American companies. This is of course different from governments guaranteeing private-benefit datacenter buildouts.
There are at least 3 “questions behind the question” here that are understandably causing concern.
First, “How is OpenAI going to pay for all this infrastructure it is signing up for?” We expect to end this year above $20 billion in annualized revenue run rate and grow to hundreds of billions by 2030. We are looking at commitments of about $1.4 trillion over the next 8 years. Obviously this requires continued revenue growth, and each doubling is a lot of work! But we are feeling good about our prospects there; we are quite excited about our upcoming enterprise offering for example, and there are categories like new consumer devices and robotics that we also expect to be very significant. But there are also new categories we have a hard time putting specifics on like AI that can do scientific discovery, which we will touch on later.
We are also looking at ways to more directly sell compute capacity to other companies (and people); we are pretty sure the world is going to need a lot of “AI cloud”, and we are excited to offer this. We may also raise more equity or debt capital in the future.
But everything we currently see suggests that the world is going to need a great deal more computing power than what we are already planning for.
Second, “Is OpenAI trying to become too big to fail, and should the government pick winners and losers?” Our answer on this is an unequivocal no. If we screw up and can’t fix it, we should fail, and other companies will continue doing good work and serving customers. That’s how capitalism works and the ecosystem and economy would be fine. We plan to be a wildly successful company, but if we get it wrong, that’s on us.
Our CFO talked about government financing yesterday and later clarified her point, acknowledging that she could have phrased things more clearly. As mentioned above, we think that the US government should have a national strategy for its own AI infrastructure.
Tyler Cowen asked me a few weeks ago about the federal government becoming the insurer of last resort for AI, in the sense of risks (like nuclear power) not about overbuild. I said “I do think the government ends up as the insurer of last resort, but I think I mean that in a different way than you mean that, and I don’t expect them to actually be writing the policies in the way that maybe they do for nuclear”. Again, this was in a totally different context than datacenter buildout, and not about bailing out a company. What we were talking about is something going catastrophically wrong—say, a rogue actor using an AI to coordinate a large-scale cyberattack that disrupts critical infrastructure—and how intentional misuse of AI could cause harm at a scale that only the government could deal with. I do not think the government should be writing insurance policies for AI companies.
Third, “Why do you need to spend so much now, instead of growing more slowly?” We are trying to build the infrastructure for a future economy powered by AI, and given everything we see on the horizon in our research program, this is the time to invest to be really scaling up our technology. Massive infrastructure projects take quite a while to build, so we have to start now.
Based on the trends we are seeing of how people are using AI and how much of it they would like to use, we believe the risk to OpenAI of not having enough computing power is more significant and more likely than the risk of having too much. Even today, we and others have to rate limit our products and not offer new features and models because we face such a severe compute constraint.
In a world where AI can make important scientific breakthroughs but at the cost of tremendous amounts of computing power, we want to be ready to meet that moment. And we no longer think it’s in the distant future. Our mission requires us to do what we can to not wait many more years to apply AI to hard problems, like contributing to curing deadly diseases, and to bring the benefits of AGI to people as soon as possible.
Also, we want a world of abundant and cheap AI. We expect massive demand for this technology, and for it to improve people’s lives in many ways.
It is a great privilege to get to be in the arena, and to have the conviction to take a run at building infrastructure at such scale for something so important. This is the bet we are making, and given our vantage point, we feel good about it. But we of course could be wrong, and the market—not the government—will deal with it if we are.
5/7 🎯 This expansion and the @NEOIX_PLC EU branch support our upcoming public listing on a leading European exchange, providing enhanced access to international capital markets and growth opportunities.
New from the Anthropic Economic Index: the first comprehensive analysis of how AI is used in every US state and country we serve.
We've produced a detailed report, and you can explore our data yourself on our new interactive website.
🧵 1/7 🚀 BIG NEWS: @NEOIX_PLC officially opens EU branch in Düsseldorf! Our new continental HQ will coordinate sustainable data center projects across Germany, Finland, Lithuania & Croatia 🇩🇪🇫🇮🇱🇹🇭🇷
NEOIX PLC announces Planning Phase for Public Listing on European Stock Exchange. This strategic milestone reflects the company’s continued growth in developing sustainable data center assets that power the next generation of technological innovation.
https://t.co/EvBpbFZtq9
NEOIX PLC, a leading developer of sustainable hyperscale #datacenters, today announced a strategic cooperation agreement with Baukontor Niederrhein GmbH, a specialised German engineering firm with a track record of 20+ successfully completed datacenters. https://t.co/EvBpbFZtq9