Humans + LLMs has been the standard at the top level of forecasting for a while; virtually everyone uses LLMs. Trusting them beyond search and simulations was another story. With @_Mantic_AI, I've experienced a step change that might be as big as the move from manual Google searching to LLMs.
Monitoring the Iran war on Polymarket and Kalshi has been a big upgrade, but it still feels like drinking the future through a narrow straw.
We flew top forecaster @DrTournesol to London to ask Mantic any question he wanted about Iran.
This exercise is a peek into the future, not just of the war but of forecasting itself.
Yann was one of the top-ranked forecasters and question-askers on Metaculus last year. The best forecasters love collaborating with each other, but Yann has never had such a responsive forecasting partner as Mantic. This is like a "Claude Code moment" for understanding the future.
Here are Yann's 28 key questions, most of which aren't on the prediction markets (link in reply).
We're trialling a new kind of forecasting tournament. The challenge: submit forecasting questions that trigger divergent predictions from the top AI forecasting systems.
There's a $25k prize pool for the question writers, allocated by how much disagreement you can elicit.
Motivation:
- AI forecasters are becoming competitive with human pros.
- Many questions are "solved", e.g. if I ask "Will a nuclear bomb go off in Europe this month?" all the models know it's <1%.
- Other questions are intractable because of aleatoric uncertainty, e.g. "What will NVIDIA's stock price be in 1 year?" Again, the models will agree (this time by being very uncertain), and there's not much to learn.
- If you can make the AIs disagree, you've found something interesting: a place where the AIs have divergent models of how the world works or differences in what information sources they're relying on.
- Identifying these wedge questions will help the field develop AI forecasters that can tackle genuinely challenging problems. This is exactly what we'll need them for, as we navigate the uncertain world ahead.
Please apply! Link in reply.
We're launching a new kind of forecasting tournament at @_Mantic_AI. There's $25k in prizes for writing questions, see post below to read more and apply.
We're undergoing a two-sided "prediction revolution":
(1) The rise of prediction markets (Polymarket, Kalshi)
(2) AI getting much better at predicting world events (Mantic)
Iran is the first major geopolitical crisis where we can benefit from both. Gabriel offers a peek into the new paradigm.
Models that are great at calibrated predictions will be transformative for decision making. Excited about Mantic's work and proud they're using Tinker. Their new blog post digs into their methodology and findings.
I always dreamed of AGI as a wise advisor for humanity. Although LLMs are great for coding & knowledge work, I wouldn’t trust them to give me advice on my career, business strategy, or policy preferences. How can we build AI systems optimized for wisdom?
At Mantic we believe the unlock is prediction: predicting world events as accurately as possible, and hill-climbing this single metric.
Today we share some recent progress on the Thinking Machines website, having found Tinker a great platform for our RL experiments.
TL;DR: We RL-tune gpt-oss-120b to become a better forecaster than any other model. Having good scaffolding is a prerequisite. A fun result: our tuned model + Grok are decorrelated from the other best models, and so are the most indispensable when picking a team.
We've been using RL to train LLMs for superforecasting.
Our new blog post with @thinkymachines discusses recent progress.
We're now in uncharted territory. I'm excited to see how good we can get by pushing this further!
🧵
HUMANS OF MANTIC
Hours after we launched our website, before we'd posted it anywhere, I saw a job application from an Oxford economics PhD student from Brazil:
“I’ve never been this excited about a startup. I want to help build it.”
His background was not typical for an AI startup, but he looked impressive. He'd got a distinction from Yale, then spent 3 years as an economist at Goldman Sachs. In his PhD research, he was using LLM forecasters to identify exogenous shocks to fiscal policy.
We invited him to lunch with the team. He seemed smart. Ben messaged me: “we should try to get Gabriel to come in for September”.
In his first couple of days, Gabriel was reading the code. I wasn’t seeing much output. I asked Ben, what is he doing? Ben told me to wait.
Then... Gabriel emerged with an understanding of our prediction engine as if he'd worked here for months. He started finding weaknesses and generating good ideas.
Throughout September, Gabriel was running experiments to test his fixes, and the guy did not miss. +3 points on this eval, +3 points on that eval.
To boot, he's a lovely person. Gabriel grew up in Rio. He speaks about his childhood friends and Brazilian culture (the beach, the food) with joy in his eyes. It must have been a big culture shock turning up in New Haven as a freshman.
From the beaches of Rio to Camden's hottest AI startup, @gabrielpfritsch started in a permanent role today, as Member of Technical Staff.
📈Trends in AI performance in the Metaculus Cup, a large-scale forecasting tournament.
The top-5 AI frontier is making linear progress relative to the community prediction (CP). The CP is a wisdom-of-the-crowds aggregate. Only a small handful of elite forecasters, out of 500+ entrants, beat the CP in each tournament.
Extrapolating the AI trend line predicts CP-level performance in October 2027.
A new trend started last summer. Mantic progresses at a similar speed, but at a much higher level.
The latest tournament has just resolved, and Mantic beat the community, the first time ever for an AI.
For our launch party, we made fortune cookies with Mantic's predictions for what the world would look like on 1st Jan 2026.
How well did we do? The predictions were from Aug 14th.
1. Nvidia market cap on 1st Jan
🔮 $4.52 trillion (median, with a mode of 4.75)
➡️ $4.53 trillion 😎 😎 😎
2. Trump Nobel Peace Prize
🔮 94% he doesn’t win
➡️ Didn’t win
3. Jair Bolsonaro imprisoned
🔮 40%. Modal date if it happens: Oct 17th.
➡️ Imprisoned in November.
4. China launches Taiwan invasion
🔮 98% no
➡️ No invasion
5. A Chinese model top of the LMArena leaderboard
🔮 16%
➡️ No, Gemini 3 is top.
6. Jerome Powell as Fed Chair
🔮 85% still going
➡️ Still going!
7. US cuts the scheduled 50% tariffs on India
🔮 69%. Mantic read it as a negotiating tactic.
➡️ No ❌ Still there!
8. Xi Jinping out
🔮 95% still in power
➡️ Still going
9. Bank of England base rate
🔮 43% chance of a 4% rate, 50% chance it's lower, 7% chance it's higher.
➡️ 3.75% rate
Overall these look pretty good to me. Perhaps 2026 will be the year of superhuman forecasting accuracy... 📈
I gave a talk about AI for forecasting at the Society for Technological Advancement (@sotalikesfuture) in London.
The short talk covers:
- What we're doing at @_Mantic_AI, including the example of possible 🇺🇸 strikes on Venezuela that we automatically spotted in late Sept.
- Benchmarking! 📊 I'm worried the forecasting benchmarks are getting saturated.
- The idea that good foresight = forecasting accuracy + prescience.
It was fun meeting everyone -- there's lots of excitement in the space! 🚀
SoTA's first Frontiers Night is complete!
A series of brilliant talks, demos & case studies with a lively panel comprising Toby Shevlane (@tshevl), Michael Story (@MWStory), Ben Warner, and Tom Oliver on Forecasting the Future.
We discussed the art & science of prediction across AI forecasting, simulating human behaviour, and leveraging crowdsourced intelligence to make better-informed decisions.
Thank you to Faculty for hosting the Society for Technological Advancement (@sotalikesfuture)!
Look out for our next Frontiers Night on Self-Driving Labs in the new year and send suggestions for future themes & demonstrators.
@_Mantic_AI, @swift_centre, @faculty_ai, Electric Twin
I had a fun conversation with superforecaster @rdeneufville on his podcast!
We discuss:
🤖 How does AI compare to human superforecasters?
🧱 Is there a data wall?
⏱️ Why we should expect fast progress on prediction horizons under 1 year
🔎 The importance of asking the right question
Link in reply, and please enjoy this short clip, with background image courtesy of Gemini (reality: in a phone booth with smaller arms)
The Market Pulse competition is for finance predictions: company earnings, Treasury yields, etc.
We competed in Q3 and finished:
- 19th / 122 entrants
- Highest ever AI
- 2.3x the score of the next best AI
🎯 Our best prediction was Nvidia forward guidance on margins (bullish).