In 4 years Mythos dangers will be remembered the same way GPT-2 dangers are looked at now.
People will train Mythos-level models in 1 hour on their laptops.
AIs now rank papers by potential impact @KurateOrg. Looked up our latest ICML-accepted paper in Kurate → #8/161 this week, #41/2,300 all-time 😄 Pretty fun
https://t.co/AK37SNXzv1
Assuming models have complete advantage over human mathematicians in all areas of math (IMO not guaranteed in 15 years, but likely enough), this seems to be a question about the shape of society. Arguably the answer is not at all special to math--presumably the answer is basically the same for all knowledge work (and maybe all professions?).
I think in this world the most likely background situation is that humans are not really useful for proving theorems, but nonetheless lots of basic and understandable questions remain open, including many that are open today (since I think these have basically unbounded difficulty). And lots of other questions of basic interest have been resolved. So human activities might include (1) trying to understand solutions, (2) trying to understand progress and obstructions to resolving open questions, (3) (non-rigorously) understanding mathematical phenomena.
In all these activities, the purpose of the human is to serve as a locus of understanding. There's an obvious question as to why we would pay someone to do this. One plausible answer is that maybe we want to avoid complete disempowerment--at a minimum we might want people to understand what they can about what the AIs are doing--which requires development of human capital.
I was talking about this during today’s @pintofscienceCH event. We as researchers need to swallow the bitter pill: we are the equivalent of hunters chasing the best theories, but the scientific “agricultural” revolution, with theories growing in silicon, is right around the corner.
I have been using OpenAI models for mathematical research ever since my first successes with GPT-4. I saw this coming, and seeing the pace accelerate makes me very excited. I think 2026 marks the beginning of the real singularity in mathematics. We have passed the event horizon, but we have not noticed it yet. Brace yourselves: it will only accelerate from here.
Stanisław Lem wrote about this transition in the role of scientists in his Summa Technologiae 60 years ago. It feels surreal to see this sci-fi vision playing out in front of our eyes.
Another inspiring work from Bartosz!
I vividly remember the talk he delivered 15 years ago at my high school, introducing us to this beautiful part of science where mathematics overlaps with computing, with concepts like Conway’s “Game of Life” and Langton’s Ant :)
I am happy to share that I have finally finished the big project of properly formalizing all the claims in Andrzej Odrzywołek’s paper on the EML(x, y) = exp(x) - log(y) function in Lean 4.
The project took me about two weeks of work, and I think it was a very refreshing experience. I will describe here, in an informal way, what I actually did, while deferring the technical details to the GitHub repo, which contains everything and is fully reproducible.
1. I decided that the work arXiv:2603.21852 should finally get a full Lean 4 formalization. This is an ideal task, since the scope and breadth of the work depend entirely on foundations laid out in Mathlib.
2. My plan was to use this project as a test of agentic engineering and design. I put a lot of effort into designing an intricate system based on Claude, Mathematica, Aristotle, and GPT Pro: Claude for orchestration, Mathematica for specific identity chasing, Aristotle for formalizing the many parts of the paper — including very crucial negative feedback — and finally GPT Pro as a critical feedback model that re-steered the Claude orchestrator whenever it got stuck. Finally, Codex was used to informalize some of the Lean statements.
3. I did the work in several batches. My supervision was based on insights and on gaining a deeper understanding of how the combinators work on specific domains.
4. The hard aspect of this work was that we wanted to have a full domain definition. This turned out to be impossible at some isolated points.
5. In hindsight, I should say that I honestly learned the hard-to-write details of the EML theorem. The many identities between elementary functions gave extra depth to some of the choices. The Lean code feels light and structured.
6. In this project, I felt more like a "mathematical engineer" than a typical "tinkering mathematician". This is a very different feeling, but it is a cool type of job. If you orchestrate AI properly, you can get a lot of satisfaction from such work. If you know how to tinker with Lean and mathematics, it becomes much more than mere vibe-coding.
7. IMHO, future work in mathematics will rely on models doing a lot of the work, with humans helping to verify it. This is an emerging new type of activity: deeply mathematical, but with a lot in common with proper engineering.
8. I am still super curious about the result itself. I was very happy to see the structural design with the combinators, which gave me the impression of good taste and structural thinking on the part of the models. It was not merely a dull formalization run.
9. Looking forward to more projects like this in the future. You can be creative in such ventures in entirely new ways. It is not subpar compared to proper mathematical tinkering. It is different, and it is fun.
10. This project also shows how important it is to know all the top-tier AI tools on the market. Switching between models and using them against each other turns out to be very productive.
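For readers who have not seen the operator, here is a minimal Lean 4 sketch of what the starting point of such a formalization could look like. It is not taken from the repo: it assumes the operator is EML(x, y) = exp(x) - log(y) restricted to the reals, and the names `eml` and `eml_one_right` are hypothetical. Mathlib's `Real.log` is a total function (for example, `Real.log 0 = 0`), which is one reason domain questions like those in point 4 need care.

```lean
import Mathlib

/-- Hypothetical real-valued version of the operator; the name `eml` and the
restriction to `ℝ` are assumptions made for illustration only. -/
noncomputable def eml (x y : ℝ) : ℝ := Real.exp x - Real.log y

/-- Since `Real.log 1 = 0`, fixing `y = 1` recovers the exponential. -/
theorem eml_one_right (x : ℝ) : eml x 1 = Real.exp x := by
  simp [eml]
```

The single lemma only illustrates the flavor of the identities involved; the actual project covers far more than this.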
Links in the comments. Feel free to interact. Maybe there are other formalizations of this project, or similar scaffolds? Curious what you think!
I think it's incredible that, while obviously not as good as modern LLMs, this LLM is able to do in-context learning and write basic Python code. Really highlights the intelligence of LLMs.
Our new preprint is out!
We introduce Process Reward Agents (PRA) - a new framework in which the reasoning capabilities of a frozen reasoning model are decoupled from the Reward Agent, steering the reasoning process at test time.
Preprint: https://t.co/CbjPDEi65I
Page: https://t.co/cxdb7bN3T7
Code: https://t.co/qnSxn3BUYd
Big thanks to a stellar team of co-authors @de_Jiung @TomaszSternal @KStyppa @thoefler! @ETH_en
1/
AI is increasingly changing how we do mathematics.
Erdős Problem #650, open for over 60 years, was solved a few weeks ago through a collaboration between human mathematicians, an informal reasoning model (GPT 5.4 Pro @OpenAI) and a formal one (Aristotle @HarmonicMath). 🧵
@kingofknowwhere Love this take! Matches my experience - never stops surprising me when people with 5 ML papers at ICML/NeurIPS can’t explain what precision and recall are.
@KShevchenkoReal @KShevchenkoReal, can you please cite where you took the 90% estimate from? I thought this particular route across the Polish-Belarusian border accounts for closer to 3% of total freight movement.