"MAI-Thinking-1: Building a Hill-Climbing Machine"
Microsoft just did something almost no frontier AI lab has done before
They shared how they engineered the data behind a frontier-scale model in unusual depth.
From data collection and eval decontamination, to data mix scaling, this paper lays out how they managed 30T pretraining tokens plus 3.55T midtraining tokens
Surprisingly, they also used no third-party distillation and no open-source training datasets
The model itself is not a jaw-dropping release, but the paper might be the best open look yet at a frontier-scale data factory and hill-climbing loop.
MAI-Thinking-1 by Microsoft looks to be approaching sonnet level model, the 109 page tech report is gold
they got 29T unique tokens without any synthetic tokens for pretraining which is exact opposite of what they were doing with phi models !!
so many counter intuitive decisions but the best part is they talk a lot about data.. this is a must must read
Super detailed tech report for MAI-Thinking-1, with a ton of info on all stages of the pipeline. I'm surprised so much of this info is released :)
Super long thread on my notes:
> buy truckloads of good books
> remove unspeakable amounts of slop from web data
> build a shedload of held-out evals
that was my work on mai-thinking-1
the model gets 97% AIME
and I can speak for hours about ISBNs
read the tech report: https://t.co/XyBJudWQE2
Qualitatively observed the same among AI researchers. The most successful are often exceptionally strong in seemingly orthogonal areas.
Stay general kids…
Build Jarvis with Claude Code, 100 lines of python and iMessage:
* Script watches Messages DB for texts from your number
* Forwards to Claude API with tools (shell, chrome, email)
* Claude executes + replies via iMessage
* Run as Launch Agent on always-on Mac