Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next are MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference-efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: https://t.co/v65eop5Ixq
Excel has quietly been Turing complete for a long time. Nice to see it now edging toward "AI complete"—SGD, attention, next-token prediction… all in cells.
Great Episode! @GlennF is back! Plus the insights and wit of @wesley83 are always refreshing!
Love being on @TWiT and the big show, it always gets adrenaline going. 👏
https://t.co/UObHZ4jDsi
Anthropic releases a new Claude model while conceding it trails an unreleased one, Snap lays off 16% of its staff, and Live Nation loses its monopoly case on This Week in Tech with Leo Laporte, @LouMM, @wesley83, & @GlennF!
Superman Day slips in like a quiet promise from the skies. Even when the world feels lost in shadows, the oldest heroes still remind us that light, hope, and goodness never actually quit.
Superman and Lois Lane, ACTION COMICS #1 in 1938, thanks to Siegel and Shuster.
Thoughtful post. It isn't doomsday (homage to Superman Day 🦸), but an evolution of the world's economy in two years. How will your universe change because of it?
I told you that Anthropic believes that 50% of jobs will be done by AI in about two years, which was the average of what its own AI researchers believe.
The entire staff was polled.
Some believe AI leaders should keep that secret.
I believe that we should be honest. So I strongly disagree.
But now there has to be real leadership on jobs.
@DarioAmodei should join @elonmusk who is showing the best leadership. See my previous post for how.
Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
It’s finally here! 🚀 Huge thanks to the @opencode team for the seamless integration. Qwen3.6-Plus and Qwen3.5-Plus are now live in Go!
Update now to try it out! 👇
It keeps getting better @Uber @UberEats @dkhos
It keeps getting better and better. “Quality,” they say. “Improving,” they say! Blaming the customer is a GREAT start! 👏
🥜 @UberEats, you're only as good as your weakest link: your system and your customer service. “Add item to order” charges you all of the long-range delivery fees instead of just adding the item to an order. A $4 item becomes an $18 item. Canceling immediately incurs a $5 fee.
The icing on the cake is that you can’t get that money back; they won’t refund it. They refuse. They read from a script. Customer service: zero. Ordering system: zero. How many times will I order from Uber, including rides? Zero! Good work, Uber.
Transcribe is fabulous. Take your recordings and create highly accurate transcriptions. Honestly, those expensive transcription services have to watch out.
Plus, the Voice model is awesome!
Three models. Three top-tier results. All shipped within just a few months by the @MicrosoftAI team.
- MAI-Transcribe-1 dropped today, the most accurate transcription model in the world across 25 languages according to the FLEURS WER benchmark.
- MAI-Voice-1 sets a new standard for natural speech.
- MAI-Image-2 lands as a top 3 model family on @arena.
We've been building with them - now you can too. All 3 available now on Microsoft Foundry.
@asha_shar 🧟‍♂️ Minecraft: Generations played, & still play. Family Play! Rich/Healthy Ecosystem. Customizations! Self-hosting. Learning Code!
🌓Half-Life/Counter-Strike: Online team plays. Ecosystem. Customizations! Team play!
🤖Halo: Rich Story. Fun/Team/Multiplayer Game play. Great music