MAI-Thinking-1 is out! Built from the ground up without any third-party distillation - it’s been a fun climb 🧗
Check out our 100+ page tech report: https://t.co/dEYfUUPKHj
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://t.co/v65eop5Ixq
Beautiful tech report, perhaps the best western model report I’ve ever read.
Lots of great insights: no synthetic data in midtrain, teacher models are RLed directly on top of midtrain, and adaptive clip higher.
But still seems like they didn’t fully nail true on-policy as they admit their RL stage is unstable, leading to a hacky self-distillation stage (imo)
I am incredibly grateful for all the support and guidance from my supervisor @MattNiessner and my examiner @LourdesAgapito. Thank you for everything! 🎉🥂
I've officially defended my PhD! 🎓
Never thought that researching 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐨𝐧 𝟑𝐃 𝐑𝐞𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧𝐬 could be as challenging as dealing with the German bureaucracy afterwards!
Congrats to @Normanisation for his successful PhD defense 🥳🎓
Norman's thesis about 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐨𝐧 𝟑𝐃 𝐑𝐞𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧𝐬 makes important contributions to the 3D vision community. For instance, DiffRF, a generative approach directly operating in 3D space, was among the first diffusion techniques for neural radiance fields.
This led to many follow up works in this area and sparked interest across the computer vision community, establishing generative approaches as a corner stone in the 3D domain.
Also after his PhD, Norman continues to work on the forefront in computer vision, such as his contributions to MapAnything, a universal feedforward approach for 3D reconstruction.
Check out Norman's amazing work: https://t.co/FBjSKJ4lja
Congratulations Dr. Mueller - super proud!
MAI-Image-2 debuts at #5 in the Image Arena!
Highlights:
- #5 in Text-to-Image overall
- #5 for 3D Imaging & Modeling, Cartoon, Anime & Fantasy, Photorealistic & Cinematic Imagery, Art and Portraits
- #6 for Product, Branding & Commercial Design
Congrats to the @MicrosoftAI team on this milestone!
I am working on building teams for the likes of @NandoDF and @asadovsky for @MicrosoftAI! We are #HiringNow in Europe and US 🧵👇 - follow me for more updates on our Superintelligence Lab - Roles below 🔔
I’d like to hire strong data engineers to join our Microsoft Super Intelligence (MSI) team.
I am interested in people who are good at processing PDFs and other documents at billion scale, and people good at parsing the web at trillion scale.
If you dream of processing all of human knowledge to advance science and engineering, this is for you.
Also looking for strong evaluation and post-training engineers.
Be part of our first launches this year 🚀
We have all the resources in the world to support you, working in startup mode, while powering a large organisation with billions of users.
Hiring in London, Zurich, New York, Boston, Toronto, Seattle and SF.
Please send your CV to [email protected]
Video gen models excel at replacing humans in front of the camera. But what if they could augment & enhance our performance instead? 🧵 (1/6)
🎬EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers
https://t.co/ngtvKIXAz2
Introducing ShapeR, a method for robust conditional 3D shape generation from casually captured sequences.
ShapeR leverages a rectified flow transformer conditioned on per-object multimodal data to turn casual image sequences into full metric scene reconstructions.
Project Page: https://t.co/ffH2zVKd8c
Paper: https://t.co/82yWRrXnA1
Links to code and huggingface below ⬇️
Meta just released the MapAnything benchmark on Hugging Face
Universal 3D reconstruction evaluation across multi-view stereo, depth & camera pose tasks. Benchmark feed-forward models on diverse real-world scenes with standardized metrics.
Nikhil has been cooking up an even stronger MapAnything, with noticeably better performance than VGGT and others, while supporting camera information input and more! 🎉
Check out the checkpoints!
Was gonna wait to announce this 😅
🚨 but yeah new checkpoints just dropped!!
Pull latest code and update hf cache - it's a direct slot in & just a better MapA 😉
HF demo also updated: https://t.co/lz8fLgVQpn
Stay tuned for more details & 3DV camera ready ⏳
Interested in 3D Interactive Segmentation? 🚀
Don't miss Andrea's talk on Easy3D today at 1 PM (Kalākaua Ballroom)! The code was just released:
🔗: https://t.co/kPHA7PbEgH
See you later at the #iccv25 Oral Session 6B (Kalākaua Ballroom) at 1PM and poster 356 from 2:30PM!
We will present our paper “Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation” with @Normanisation
Project + Code: https://t.co/LiQD8uj8Tj
Check out or workshop on Generate Scene Completion at ICCV'25.
We have an incredible speaker lineup and most certainly the coolest website (credits to @ethanjohnweber and @cursor_ai).
📅Mon, Oct 20 (morning session)
🌐https://t.co/IBZceahOsr
📢 SceneComp @ ICCV 2025 🏝️
🌎 Generative Scene Completion for Immersive Worlds
🛠️ Reconstruct what you know AND 🪄 Generate what you don’t!
🙌 Meet our speakers
@angelaqdai, @holynski_, @jampani_varun, @ZGojcic@taiyasaki, Peter Kontschieder
https://t.co/LvONYIK3dz
#ICCV2025
Excited to share our Swiss Army Knife for Feed-forward Geometric Modeling: MapAnything is fast, accurate, robust, and highly versatile!
Try it yourself: https://t.co/Khfv936IEw
Learn more: https://t.co/DiK7srhkPs
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀
One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯
We release code for data processing, training, benchmarking & ablations – everything Apache 2.0!
Details & Links 👇