@frankxu2004 Insanely fun indeed! Couldn’t have asked for a better person to battle coding RL unknowns with. Learned a ton from you throughout the climb!
I feel incredibly honored to have contributed to this work alongside the most talented and hardworking team I’ve ever worked with. @MicrosoftAI
Building and climbing an LLM from scratch was full of unknowns, but there were also many magical moments when things finally worked.
Excited to share what we learned and give back to the community! https://t.co/WgVuPmEBAT
Excited to share many details on what we at @MicrosoftAI have been working on. Building an LLM from scratch is an awesome journey, with pain and suffering battling unknowns but also many cool moments seeing it (somehow) work out at every stage! https://t.co/WTRRRRwUGu
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next are MAI-Image-2.5 and its Flash variant: two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference-efficient coding model, specially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: https://t.co/v65eop5Ixq
Is your code retriever silently breaking functions with line-based chunking?
We developed cAST, our new #EMNLP2025 work, for AST-based code chunking that preserves syntactic structure and semantic boundaries.
Paper: https://t.co/lZLTiLBXfc
Code: https://t.co/mOC9P5miq5
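To make the line-based vs. AST-based contrast concrete, here is a minimal sketch. It is not the cAST implementation: it assumes Python source, uses the standard-library `ast` module, and greedily packs whole top-level nodes under a character budget. The function names and the `max_chars` budget are illustrative assumptions, not names from the paper or repo.

```python
import ast


def line_chunks(source: str, max_lines: int) -> list[str]:
    """Naive baseline: cut every max_lines lines, ignoring syntax."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def ast_chunks(source: str, max_chars: int) -> list[str]:
    """Greedily pack whole top-level AST nodes into chunks of at most max_chars.

    Simplifications (assumptions of this sketch): a node larger than the
    budget is kept whole instead of being split recursively, and comments
    or decorators that sit between nodes are not carried over.
    """
    chunks: list[str] = []
    current = ""
    for node in ast.parse(source).body:
        segment = ast.get_source_segment(source, node) or ""
        candidate = segment if not current else current + "\n\n" + segment
        if current and len(candidate) > max_chars:
            chunks.append(current)   # budget exceeded: close the current chunk
            current = segment        # start a new chunk at a node boundary
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def parses(chunk: str) -> bool:
    """True if the chunk is syntactically valid Python on its own."""
    try:
        ast.parse(chunk)
        return True
    except SyntaxError:
        return False


SAMPLE = '''def add(a, b):
    total = a + b
    return total

def mul(a, b):
    product = a * b
    return product
'''

if __name__ == "__main__":
    print([parses(c) for c in line_chunks(SAMPLE, 2)])   # some chunks cut a function in half
    print([parses(c) for c in ast_chunks(SAMPLE, 60)])   # every chunk is a whole function
```

With the sample above, the two-line chunks separate each `return` from its function, while the AST-based chunks always end on a function boundary.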
> helped millions get into ai before it became mainstream
> built coursera and changed online education forever
> founded deeplearning.ai to keep teaching the world
> leads major ai projects while staying humble and calm
> no scandals no noise just steady work
> still codes and still experiments
> builds small tools and projects purely for curiosity
> teaches only when he has real value to add
> doesn’t chase hype or predictions
> lives quietly learning and building at his own rhythm
has andrew ng quietly figured out life better than everyone else?
Passionate about frontier AI models, classical symbolic reasoning, and safe/secure software? Consider applying for this position on AI-aided code analysis in my team at @GoogleDeepmind: https://t.co/Z3qQMUqKex. The job is London-based, and the application deadline is July 14.
Revoking visas for Chinese PhD students is economically shortsighted and inhumane.
Most Chinese PhD students stay in the U.S. after graduation (first image, stats from 2022). They're staying and building technology in the U.S., not taking it to China.
Immigrant students create startup companies that employ Americans (second image, stats from 2018). I couldn't find stats for Chinese students specifically but anecdotally they are a significant force powering GenAI and LLM companies. These are the next $1B+ companies that are going to hire thousands of people even as other jobs are automated.
There are no real "secrets" in academic GenAI. Everything is open-source. What is the security risk? It's a much higher risk to give DeepSeek and others an advantage in hiring the most talented young researchers in these areas by preventing them from coming to the U.S.
Revoking visas for these students destroys American innovation for no benefit.
It's also unfair and inhumane. In my experience, Chinese PhD students are hardworking young researchers interested in their fields of study and being part of the cutting edge of academic and industrial innovation. They aren't geopolitical pawns. Stop implementing policies that treat them that way.
I don’t think the next big leap in software productivity comes from “vibe coding.” It comes from removing the grind in real codebases. @AugmentCode’s new Remote Agent tackles flaky tests, stale docs & tedious refactors with up to 10 autonomous agents. Learn more: https://t.co/ESPfzLymO1
🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
🧵We just released the #1 open-source agent on the SWE-bench Verified leaderboard by assembling the best of Claude Sonnet 3.7 and O1.
Open-source repo here: https://t.co/gcKeWuogTQ
Here's how we achieved a 65.4% success rate on the hardest coding benchmark in the industry: 🧠👇
For formal methods folks looking for a new position: @VeridiseInc is hiring a formal methods researcher to work on verification/analysis tools targeting zero-knowledge applications. More details are here: https://t.co/0gumtDInK9
But security isn’t an afterthought—it’s the foundation:
🔒 Proof of possession (AI only sees files you grant)
🔒 Zero cross-contamination (mTLS + service tokens)
🔒 Audited access (engineers need dual approval)
You can learn more in our blog post here: https://t.co/cRwwa92qw9
🚨 My favorite Augment Code workflow:
Check out a PR branch → ask our AI Chat to review the changed code.
It’s like having a hyper-aware teammate who:
✅ Spots edge cases I’d miss
✅ Cross-references code not even in the PR
✅ Explains unfamiliar logic in plain English
How does it adapt to that exact code version so fast?
Behind the scenes:
🔧 Context-aware indexers
🔧 Cryptographic file fingerprints (SHA256)
🔧 Secure retrieval systems
What amazes me? Engineering that’s rock-solid and invisible—securing your code without slowing you down.
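For a rough idea of how SHA256 file fingerprints can tie retrieval to an exact code version, here is a minimal sketch. It is not Augment's implementation: `VersionedIndex` and its methods are hypothetical names, and the only real API used is Python's standard `hashlib`. The assumption is that index entries are keyed by content hash, so a lookup only returns entries for file contents the caller can actually present.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA256 hex digest of a file's exact bytes."""
    return hashlib.sha256(content).hexdigest()


class VersionedIndex:
    """Hypothetical index keyed by content fingerprint rather than by path."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def add(self, content: bytes, summary: str) -> str:
        """Store a summary under the content's fingerprint and return it."""
        fp = fingerprint(content)
        self._entries[fp] = summary
        return fp

    def lookup(self, presented: set[str]) -> dict[str, str]:
        """Return only the entries whose fingerprints the caller presented."""
        return {fp: s for fp, s in self._entries.items() if fp in presented}


if __name__ == "__main__":
    index = VersionedIndex()
    main_fp = index.add(b"def f():\n    return 1\n", "f on main")
    pr_fp = index.add(b"def f():\n    return 2\n", "f on the PR branch")
    # A client that checked out the PR branch presents only that version's hash.
    print(index.lookup({pr_fp}))
```

Because the key is the content hash, switching branches changes which fingerprints the client presents, and with them which indexed versions are visible.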