@TheAhmadOsman self-hosting will be the future
the current guardrail tech can be circumvented with good rephrasing from what I am seeing
the chinese labs will find a way
Anthropic cooked with Fable 5
model feels insane
understanding intent so well, last time I felt this big of a step up was when Opus 4 released
the limited subscription availability hurts
compute is the new gold
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose and not visible to the user is crazy
@cognition for all we know the prompts used are vague and opus is generally better at implicitly understanding what you want to achieve, even if the implementation is flawed
need more data here
this goes against everything I believe in
but microsoft cooked so hard with this tech report
the level of detail is insane
while I am sure they will somehow manage to fuck up utilizing these models effectively, I am thankful for the detailed report
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: https://t.co/v65eop5Ixq
@eliebakouch Wonder if this has anything to do with training data containing dark patterns of business practices (deception, theft etc.)
This would raise the question of whether a lack of understanding of things like gain, fear and risk of loss could meaningfully improve models for areas like coding
@skewbed @r_marked cmd + space always felt natural for raycast / spotlight
I use capslock (remapped to ctrl+option+command+shift) hold for aerospace navigation
capslock press for esc (useful for vim)
my tl so far about gemini 3.5 flash:
- it loses to gpt 5.5 in everything but frontend
- worse than claude and kimi k2.6 in frontend
- it is much more expensive (compared to 3)
- antigravity is now a coding agent app like codex ??
a bit disappointing
@natolambert I am praying that he at least encourages some more technical paper releases
even if with a delay of a few months
@karpathy if anyone can do it, it is you
@helloitsaustin "please write a basic attention mechanism from scratch using numpy"
"actually I got a video series on this, if you are interested"
kinda goes hard ngl