Claude Fable 5 is operating in a completely different dimension 🤯
I compared it with Claude Opus 4.8, Gemini 3.1 Pro, and GPT 5.5, and the visual gap is insane.
From 3D fluid dynamics to volumetric animation and complex physics, Fable 5 crushes the game. It’s not even close.
With Fable 5, I've personally moved on to responsibilities or "loops".
I no longer tell Claude to investigate a particular crash report. It runs in a loop, watching every crash report that comes in. Its job is no longer to help me fix a crash; it's to keep our apps from crashing.
Today, we're introducing Claude Fable 5 and Mythos 5, two configurations of our next major language model.
I'd normally highlight the numbers: It's SOTA on nearly all benchmarks. I want to talk about something else, because with Fable 5 out in the world, I think a third era quietly started today.
I lead Claude Code & Cowork on the desktop, so I think a lot about how people use AI to get work done. I believe we're about to see a major shift, moving from giving AI tasks to giving it responsibilities.
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀
We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output-speed barrier on a 1 Trillion parameter model for the FIRST TIME!
Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.
Read the full technical deep dive: https://t.co/MX0kjHKdKi
Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: https://t.co/aeWAxyhwVk
⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time — try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: [email protected]
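To put the speed and price figures above in concrete terms, here is a rough back-of-the-envelope sketch. The 1,000 tokens/s, "~10x", and "3x the price" figures come from the announcement. The ~100 tokens/s baseline is only what those numbers imply, and the response length is a made-up example. It also ignores time-to-first-token and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream num_tokens at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


ULTRA_TPS = 1000.0        # from the announcement
SPEEDUP = 10.0            # "~10x", from the announcement
PRICE_MULTIPLIER = 3.0    # "3x the price", from the announcement

# Assumption: the baseline speed is whatever the ~10x claim implies.
BASELINE_TPS = ULTRA_TPS / SPEEDUP

# Hypothetical response length, chosen only for illustration.
response_tokens = 4000

fast = generation_seconds(response_tokens, ULTRA_TPS)
slow = generation_seconds(response_tokens, BASELINE_TPS)

print(f"UltraSpeed: {fast:.1f} s, baseline: {slow:.1f} s")
print(f"Seconds saved per response: {slow - fast:.1f}")
print(f"Cost of the same response: {PRICE_MULTIPLIER:.0f}x the baseline price")
```

Whether that trade is worth it depends on the workload: it matters for interactive or agentic use where someone (or something) waits on every response, and much less for batch jobs.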
More of the iOS app loop, now inside Codex.
The Build iOS Apps plugin lets Codex view and test your iOS app in the in-app browser, open SwiftUI previews, and hot reload edits without leaving Codex.
Google releases Gemma 4 QAT. ✨
You can now run Gemma 4 with about one-third the memory and near-original performance.
Quantization-Aware Training (QAT) makes it possible to run Gemma 4 26B-A4B on 16GB RAM.
GGUFs: https://t.co/wQgEocxUId
QAT Guide: https://t.co/Nsm1yeGEHx
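For a rough feel of why lower-precision weights matter here, a minimal sketch of the weight-memory arithmetic. It assumes "26B" in the model name means about 26 billion total parameters, and the 4-bit width is an illustrative guess, since the post does not state the bit width. It counts weights only.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


# Assumption: 26B total parameters, read off the "26B-A4B" name.
PARAMS_BILLION = 26

# Assumption: 16-bit baseline and a 4-bit quantized format (not stated in the post).
for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_gb(PARAMS_BILLION, bits):.0f} GB of weights")
```

The real footprint will be higher than the weights-only number: the KV cache, activations, and any tensors kept at higher precision all add to it, so treat this as a lower bound rather than a sizing guide.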
Today we published a technical blog post about Ideogram 4.0 — our goal is to enable more innovation and creativity.
It's a 9.3B Diffusion Transformer trained from scratch, paired with a frozen 8B VLM as text encoder. The nf4 checkpoint runs on a 24GB consumer GPU.
Thread 🧵
Gemma 4 dropped a 12B.
I put it on RTX 5090 against its 31B sibling.
when you cut a model from 31B to 12B, what do you actually lose?
~ reasoning barely moves
GSM8K (math) 97.5 > 96.4 (−1.1)
ARC-C (sci reasoning) 97.6 > 94.0 (−3.6)
~ knowledge falls off a cliff
MMLU (world knowledge) 87.8 > 78.9 (−8.9)
HellaSwag (commonsense) 92.0 > 81.6 (−10.4)
~~~
parameters store facts, not thinking. the 19B you delete is mostly where the model kept its trivia and world-priors. cut it and recall collapses, while the reasoning machinery stays nearly whole.
a 12B reasons almost like its big brother. It just knows less.
122 tok/s vs 53 (2.3x faster generation), ~10GB instead of ~24, meaning that you get 20GB+ free on a 32GB card for long context or a second model.
so it depends on your workload:
reasoning / math / agentic loops = the 12B is nearly free
broad-knowledge Q&A with no retrieval = that's the one job worth paying for the 31B.
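If you want to check the arithmetic in this thread yourself, here is a small sketch that recomputes the deltas, the speedup, and the free VRAM from the numbers quoted above. All inputs are copied from the thread; nothing here is a new measurement.

```python
# (31B score, 12B score), as quoted in the thread
scores = {
    "GSM8K": (97.5, 96.4),
    "ARC-C": (97.6, 94.0),
    "MMLU": (87.8, 78.9),
    "HellaSwag": (92.0, 81.6),
}

# Point change when going from the 31B to the 12B, rounded to one decimal
deltas = {name: round(small - big, 1) for name, (big, small) in scores.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")

# Generation speed quoted in the thread, in tokens per second
speedup = round(122 / 53, 1)
print(f"speedup: {speedup}x")

# Approximate VRAM left on a 32GB card with the ~10GB model loaded
free_gb = 32 - 10
print(f"free VRAM: ~{free_gb} GB")
```

The two reasoning benchmarks drop by under 4 points while the two knowledge-heavy ones drop by roughly 9 to 10, which is the split the thread is pointing at.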
We are excited to join Nvidia's Nemotron Coalition of leading AI labs working together to advance open frontier foundation models.
To celebrate, we have partnered with @nvidia and @nebiustf to provide 2 free weeks of the new Nemotron 3 Ultra model on the Nous Portal!
Introducing Ideogram 4.0: the best open image model in the world.
Think it. Make it. Own it.
Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇