🧵 1/2
Just launched: The first open-source reasoning model that fully thinks in-character 🧠
The entire reasoning process (not just responses) embodies the persona you provide through system prompts. This makes interactions more realistic and human-like. Uncensored and unbound.
Anthropic just released Claude Fable 5, basically a public version of Mythos. Benchmarks look great, beats Mythos Preview on cybersecurity too. But those are just benchmarks. Offensive capabilities should definitely be lobotomised or heavily safeguarded, and if it’s the latter, good luck keeping @elder_plinius out of it.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
The longer I watch WWDC, the fewer use cases are left for ChatGPT and Gemini for most non-power users. All of this seamlessly and natively baked into the OS itself: Siri covering basic search, analysis and automation needs, and new image editing and generation features replacing what people normally turn to Nano Banana or ChatGPT Images for. The big unanswered questions though: how frequently will they update the models, and what exactly will the rate limits look like?
Google AI x Siri and Apple Intelligence integration confirmed across Apple’s OSes. Just one wish: ship the features at launch this time, not “coming later”.
Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic is growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
Today, we’re launching Reve 2.0, the best 4K image model in the world.
We invented a new way to generate and edit any image using precise layouts. For the first time, it’s possible to create images you can touch.
Super excited to introduce Gemma 4 12B! 💎
- Multimodal: audio, image, video, and text input
- Novel architecture: we removed the multimodal encoders for a unified, streamlined arch
- New macOS desktop app powered by LiteRT
- MTP support
Excited to see what you build with it!
Microsoft just unveiled an AI agent platform with two concept devices, a wearable badge and a desk companion. Both do things that could just be a phone app. The Humane AI Pin died for this. The Rabbit R1 died for this. One day we’ll learn.
What changes when agents become both a new unit of programming and an emerging new unit of human-to-machine interaction? The mission of Project Solara, a new software platform coupled with tailored hardware solutions, is to pioneer agent-first experiences that are shaped around you: your agents, your tasks, your environment, under your control. #MSBuild
MiniMax just dropped M3 and the benchmarks look strong, but their M2.7 scored ZERO on DeepSWE. That’s the one result I actually want to see before getting too excited about this release.
The latest DeepSWE results are out, and they tend to closely reflect real-world SWE performance. Claude Opus 4.8 disappoints on both performance and cost. GPT-5.5 xHigh holds first place, coming in 12% better than Opus 4.8 at half the price. Opus 4.8 is comparable to GPT-5.4 xHigh in performance but costs three times as much. Anthropic has some catching up to do.
Can’t figure out how people get Claude Opus 4.8 to fail the carwash question. Tried it multiple times and even without max reasoning effort, it handles the trick just fine. 🤷🏻‍♂️
Then how would the model know the purpose of going there?
Maybe the person asking has a meeting at the carwash, or is picking up a friend at the carwash, or works at the carwash and is heading to their shift, or needs to speak with the manager about something unrelated. Without the explicit goal of “I want to wash my car”, the scenario completely changes. Claude is just refusing to assume an unstated intent.