BREAKING:
Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.
We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:
- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.
- It's a one-shot wonder. You can set it and forget it for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film, all one-shot.
- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.
- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.
- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.
- It's very slow and token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs, but not as good for tasks like collaborative writing.
- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry—so expect it to be something you'll use sparingly unless your company pays for it.
Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.
The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.
Want our full vibe check with all of our testing and benchmarks? Read it on @every: https://t.co/MgJLZszJUB
yipping and yapping will replace typing and tapping.
According to @tfadell keyboards were just a detour.
Gemini/GPT realtime capabilities are already there…. but where are the agent-native voice apps?
My biggest takeaways from @tfadell:
1. When building a v1 of anything, decisions should generally be opinion-based, not data-driven. You have very few analogues when creating something the world hasn’t seen. You need one or two tastemakers charged with making those decisions. If you try to make everything data-driven, you either end up with an undifferentiated product or you’re using bullshit data. The key is informing your gut by gathering input, prototyping, then making the call.
2. The customer journey matters more than the product in isolation. You need to think about the entire journey—discovery, marketing, sales, distribution, installation, usage, and support—not just the product. The Nest thermostat reinvented how you bought it (Best Buy instead of installers), installed it (DIY instead of professional), and how it worked (learning instead of programming). You’re not building a product; you’re building a system.
3. Marketing is as important as the product itself, and most builders don’t realize this. When building, you’re living in the context—you understand the pain points and features. But customers don’t have that context. When the iPod launched in Europe using the same marketing they used in the U.S., it flopped because European consumers were at a different adoption stage. Even an amazing product like the iPod can fail without the right marketing.
4. Storytelling is an essential skill for builders, because humans are wired for narrative, not feature lists. Tony learned from watching his dad sell Levi’s—sometimes convincing customers not to buy, building trust. He watched Steve Jobs refine the iPhone story every day for two and a half years, pitching to friends, refining constantly. By launch, Steve had done it 10,000 times. The key is telling the why, not just the what.
5. Every new product needs three generations to succeed: make the product, fix the product, fix the business. The first iPod only sold to Mac enthusiasts (less than 1% of the market). It wasn’t until the third generation, with Windows connectivity and the iTunes Music Store, that the iPod took off. Same with the iPhone—it first worked only on AT&T with 2.5G; the third generation had margins and reliability dialed in. Stick with your idea through these three iterations.
6. Don’t cognitively surrender to AI. AI can help with prototyping and subtasks, but architecture, opinion-based decisions, taste, and ethics require human judgment. Just like Steve Jobs shut down porn in iTunes immediately, you need human leaders with clear principles. The companies that win will use AI to amplify human creativity and judgment, not replace it.
7. Tony predicts that the next breakthrough consumer device will be voice-first, screen last. Right now we tap first, use the keyboard second, and voice third. As AI improves, voice will become the primary way we interact with devices. But we’ll still need a screen of some kind.
8. Steve Jobs was wrong about several major product decisions. Steve refused Windows connectivity for iPod—“over my dead body.” Tony’s team kept working on it anyway. Eventually it shipped and became essential to iPod’s success. Same with the iPad stylus—Steve hated it—another skunkworks project, now a major feature. Sometimes you should keep working on things the leader doesn’t like when you can see it on the horizon.
9. The iPhone keyboard decision was the longest, most heated debate for the original iPhone. The team was split. After months of tests, where they compared typing speed and error rates, the data wasn’t definitively clear. Steve Jobs made the call: virtual keyboard, full screen. Those who couldn’t get on board were told to leave.
10. Start from pain, then ask “why now?” The biggest product breakthroughs pair an old, often habituated-away pain with a new technology that has made solving it possible. For Nest, it was AI that could finally learn your schedule and optimize your heating/cooling costs.
Dr. @slyubomirsky, I absolutely loved your interview on @jonfavs' OFFLINE podcast. Also just started reading your book!
The 5 mindsets really distill down to a beautiful essence what I had to learn the hard way! You used the metaphor of "relationship as a growing garden that needs tending" and interestingly this is the theme my partner and I arrived at as well!
I am somewhat on the neurodivergence spectrum, so I always needed to put in a lot of work to open up, grow my emotional vocabulary and mental models about relationships.
Your nuanced view on AI in relationships also really hit home for us.
We started building a couples coaching app for ourselves over the past year. Recently, we started sharing it with other couples and published it. (We work in tech, but for us this is non-commercial, non-therapy ... more like a true passion project.)
We'd really love your feedback as a scientist and educator:
https://t.co/eVVQxOAL8t
In the last 6 months at @Ahrefs, we analyzed over 1 billion data points across 14 studies. Here's what we learned about AI search optimization:
1) "Best X" blog listicles are the single most prominent content format cited by AI chatbots. They make up 43.8% of all page types cited by ChatGPT specifically.
2) 67% of ChatGPT's top 1,000 citations come from sources marketers can't influence: Wikipedia (29.7%), homepages (23.8%), app stores (6.6%). Only 32.3% are influenceable content like educational pages, reviews, news, and blog posts.
3) 28.3% of ChatGPT's most-cited pages have zero Google organic visibility. These pages get cited repeatedly by ChatGPT despite not ranking in Google at all. A completely separate discovery layer.
4) ChatGPT only cites about 50% of the URLs it retrieves. It fetches dozens of pages per query but uses half as background context without attribution. This means that being retrieved and being cited are very different things.
5) Adding schema markup had zero meaningful impact on AI citations. AI Overviews actually dipped −4.6%, while AI Mode (+2.4%) and ChatGPT (+2.2%) showed changes indistinguishable from zero.
6) YouTube mentions have the highest correlation (0.737) with AI brand visibility out of all the factors we studied (including all the conventional SEO metrics like backlinks, page count, DR, etc). This held true for both Google-owned and OpenAI products.
7) AI Overviews reduce clicks to the #1 result by 58%. That’s up from 34.5% just 10 months earlier. The trend is accelerating.
8) 99.9% of AI Overviews appear on informational intent queries. Transactional, navigational, and local searches are almost entirely AIO-free. Shopping triggers AIOs just 3.2% of the time.
9) For a given search query, Google’s AI Mode and AI Overviews reach the same conclusions 86% of the time — but cite almost entirely different sources (only 13.7% citation overlap).
10) AI Overviews change every 2.15 days on average, with 70% of content differing between consecutive observations. But semantic similarity stays at 0.95. The words, sources, and entities constantly shuffle, but the actual meaning barely moves.
counterintuitive things I believe about AI:
SaaS is not dead, it will be stronger than ever
The userbase for SaaS will 10x over the next 3 years because agents will become users
Knowledge work will change dramatically, but everyone will still have jobs (except very specific categories like personal injury insurance for car crashes)
Humans will continue to be the sandwich at the beginning and end of every AI process (h/t @kieranklaassen and @trevin who came up with this)
Personal software looks like OpenClaw, not a vibe-coded Salesforce clone
You'll use Codex and Claude Cowork through your SaaS software, and use your SaaS software through Codex and Claude Code
Because you'll use Codex with your SaaS, all software users will start to look like extremely technical users
All SaaS will need to incorporate the idea of 'presence' for agents—making what agents are doing legible in real-time to users
Agents are currently built for 1-1 interactions, they'll need to be built for one to many and many to many interactions
Specialization will be just as important in agents as it is in humans, therefore we will live in a many-agent world
Most software companies are not building for a world where everyone has an agent, and they're wasting a ton of time and capital developing capabilities that a user with Claude Code doesn't need
Welp, I'm now getting through a quarter of my week's MAX subscription in a few hours of work with Claude Code.
I think Anthropic is smart, and I don't think they're trying to screw us. I think they're honestly just trying to bring inference charges in line with reality.
And that should be a wake-up call for all of us.
I think we're about to need multi-model harnesses (or FAR cheaper good models within a single platform), like 20x cheaper Haiku or whatever.
This is not sustainable.
The better the harnesses and models get, the more people will build. Which will require more and more inference.
I think the real solutions here are going to come from:
Technologies like Cerebras, et al., which make inference many times cheaper and faster
A major push by the major labs to produce higher quality in much smaller/cheaper models
Harnesses moving to a hybrid of paid/cloud and local/cheap models.
If this continues I'm going to have to build my own custom version of PAI using Pi, that can use local models on my dual 4090s, models like Gemini-Flash, models like Gemma 4, etc.
And most importantly, a new hook infra that rates the task and properly routes to the right model.
Max: Opus / GPT-5.4
High: Sonnet
Medium: Haiku
Low: (Local) Whatever the latest best OSS model is that can run on my NVIDIA / Mac Silicon
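A rough sketch of what that routing hook could look like. The keyword heuristic, the `rate_task` and `route` names, and the model labels are all assumptions for illustration, not any real harness API; a real version would probably ask a cheap model to do the rating.

```python
from dataclasses import dataclass

# Tier -> model label, mirroring the tiers above. These labels are
# placeholders, not real API model identifiers.
TIER_MODELS = {
    "max": "opus-or-gpt-5.4",
    "high": "sonnet",
    "medium": "haiku",
    "low": "local-oss-model",
}

# Hypothetical heuristic: keywords that hint at how hard a task is.
HARD_HINTS = ("architecture", "refactor", "debug", "design")
EASY_HINTS = ("rename", "format", "summarize", "typo")


@dataclass
class Route:
    tier: str
    model: str


def rate_task(prompt: str) -> str:
    """Rate a task as max/high/medium/low using crude keyword and length rules."""
    text = prompt.lower()
    hard = sum(hint in text for hint in HARD_HINTS)
    easy = sum(hint in text for hint in EASY_HINTS)
    if hard >= 2:
        return "max"
    if hard == 1:
        return "high"
    if easy >= 1 or len(text) < 80:
        return "low"
    return "medium"


def route(prompt: str) -> Route:
    """Look up the model label for whatever tier the heuristic assigns."""
    tier = rate_task(prompt)
    return Route(tier=tier, model=TIER_MODELS[tier])
```

With these rules, `route("Fix the typo in README")` lands on the local tier and `route("Refactor the billing module")` goes to the Sonnet tier.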
I think we all knew this was coming; I just thought it would be in 2027 sometime. And more gentle.
It appears to be very close now because this much subsidization doesn't seem sustainable for Anthropic, which means it's probably not sustainable for OpenAI either.
My recommendation: Start planning your Multi / Local / Cheaper model strategy for your harness.
Here's what I'm currently pondering:
This idea, but implemented totally in the cloud for normies.
One could imagine building a virtual file system on top of the cloud-hosted data (so it looked to the LLM like a navigable directory tree of files). That could be done with a super simple SKILL implementing the base file system primitives.
Next step would be to make it multi-player so businesses/teams could use it.
Content is default private (each user has their own directory). Any individual directory/file could be tagged/shared to a team, to the company or to the world (public).
Make it available via API, MCP, CLI etc. so you could unleash something like OpenClaw against it if you wanted.
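To make the idea concrete, here is a minimal in-memory sketch of the file system primitives plus the sharing rules. Everything in it (`VirtualFS`, `Entry`, the visibility constants, the per-user home directory layout) is a hypothetical design for illustration; a real version would sit on cloud storage and real auth.

```python
from dataclasses import dataclass, field
from typing import Optional

# Visibility levels named after the sharing options described above.
PRIVATE, TEAM, COMPANY, PUBLIC = "private", "team", "company", "public"


@dataclass
class Entry:
    owner: str
    content: str
    visibility: str = PRIVATE      # content is private by default
    team: Optional[str] = None     # only meaningful when visibility == TEAM


@dataclass
class VirtualFS:
    """Flat path -> entry store that presents itself as a directory tree."""
    entries: dict = field(default_factory=dict)

    def write(self, user: str, path: str, content: str) -> str:
        # Every user writes under their own home directory.
        full = f"/{user}/{path.lstrip('/')}"
        self.entries[full] = Entry(owner=user, content=content)
        return full

    def share(self, user: str, full_path: str, visibility: str,
              team: Optional[str] = None) -> None:
        entry = self.entries[full_path]
        if entry.owner != user:
            raise PermissionError("only the owner can change sharing")
        entry.visibility, entry.team = visibility, team

    def _can_read(self, user: str, teams, entry: Entry) -> bool:
        # This sketch models a single company, so COMPANY and PUBLIC
        # behave the same for any signed-in user.
        if entry.owner == user or entry.visibility in (COMPANY, PUBLIC):
            return True
        return entry.visibility == TEAM and entry.team in teams

    def read(self, user: str, full_path: str, teams=frozenset()) -> str:
        entry = self.entries[full_path]
        if not self._can_read(user, teams, entry):
            raise PermissionError(full_path)
        return entry.content

    def ls(self, user: str, prefix: str = "/", teams=frozenset()) -> list:
        """List every readable path under a prefix, recursively."""
        return sorted(
            p for p, e in self.entries.items()
            if p.startswith(prefix) and self._can_read(user, teams, e)
        )
```

`write`, `read`, and `ls` are the primitives an LLM skill would call; `share` is the only multi-player addition.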
I even have the domain for it: secondbrain .com
What do folks think?
LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest:
I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.
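The "incremental" part of the compile step can be sketched without any LLM in the loop: work out which raw files still need a summary. This assumes a hypothetical layout where each file in raw/ gets a page at wiki/summaries/<name>.md; the `pending_sources` helper is illustrative, and the LLM call that writes each summary is left out.

```python
from pathlib import Path


def pending_sources(raw_dir: Path, wiki_dir: Path) -> list:
    """Return raw files with no summary page yet, or with a summary older than the source."""
    pending = []
    for src in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        # Assumed convention: raw/<name>.<ext> -> wiki/summaries/<name>.md
        summary = wiki_dir / "summaries" / f"{src.stem}.md"
        if not summary.exists() or summary.stat().st_mtime < src.stat().st_mtime:
            pending.append(src)
    return pending
```

Each returned path would then be handed to the LLM to write or refresh its summary page.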
IDE:
I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A:
Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
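One guess at what such an index file could look like: one bullet per article with a short summary, so the agent can scan it before deciding which files to open. In the workflow above the LLM maintains these itself; this sketch fakes the "brief summary" by taking the first non-heading line of each article, and `build_index` and `first_summary_line` are hypothetical names.

```python
from pathlib import Path


def first_summary_line(text: str, limit: int = 120) -> str:
    """Use the first non-empty, non-heading line as a stand-in for a brief summary."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line[:limit]
    return ""


def build_index(wiki_dir: Path) -> str:
    """Render one bullet per article: relative path plus its summary line."""
    rows = []
    for path in sorted(wiki_dir.rglob("*.md")):
        if path.name == "index.md":
            continue  # do not index the index itself
        rel = path.relative_to(wiki_dir).as_posix()
        summary = first_summary_line(path.read_text(encoding="utf-8"))
        rows.append(f"- {rel}: {summary}")
    return "\n".join(rows)
```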
Output:
Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
Linting:
I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
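Some of these checks are mechanical enough to run as plain scripts before handing the harder ones to the LLM. A small sketch of one such check, assuming the wiki uses Obsidian-style [[wikilinks]] that resolve by file name; `broken_links` is a hypothetical helper and it ignores links written as folder paths.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def broken_links(wiki_dir: Path) -> dict:
    """Map each article to the wikilink targets that have no matching .md file."""
    pages = list(wiki_dir.rglob("*.md"))
    known = {p.stem.lower() for p in pages}
    report = {}
    for page in sorted(pages):
        targets = WIKILINK.findall(page.read_text(encoding="utf-8"))
        missing = sorted({t.strip() for t in targets if t.strip().lower() not in known})
        if missing:
            report[page.relative_to(wiki_dir).as_posix()] = missing
    return report
```

The missing targets double as a to-do list of new article candidates.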
Extra tools:
I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.
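For a sense of scale, a small and naive search engine over a folder of .md files fits in about thirty lines. This is a sketch using plain TF-IDF scoring, not the tool described above; `tokenize` and `search` are hypothetical names.

```python
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(wiki_dir: Path, query: str, top_k: int = 5) -> list:
    """Rank .md files by a plain TF-IDF score for the query terms."""
    docs = {
        p.relative_to(wiki_dir).as_posix(): Counter(tokenize(p.read_text(encoding="utf-8")))
        for p in wiki_dir.rglob("*.md")
    }
    terms = tokenize(query)
    scores = {}
    for name, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in terms:
            # Document frequency: how many articles mention the term at all.
            df = sum(1 for c in docs.values() if term in c)
            if df and counts[term]:
                idf = math.log(1 + len(docs) / df)
                score += (counts[term] / total) * idf
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Wrapping `search` in a small command-line script is one way to hand it to an LLM agent as a tool.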
Further explorations:
As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
@itsolelehmann Built something similar over the holidays. You can pick your favourite mentors in your personal council and connect your goals + calendar. Always wanted to see Seneca and Winnie the Pooh roast you because of your missed New Year’s resolutions? https://t.co/zN2TmozVis
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
The LLMs may offer an opinion when asked but are extremely competent at arguing in almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask in different directions and be careful with the sycophancy.
Karpathy's AutoResearch is changing how campaigns get optimized, and most marketers haven't heard of it yet.
Ole Lehmann tested it on landing page copy, 56% → 92% pass rate overnight.
here's how it works for marketing / skills 🧵
The PM playbook was built on the assumption that the technology underneath your product is roughly stable.
With the current pace of model progress, this is no longer true. Here's how we've evolved the PM role: