This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin, but I'll add that *qualitatively* too, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it much more ambitious tasks than you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into, and the safeguards are configured to be a little too trigger-happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Today, Ramp raised $750M at a $44B valuation.
Last time we grew this fast, we were 1/20th the size.
For 2000 years, business was built on two pillars. Today, a third: intelligence.
It’s your least governed cost. It’s also your single greatest opportunity.
I wrote this ~3 months ago, and since then,
1) Memory has been more or less fully integrated with the frontier models
2) Almost all features that made OpenClaw unique as a harness have been fully absorbed by the frontier models (e.g. schedules, loops, goals, memory, etc.)
3) New, vertical-killing features and capabilities are being added every other week
--
All that being said, agentic engineering is still an incredibly high skill affair.
It is now obvious to me that there is a gulf of know-how and tacit knowledge between those who CAN take humans out of the loop and actually produce a working product, and the rest of the world insisting that agents are still producing "slop".
Maxi/mini takes on AI:
Maxi-
1) Cybersecurity will be an enormous consumer of tokens. All public-facing code needs regular, in-depth pen testing.
2) We are barely scratching the surface in code. Enterprises will have tens of thousands of apps, dynamically generated apps, per user/per customer apps, etc.
3) Enterprise adoption is still nascent. Leading companies are spending 10x+ as much on tokens as laggard companies.
4) Video/image understanding/generation will be huge drivers of token spend, as will anything to do with operating IRL/physics.
Mini-
1) Many use cases (consumer chat, various enterprise "agent" use cases) are close to intelligence-saturated. Intelligence saturation will render many major use cases today dirt cheap in a few years. See @StatueofIBBertY's posts on this topic.
2) Enterprise spend rationalization is real and imposed spend ceilings may be durable. As costs decrease/efficiency increases, token volumes will go up, but budgets may not.
3) The global compute "shortage" is driven by capacity hoarding, not inference volumes per se (yes, individual companies are inference compute constrained, to be clear). "Enterprise" adoption probably won't be fast enough to keep up with capex; we need prosumer-y use cases like chat/code that are very responsive to model quality increases. If we go 8-12 months without those, funding for training cools even slightly, and hyperscalers like Meta pivot to being neoclouds, we could easily find ourselves in a compute glut.
4) No one knows what % of frontier model API spending is from cleverly hidden distillation attacks (or partly motivated thereby). Many, many deep-pocketed actors are motivated to do this at scale.
In general, I don't know if there's ever been a cycle where there are so many considerations on both sides that have order-of-magnitude-sized impacts on how supply/demand work out for tokens, infra, etc.
What a time to be alive.
- image or video editing? write scripts
- finances, tax work, etc? put in PDFs, write scripts, output HTML
- medical advice? put in PDFs + data, output HTML
- filling out paperwork? write scripts
- creating a report? write HTML
- making plans? write HTML
Can’t even give us a jaw dropping playoff run
Can’t give us a bend knee moment
Can’t beat doubles
Can’t beat single coverage
This is the worst B2B MVP ever
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.