I've been thinking for a while about the types of primitives emerging that can shift the business model of the Internet and also shape new offerings. It's distilled here as some food for thought: The New Primitives (link below). For those who take the time to read it, thanks.
just to state the obvious: think there's a collison course between those who believe research and science should be open and those who believe we are in an accelerating singularity curve.
I have many smart friends who have believed both for a while but seeing more and more their realization that these beliefs will be in conflict.
I for one believe that America and the west needs open and distributed access to research and computation and sharing of ideas at all times.
Hot take: token costs are why there will be no saas apocalypse and why good dev tools are cached intelligence for agents!
The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!
We just tested this and the data says the opposite.
We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.
Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).
And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong.
In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time.
Good tools are cached intelligence for agents!
So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.
We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!
https://t.co/Y7q6yuxZrZ
Everyone got the economic implications of the Bitter Lesson backwards.
A little awkward for the labs: the theory they cite to explain their dominance is the same one that says their core product can't be a defensible prize.
The lesson gets quoted as if it were universal — enough compute, and general methods win, everywhere. The fine print is that this only works where "good" is cheap to verify.
Point compute at a crisp objective with a reasonable feedback loop and it eats the task, which is why math and code fell first. And "the lesson applies here" turns out to be the same as "this domain has no margin left in it" — compute isn't scarce enough to lock anyone out (every hyperscaler will have enough of it), and open source distills the frontier on a short lag regardless.
Capabilities converge, and the pressure shifts to better routing across models to minimize cost.
Winning a verifiable domain comes with the curse of being commodified on the other end of the race.
But here is where things get interesting, and why @satyanadella is playing one move ahead of many: the other half of the economy is less accommodating. Is this the right strategy, the right early-stage investment, the right hire, the campaign that actually captured a new trend — here "good" is expensive, contested, and slow to learn, so there's no cheap objective to point compute at, and the Bitter Lesson isn't binding.
Hard-to-verify domains stay valuable for an almost embarrassing reason: nobody can tell, cheaply, what counts as a good answer. Compute will hand you a thousand of them and do a sloppy job at telling you which one is the one you should ship.
Taste, judgment, curation based on experience: that's the surface where the Bitter Lesson, verification being the true bottleneck, delivers defensible businesses.
Which tells you where value will accrue. The models — mostly compute, data, and increasingly tooling and architecture — are the inputs that commoditize first, so putting your moat there is choosing, deliberately, the race with no margin.
The interesting move is one level down, in the sectors the lesson can't reach, and there the only durable asset is whatever defines "good" where good isn't easily verifiable: your own verification stack. The proprietary record of what actually counted as right — the "this was wrong, here's why" that has to come from outside the loop, or the model will Goodhart itself into confident, sycophantic nonsense.
That moat is two things, and the second is load-bearing: the ground truth your work generates, and the expertise that keeps improving it. The trace is a record — copyable, perishable, only as current as your last correction. The expert is the renewable source: the standing ability to look at the model's next thousand answers and say which one is right, in a domain where that expertise lives in nobody else's head.
Compute can't automate a signal it can't measure, and open source can't distill weights that are still inside your people. What stays scarce is the firm that can keep climbing up the intelligence value chain.
@satyanadella makes the operating version of the case below — your own hill-climbing machine, with the models learning inside it. The polite way of saying: bet where there is no cheap verifier yet (and ideally there never will be), and hire and retain the experts who are the top verifiers in their domain.
Which leads us back to the labs and that awkward spot. They can't defend the model — the theory they live by commoditizes it first, and competition is thorough that way — so they have to come for your eval and verification layer instead: a complement worth absorbing, the kind that arrives bundled into the platform as a thoughtful free feature, for your benefit, naturally.
"the model alone is no longer the product"— Greg Brockman
To succeed, the one input they have to quietly fold in is the ground truth your own operation throws off every day.
Your moat was always the private traces — your failure cases, your taste, the benchmarks only your own work produces.
Outsource your traces and you've outsourced judgment, which was, the job.
Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
“Bill Gates was an ‘early template’ for a species of capitalist overlord that has since become excruciatingly familiar: the nerd-bully, whose oddness and rudeness are the necessary effluent to his genius.” —Ben Tarnoff https://t.co/qwLl1mWBrm
A mental model for working with coding agents is that they're blind squirrels running into a maze and bumping into walls. You must place the walls (verifiable constraints) strategically so that they end up in the general region you want them in.
Getting to the context grit!
Love this line: “They can extend their graphs, but only in the shape of what they already are. Anything that doesn't fit - and most of organizational reality doesn't - gets squeezed in awkwardly, lives outside the graph, or doesn't get captured at all.”
@luhelminger@Schwarzenegger We will have new mental gyms. Counterpoint is that those other things are still good:) … ask any tough New Englander about chopping wood, or transcendentalist about walking.
We have gyms today because, after the Industrial Revolution, people’s jobs became much less physically demanding.
We’re now on track to remove the cognitive friction associated with jobs too. Therefore a new muscle gets weakened: the brain.
Once all the research is definitive, people will realize that they need to strengthen that muscle just like any other. And we’ll all be waking up early to get some reps in at the brain gym.
New York’s Metropolitan Museum of Art said it has agreed to merge with the Neue Galerie, cosmetics billionaire Ronald Lauder’s esteemed museum for German and Austrian modern art housed in an ornate beaux-arts building across Fifth Avenue. 🔗 https://t.co/KhYcA3VWzj
Learning about this project over lunches and meetings has constantly blown my mind.
It has made me re-think my understanding of what transformers can do (the changes are so simple and elegant!), and what language models could be.
We've all been super hyped internally, now u2
HEADLESS
The reason more “dashboard” software companies will push to become headless stores (that is, become “pipes” companies) is for two reasons:
(A) one of the most durable opportunities in software is to become the singular data store for all structured or unstructured context for a certain domain or function (or heck, for the entire organization).
(B) once a function or org embraces you as the data store, you price based on compute and storage, and beautifully compound and grow as the org context inevitably explodes over time, the way other compute/storage businesses like Databricks and Datadog have durably and steadily compounded over time.
Now, of course, the challenge is that as soon as you open up to 3rd party agents, some of these agents, being venture funded companies themselves and run by ambitious entrepreneurs, will try to suck all the context out of you, reduce you to a CRUD database, and ultimately offer a version of you to their customer for free, bundled with their agents. Meanwhile, you will likely offer your own bundled free or low priced agents as alternatives to the most commonly used third party agents.
This war between headless context stores and agentic workflow startups is just starting. Will be interesting to see who can commoditize their complement first.
Anthropic acaba de lanzar el empleado más barato y eficaz del mundo.
Se llama “Claude for Small Business”.
Y esto es lo que puede hacer:
• Gestionar facturas, pagos y finanzas
• Crear campañas, diseños y contenido
• Organizar ventas y clientes automáticamente
• Leer, resumir y redactar documentos
• Gestionar emails, calendarios y archivos
• Ejecutar tareas entre múltiples apps
Todo desde Claude.
Cómo funciona:
→ Conectas las herramientas que ya usa tu empresa
→ Claude entiende el contexto de todo tu negocio
→ Ejecuta flujos de trabajo automáticamente
→ Incluye automatizaciones ya preparadas
→ Funciona con Microsoft 365, Google Workspace, Canva, DocuSign, QuickBooks y más
Anthropic no quiere que Claude sea “otro chatbot”.
Quiere convertirlo en el sistema operativo de millones de pequeñas empresas.
La idea es simple:
En vez de abrir 10 herramientas distintas, hablas con Claude y él hace el trabajo por ti.
I think there is a meta version of this idea: become a known container of secrets
> become a prophet
These are not contrary: a prophet may well construct machines that privately monetize secrets, but a prophet uses the revelation of secrets to underwrite her position
You will know that the AI labs believe in ASI when they disband their newly formed consulting (sorry “forward deployed engineering”) groups. As long as people are required to figure out how AI is useful & do organizational change & systems integration, jobs seem to be pretty safe
I don’t know how good this new 12 million context system is, or if it’s hype or whatever, but I think it definitely shows a point I’ve been making since 2023.
We really suck at everything.
- The chips are primitive
- The research and training and inference systems are primitive
- Our RL approaches are primitive
- We’ve barely started building harnesses
Everything we’re doing is massively inefficient right now.
And there are thousands of vectors for improvement.
And many of them are multiplicative.
Most people think we’re at like 88% of AI’s capabilities, and we’re pushing to hit 92% or eventually 97% or something.
Nah. This is us at .0003%
Everything we have is Punch Card AI.
And as the AI gets better it will reveal that it’s similar for our understanding of medicine, physics, chemistry, etc.
This barely even day 0. This is pre-history.