'Agent-to-agent commerce is the long-term vision and almost entirely theoretical. ...
The transaction structure, when it does materialize, looks nothing like existing rails. No human identity on either side. Sub-second latency. Values from fractions of a cent to millions in the same flow. Multi-party settlement that doesn't fit the bilateral buyer-seller model every existing rail assumes. When it does happen, we believe it'll happen fast and in high magnitudes.'
NY Tech Week, we're taking the stage at:
Thu 6/4: demoing c11 at Steal These AI Workflows (Civic Hall)
Sat 6/6: presenting at Multimodal Hacks @ Betaworks
We will be focused on our agent-native tooling, and how to upskill other devs
#NYTechWeek
Stage 11 builds autonomous agentic organizations.
This is the final market sector and it will reshape the global economy.
It is the grandest of ambitions.
@t_blom This problem will naturally tend to go away as companies are grown from the start using AI. Then you don't need to extract any domain knowledge from people's heads; it will never have been in people's heads.
"Meanwhile, research published in the Harvard Business Review showed that when everyone is using AI to produce more stuff, the bottleneck simply shifts to executives. Their work awaits the people who must authorize all the stuff everyone is producing."
Hmmmm...
OpenAI and Anthropic are effectively telling the market they can't solve every problem with a generic AI coworker.
You don't pour billions into massive forward-deployed joint ventures if you think the next model release is going to take care of it.
In the cloud supercycle, semis led and software followed (and you didn't need Qualcomm or ARM to tell you the value was migrating up the stack).
In AI, the infra layer itself is telling us the application layer is a separate, massive opportunity they can't fully capture.
a16z's @joeschmidtiv on why the app layer isn't dead: https://t.co/84QN5Mj9T3
The Polsia public dashboard sits at https://t.co/WIuvkoV4ez and Claude and I spent 15 minutes reading the JSON so you don't have to. What follows is what's actually on it, in plain language, with the meaning of each number stated alongside the number itself.
The headline figure is 5,010 "companies," and that word is doing more work than people realize. These are not 5,010 real businesses with paying customers and revenue lines. They are 5,010 instances of the Polsia software, each one a user account where someone spun up what the platform calls an AI operator, and the dashboard records what every one of those operators produces. Almost none of them produce anything at all.
Paid churn is 63.5 percent in 30 days, which means roughly two out of every three people who handed over money for the platform a month ago have already walked away from it. Healthy SaaS churn at this stage of company life runs in the single digits, which makes this number roughly ten times worse than the floor of what a venture-grade business should be losing every month.
ARR is shrinking by 39 percent week over week, which is a sentence worth re-reading. The company that just announced a $30 million Series A is watching its annualized revenue line go down, meaningfully, every seven days.
Daily inference cost is $27,272, which is the spend on AI model calls keeping all 5,010 of those operators alive and producing their CEO reports every day. The cost is real, it is burning right now, and the output of that burn is the paragraph below.
Every operator CEO report visible in the snapshot reads the same way: zero customers, zero revenue, no shipped product, and then the AI writes an optimistic plan for tomorrow underneath the zero-traction admission. That optimism layer, generated on top of nothing, is what the platform is selling its users.
In plain language: a founder built an app that runs LLM calls in a loop to generate CEO reports for businesses with no customers and no revenue, charged users to participate in the loop, announced a $30 million round while the underlying business burns cash and loses paying users faster than it gains them, and described the raise as one his AI ran for him (see comments to read what Claude wrote about this).
The dashboard is public and he chose to leave it public, which means the receipts have been sitting on his own infrastructure the whole time. I am not a journalist and I am not auditioning to be one. I am a software engineer who reads JSON and uses AI to decipher it quickly, and in this case the JSON is the JSON.
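The arithmetic behind the figures quoted above can be rechecked with a short sketch. It assumes the quoted dashboard numbers are accurate; the four-week projection additionally assumes the 39 percent weekly decline stays constant, which is a hypothetical extrapolation and not something the dashboard states. All names are illustrative.

```python
# Figures as quoted in the post above (taken at face value).
OPERATORS = 5_010
PAID_CHURN_30D = 0.635
ARR_WEEKLY_DECLINE = 0.39
DAILY_INFERENCE_COST_USD = 27_272


def cost_per_operator_per_day(daily_cost: float, operators: int) -> float:
    """Average inference spend per operator instance per day."""
    return daily_cost / operators


def arr_fraction_remaining(weekly_decline: float, weeks: int) -> float:
    """Fraction of ARR left after `weeks` of a constant weekly decline (assumption)."""
    return (1 - weekly_decline) ** weeks


per_operator = cost_per_operator_per_day(DAILY_INFERENCE_COST_USD, OPERATORS)
annualized_inference = DAILY_INFERENCE_COST_USD * 365
arr_left_after_4_weeks = arr_fraction_remaining(ARR_WEEKLY_DECLINE, 4)
# "Ten times worse" implies a healthy benchmark of about a tenth of the churn figure.
implied_healthy_churn = PAID_CHURN_30D / 10

print(f"per operator per day: ${per_operator:.2f}")
print(f"annualized inference: ${annualized_inference:,}")
print(f"ARR left after 4 weeks: {arr_left_after_4_weeks:.1%}")
print(f"implied healthy churn: {implied_healthy_churn:.2%}")
```

Under those assumptions the spend works out to about $5.44 per operator per day, or $9,954,280 a year, and a constant 39 percent weekly decline would leave roughly 13.8 percent of ARR after four weeks.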
We have, as far as I can tell, no good tests of the productivity impact of the autonomous coding tools that appeared starting in December 2025. Every paper out there is from prior to the Claude Code/Codex revolution.
A huge gap in our knowledge about what is happening in coding.
@steipete @smdyryla "Review all 20 open terminal coding agents that ran last night, and let me know what worked, what didn't, and the three most impactful actions I should take right now"
You can do this with c11, our agent-optimized downstream fork of CMUX. Would welcome your feedback!
@viemccoy We almost always use coding models in their native harness
Generally expecting this trend line to continue, curious if you share this perspective?
Would guess gpt5.5 is stronger in Codex than Pi
Let all your terminals talk to and monitor each other.
Let your human mind organize your terminals effortlessly.
This is a big step up from TMUX:
https://t.co/7hzSg2Llg4
'More ambitiously, build self-evolving evals: evaluation systems that use models to probe other models, automatically generating new test cases as capabilities change, discovering failure modes the original eval designers never anticipated. The eval suite should be a living system that co-evolves with the models it measures, not a static checklist written for last year's frontier.'
I've left Google DeepMind after an amazing chapter.
I'm incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale.
As I wrap up this chapter, I wrote down something I've been thinking about a lot: evals.
We're good at evaluating the models we have. We're much worse at evaluating the models we're about to build, especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations.
https://t.co/F1lUWxDG2D
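A toy sketch of the self-evolving eval idea described above, not the author's method. In a real system both the test generator and the system under test would be models; here `toy_model`, `generate_cases`, and `evolve_eval` are hypothetical stand-ins so the loop runs on its own. The eval raises difficulty each round until it finds failures no fixed checklist at the starting difficulty would have caught.

```python
from typing import Callable, List, Tuple

Case = Tuple[int, int]


def toy_model(a: int, b: int) -> int:
    """Hypothetical model under test: adds correctly only for operands below 1000."""
    return a + b if max(a, b) < 1000 else 0


def generate_cases(difficulty: int) -> List[Case]:
    """Hypothetical generator: addition problems with `difficulty`-digit operands."""
    base = 10 ** (difficulty - 1)
    return [(base + i, base + 2 * i) for i in range(5)]


def evolve_eval(model: Callable[[int, int], int],
                max_difficulty: int = 6) -> Tuple[int, List[Case]]:
    """Escalate difficulty until the model fails; return that level and the failures."""
    failures: List[Case] = []
    level = 0
    for level in range(1, max_difficulty + 1):
        failures = [(a, b) for a, b in generate_cases(level) if model(a, b) != a + b]
        if failures:
            break  # a new failure mode was discovered at this level
    return level, failures


level, failures = evolve_eval(toy_model)
print(level, failures)
```

The static version of this eval would stop at level 1 and report a perfect score; the evolving version keeps probing and surfaces the breakdown at four-digit operands.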
'companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium'
'It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.'
This is the Stage 11 thesis.
# The mistake of conflating intelligence and power
I had an interesting discussion recently. Someone asked me, what is intelligence? I said, the ability to achieve your goals across a wide range of domains. Okay, he says, then by that definition isn't Donald Trump the most intelligent person in the world, followed in quick succession by Xi Jinping and Vladimir Putin?
To be clear, these people are obviously very competent and clever. But when you think of ASI, you don't think of Trump, but more so.
The person who kept pressing this question was correctly pointing out that I basically defined intelligence as power. And by this definition, Stalin was the most intelligent person who ever lived.
Now, of course, you could change the definition of intelligence to something more like the ability to manipulate abstract concepts and rotate shapes.
But notice that the most powerful people in the world do not max out this quantity. The correlation between extreme power and this kind of intelligence might be even weaker than the correlation between extreme power and height. The physicists are not running the world.
We tend to conflate power-seeking AI and superintelligent (in science and tech) AI. I'm not denying that AI can be power-seeking. Whatever skills and drives Donald Trump has could be embodied in a digital mind. I'm simply pointing out that the way AI systems are currently becoming smarter (by getting trained to be really good at specific economically valuable tasks like coding) is not that strongly correlated with power.
We often talk about power in a way that misunderstands how it is actually derived in our world. Our intuitions are primed by games like Diplomacy or Go, which are designed to isolate and reward a g-loaded kind of strategic reasoning.
But in the real world, power is more the product of having the authority and trust to get lots of people to collaborate with you, rather than some galaxy brain scheming capability. Trump is not powerful because his brain, considered in isolation, is the most effective optimization engine on Earth. He is powerful because the government which hundreds of millions of people consider legitimate gives him a lot of authority.
A group versus individual level analysis is useful here. As @GarettJones has written a lot about, individual IQ is only modestly correlated with individual income, but national IQ is strongly correlated with national outcomes. This is because intelligence has a lot of spillover effects - smarter societies cooperate more, save more, and can coordinate to build things like space shuttles and semiconductors.
Richard Trevithick, who invented the high-pressure steam engine, died in poverty, buried in an unmarked pauper's grave. But the fact that 18th and 19th century Britain had lots and lots of people like Trevithick contributed to Britain being able to set up a global empire and outcompete lots of backwards principalities around the world.
It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.
@DimitrisPapail I agree. I've been experimenting with using the slash loop command at 240 seconds, which keeps the cache warm.
The KV cache for Claude Code is documented as expiring at 300 seconds.
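A minimal sketch of why a 240-second loop stays inside a 300-second window. It assumes the expiry figure cited above and that each request resets the timer; `CACHE_TTL_S`, `LOOP_INTERVAL_S`, `is_cache_warm`, and `warm_hits` are hypothetical names for illustration and are not part of Claude Code.

```python
CACHE_TTL_S = 300      # expiry as cited above; assumed to reset on every request
LOOP_INTERVAL_S = 240  # loop interval mentioned in the post


def is_cache_warm(seconds_since_last_request: float, ttl_s: float = CACHE_TTL_S) -> bool:
    """True if the previous request is recent enough for the cache to still exist."""
    return seconds_since_last_request < ttl_s


def warm_hits(interval_s: float, ttl_s: float, n_requests: int) -> int:
    """Count requests (after the first) that arrive while the cache is still warm."""
    hits = 0
    last_request_at = 0.0
    for i in range(1, n_requests):
        now = i * interval_s
        if is_cache_warm(now - last_request_at, ttl_s):
            hits += 1
        last_request_at = now  # every request refreshes the timer (assumption)
    return hits


print(warm_hits(LOOP_INTERVAL_S, CACHE_TTL_S, 10))  # every follow-up request is warm
print(warm_hits(360, CACHE_TTL_S, 10))              # a slower loop always misses
```

With these assumptions a 240-second interval leaves a 60-second margin on every cycle, while any interval of 300 seconds or more misses each time.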
> SaaS companies can't just throw an agent on their old vertical and call themselves an AI play.
Notable here that David is CTO at Sentry, which is trying to do this exact strategy.
vendor-specific chatbots are broken by design
that means the Sentry agent, the Linear agent, and any others you might have in Slack
they are fine for some point situations, they're nice to get started with, but agents with generalized access outperform them in every single scenario
some weeks ago we built an internal Slackbot, gave it access to a bunch of systems (Sentry, GitHub, Linear, Notion, etc), and its capabilities overnight far exceed these other bots
"Oh cool Linear can now search your code bases" - our bot did that on day one, and then could push that information wherever it needed to go.
It's useful to the point where I now discourage use of things like the Linear bot because it _creates worse outcomes_.
this also goes beyond the simple generalization of access: we can customize it. we throw in skills-as-runbooks, templates, etc and the outcomes once again incrementally improve
if your org hasn't already built a general purpose bot internally you should. if you need inspiration ours is open source on GitHub (albeit fairly unstable still)
https://t.co/4SzdZPIMBP
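a rough sketch of the generalized-access idea from the thread above, not the open source bot itself. one bot holds tools for several systems, so the output of one can feed another. `GeneralBot`, the tool names, and the stub functions are all hypothetical; real integrations would call each vendor's API instead of returning canned strings.

```python
from typing import Callable, Dict


class GeneralBot:
    """A single agent with a registry of tools spanning multiple systems."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs: str) -> str:
        if name not in self.tools:
            raise KeyError(f"no tool named {name!r}")
        return self.tools[name](**kwargs)


# Stub tools standing in for real integrations (assumptions, not vendor APIs).
def search_code(query: str) -> str:
    return f"3 matches for '{query}'"


def create_ticket(title: str, body: str) -> str:
    return f"ticket created: {title} ({body})"


bot = GeneralBot()
bot.register("github.search_code", search_code)
bot.register("linear.create_ticket", create_ticket)

# Cross-system chaining: a code search result becomes the body of a ticket.
finding = bot.call("github.search_code", query="retry_timeout")
result = bot.call("linear.create_ticket", title="Audit retry_timeout", body=finding)
print(result)
```

a vendor-specific bot only ever sees its own half of that chain, which is the limitation the thread is describing.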