Median Claude Code session: 1 tool call.
90th percentile: 106.
99th percentile: 481.
Max: 2,206.
Agentic coding has a power-law shape.
Most work is small.
The interesting frontier is long-running, auditable loops.
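For anyone who wants to reproduce this kind of summary on their own logs, here is a minimal sketch. It assumes you already have one tool-call count per session; the `toy_sessions` list is made-up illustration data, not the dataset behind the numbers above, and the nearest-rank percentile method is my choice, not necessarily what the original analysis used.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # 1-based rank of the smallest value covering p percent of the data
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def summarize_sessions(calls_per_session):
    """Summarize a heavy-tailed distribution of tool calls per session."""
    return {
        "median": nearest_rank_percentile(calls_per_session, 50),
        "p90": nearest_rank_percentile(calls_per_session, 90),
        "p99": nearest_rank_percentile(calls_per_session, 99),
        "max": max(calls_per_session),
    }

# Hypothetical toy data: most sessions are tiny, a few are very long.
toy_sessions = [1, 1, 1, 1, 1, 2, 3, 8, 40, 300]
summary = summarize_sessions(toy_sessions)
print(summary)
```

With a heavy tail like this, the median says almost nothing about the long sessions, which is why the upper percentiles are reported separately.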
I analyzed 245,306 Claude Code tool calls.
The biggest surprise: Bash accounted for 47.2% of all tool calls.
Claude Code is not primarily an editor.
It is a shell operator with an LLM loop.
If your CLI is messy, your AI coding workflow is messy.
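A per-tool share like the Bash figure can be computed with a simple tally. This is a rough sketch that assumes the log has already been reduced to a flat list of tool names, one per call; `toy_log` is invented for illustration and does not reflect the real 245,306-call dataset.

```python
from collections import Counter

def tool_share(tool_names):
    """Return each tool's share of all calls as a percentage, most used first."""
    counts = Counter(tool_names)
    total = sum(counts.values())
    return {tool: round(100 * n / total, 1) for tool, n in counts.most_common()}

# Hypothetical toy log: one entry per tool call.
toy_log = ["Bash"] * 5 + ["Read"] * 3 + ["Edit"] * 2
shares = tool_share(toy_log)
print(shares)
```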
As a solo dev who just pumped billions of tokens through Claude Code in the past month, I have to say the Codex desktop app experience is just vastly better now. But, since you asked...
Here are a few things that would be massive for people running at this intensity:
1. Stronger project-level persistent memory that travels across worktrees and long-running sessions. Right now a lot of "why we chose X not Y" or architectural decisions still have to be re-seeded when agents jump between contexts.
2. A proper parallel agent command center / observability view / dashboard / kanban. When I have 4-8 agents grinding across worktrees in the background, better live status, easy steering/pausing of specific ones, and per-agent cost visibility would be huge.
3. First-class support for reusable "agent teams" or heavily customized skills that are project-aware and persist without constant re-explaining (especially now that the skills system is getting more attention and is shareable).
The foundation is already so good that these would be real force multipliers. Happy to go deeper on any of them.
Thanks for shipping the actual desktop agent product and for asking.
As someone who just pumped billions of tokens through Claude Code as a solo dev in the past month, I have to say:
The Codex macOS app experience is just vastly better now.
The combination of true parallel agents running across isolated worktrees, ⌘⌘ Appshots that instantly feed full window + text context without copy/paste/describing, native computer use that sees/clicks/types on my actual desktop (and keeps going even when the machine is locked, controllable from my phone), plus Goal mode that just stays on mission for hours or days...
Magical.
Massive kudos to @embirico, @thsottiaux, @ajambrosino, @Dimillian and the whole @OpenAIDevs crew for shipping the actual product, not just another chat interface. Massive for people like me who live in this stuff.
If you're a serious solo builder, get the app. You'll feel the difference immediately.
My daughter graduated from the University of Arizona last Friday. I was in the stadium for Eric Schmidt's commencement speech.
I'll let others opine about the content of the speech, although I thought it was hopeful and accurate and contained great advice.
What I'll comment on was what I heard from the people around me while Schmidt spoke:
"Fuck AI. No machine is taking my job."
"Billionaires suck."
"This dude is tone-deaf."
"Why did the University even allow this guy to speak?"
From my daughter via SMS during the talk: "this is so uncomfortable"
If a thoughtful, accurate, hopeful speech from one of the most accomplished technologists of the past 40 years lands in an arena and the room mutters "fuck this guy," the AI industry has a problem its product roadmap cannot solve.
The cultural war over this technology is already underway, and we the builders are losing it.
Schmidt was the most prepared messenger imaginable, and the room still treated him like an intruder. An enemy.
We have a lot of work to do.
https://t.co/Ey6mwzawTb
The new agent platforms are good smart workers.
They are not operating models.
We just posted the first piece in the SynapseDx Boring AI series explaining exactly why that distinction matters, and why most companies are still getting it wrong.
https://t.co/4Z17s8J2Un
The dangerous Claude Code failure mode is not incompetence.
It is plausible completion.
The model does 80%, narrates the last 20%, and stops.
Fixing that is not a prompt trick.
It is a verification workflow.
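One way to picture a verification workflow: the agent's "done" is never accepted on its word, only after independent checks exit cleanly. The sketch below is an assumption about how such a gate might look, not a feature of Claude Code; the check names and commands are hypothetical stand-ins for a real test suite, linter, or build.

```python
import subprocess
import sys

def verify(checks):
    """Run each (name, command) check; return (all_passed, names_of_failures)."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(name)
    return (not failures, failures)

# Hypothetical checks. In a real project these would be your test runner,
# type checker, or build command rather than inline Python snippets.
checks = [
    ("tests pass", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ("no unfinished stub", [sys.executable, "-c", "raise SystemExit(1)"]),
]

done, failed = verify(checks)
print("accepted" if done else f"rejected, failed checks: {failed}")
```

The point of the design is that completion is decided by exit codes, not by the model's narration of what it did.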
@tobi This is really great. It's like Experiment-Driven Development (EDD). Using it now on a feature I've wanted on our site for a while and will report back.
@GaryMarcus If you ask your AI twice and get two different answers, you're using a toy. You can't run businesses on a toy.
Seems to me this is exactly what's required when asking an inherently fuzzy, non-deterministic machine (an LLM) to carry out deterministic tasks.
Enterprises have had the same problem for decades:
their data, apps, rules, approvals, exceptions, and evidence do not live in one place. They don't talk to each other.
ERP didn’t fix it.
RPA didn’t fix it.
Dashboards didn’t fix it.
Copilots or agents won’t fix it alone.
AI makes it solvable only when paired with governance.
The category is governed integration.