✅ System 1 instinctual output
✅ System 2 intentional reasoning
✅ Tool use, run the code and validate the results
🔜 Execute in a loop toward a broader goal, then step back, watch, and steer
We’ve also enabled code execution as a tool, so the model can decide to write and execute code during its response. You can enable it in the sidebar in AI Studio!
Here’s a fun example where the model ballparks the solution with a formula, then writes some Python code to arrive at the exact answer.
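Here is a hypothetical stand-in for that pattern. It assumes the question is how many primes lie below one million. The ballpark comes from the prime number theorem (n / ln n), and the exact count comes from a sieve of Eratosthenes. This illustrates the approach only. It is not the model's actual output, and the function names are made up.

```python
import math


def estimate_prime_count(n: int) -> float:
    # Ballpark via the prime number theorem: pi(n) is roughly n / ln(n)
    return n / math.log(n)


def exact_prime_count(n: int) -> int:
    # Sieve of Eratosthenes: count the primes strictly below n
    if n < 3:
        return 0
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n - 1) + 1):
        if is_prime[p]:
            # Mark every multiple of p from p*p upward as composite
            is_prime[p * p :: p] = bytearray(len(range(p * p, n, p)))
    return sum(is_prime)


n = 1_000_000
print(round(estimate_prime_count(n)), exact_prime_count(n))
```

For n = 1,000,000 the formula lands near 72,382, and the sieve returns exactly 78,498. The formula gets the order of magnitude right, and the code supplies the exact figure.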
No one:
Claude Opus 4.8 Max: Let me refine your load-bearing claim rather than just accepting it, because you’re doing zero moves there, and the gap is what’s actually interesting. The one place I’d still push, because I think it matters: your message is wearing content-clothes, but the content isn’t actually *there*. The tell: it’s just an empty string. But the emptiness of the string IS its lack of content. Pull one, and the other goes inert. That’s the structural spine.
After 40+ forward-deployed engineering (FDE) engagements, we learned that the hardest part of building AI agents and tools is context extraction.
FDE sounds like an engineering role. It's actually 3 jobs in one:
• Consulting - where in the business to build
• Product - what to build
• Engineering - how to build
Coding is the easy part now thanks to Claude Code, @cursor_ai, and other coding agents.
The hard part is everything before the code.
Extracting context from clients who have it scattered across people and tools. And creating context when it doesn't exist at all.
Then using that context to figure out what to build and working with AI on architecture and development plans.
FDEs turn the chaos and unknowns within every company into shipped AI applications.
That's why every major AI company is building an FDE arm. OpenAI and Anthropic recently raised $5.5B for theirs. Cursor and others have several open FDE job listings.
But their returns won't come from service revenue. They'll come from tokens and subscriptions. Service revenue doesn't matter to VCs; only tech revenue does, because it's more scalable.
Here's how FDEs turn coding agent companies into trillion-dollar companies:
Cursor and Claude Code are currently focused primarily on the professional engineer market.
But the total addressable market (TAM) for coding agents is infinite because almost every job benefits from code. It just used to be too expensive.
FDEs are the bridge from the technical market to the non-technical market, which is far larger.
Every coding agent and LLM company will eventually automate and productize their FDE teams though.
So we decided to replace ourselves before someone else does:
• Voice agents run discovery interviews to find problems, map workflows, and extract expertise to train agents on
• Cloud agents build prototypes, make demo videos, and collect feedback
• A consultant sub-agent prioritizes AI use cases by business impact vs. engineering effort
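The consultant sub-agent's ranking step can be sketched as a simple impact-per-effort sort. The 1 to 5 scores, the `UseCase` type, and the ratio rule are all assumptions made for illustration. The post doesn't say how the sub-agent actually weighs the two.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int  # hypothetical 1-5 score for business impact
    effort: int  # hypothetical 1-5 score for engineering effort (must be > 0)


def prioritize(cases: list[UseCase]) -> list[UseCase]:
    """Order use cases by impact per unit of effort, highest first."""
    for case in cases:
        if case.effort <= 0:
            raise ValueError(f"effort must be positive for {case.name!r}")
    # Ties on the ratio go to the higher raw impact, then alphabetically.
    return sorted(cases, key=lambda c: (-c.impact / c.effort, -c.impact, c.name))


backlog = [
    UseCase("invoice triage", impact=4, effort=2),
    UseCase("contract review", impact=5, effort=5),
    UseCase("FAQ bot", impact=3, effort=1),
]
print([c.name for c in prioritize(backlog)])
```

A plain ratio favors quick wins. Teams that want to surface big bets sooner could weight impact more heavily, for example by using impact squared over effort.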
The next most valuable problem for coding agent and LLM companies to solve is figuring out where to build, what to build, and how to build.
Context is the solution. So if you can figure out how to extract and create context, you can make a ton of money.
Coding agents can take it from there.
The release candidate for MCP 2026-07-28 is out. The protocol is now stateless: no handshake, no session ID, and any request can hit any server instance. It also makes extensions first-class (MCP Apps, Tasks), hardens auth, and adds a proper deprecation policy so we don't have to do this again.
https://t.co/XRLTu1BSkB
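To show what "stateless" means in practice, here is a minimal sketch of a JSON-RPC 2.0 handler that keeps no per-client state, so any instance behind a load balancer can answer any request. The `tools/list` and `tools/call` method names and result shapes follow earlier MCP revisions and may differ in the 2026-07-28 release candidate. The `add` tool and the `handle` function are hypothetical.

```python
import json

# Hypothetical tool registry. Every server instance loads the same one,
# so serving a request never depends on session state.
TOOLS = {"add": lambda args: args["a"] + args["b"]}


def handle(raw: str) -> str:
    """Answer one JSON-RPC request using only what the request itself carries."""
    req = json.loads(raw)
    req_id = req.get("id")

    def error(code: int, message: str) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "error": {"code": code, "message": message}})

    method = req.get("method")
    if method == "tools/list":
        result = {"tools": [{"name": name} for name in sorted(TOOLS)]}
    elif method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return error(-32602, "Unknown tool")  # JSON-RPC "Invalid params"
        value = tool(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        return error(-32601, "Method not found")  # JSON-RPC standard code
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


# Two "instances" are just two calls to the same pure function. With no
# handshake or session ID, either one can serve the request.
request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                      "params": {"name": "add", "arguments": {"a": 2, "b": 3}}})
print(json.loads(handle(request))["result"])
```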
Token costs will become a dominant topic for enterprises adopting AI. I just got out of a dinner with many Fortune 500 CIOs, and this was the most heated topic.
Companies are trying a mix of strategies, but basically no one feels they have the right solution: routing workloads to different models by priority, giving different user types access to better or worse agents, setting different spend caps by team, making teams justify AI by use case, and, in some cases, simply granting unfettered access.
Everyone is trying to figure out a semi-predictable model right now, in a world where the underlying tech and cost models are constantly evolving.
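Two of the strategies above can be combined in a small gate: routing workloads to models by priority, and enforcing per-team spend caps. This is a minimal sketch that assumes two hypothetical model tiers with made-up per-1K-token prices. Real prices and tiering vary by provider and contract.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per 1K tokens; substitute your provider's real rates.
PRICE_PER_1K = {"large": 0.015, "small": 0.001}


@dataclass
class TeamBudget:
    cap_usd: float
    spent_usd: float = 0.0


def route(priority: str, budget: TeamBudget, est_tokens: int) -> Optional[str]:
    """Pick a model tier for a request, or None if the team's cap would be exceeded.

    High-priority work prefers the large model and falls back to the small one;
    everything else goes straight to the small model.
    """
    tiers = ["large", "small"] if priority == "high" else ["small"]
    for tier in tiers:
        cost = est_tokens / 1000 * PRICE_PER_1K[tier]
        if budget.spent_usd + cost <= budget.cap_usd:
            budget.spent_usd += cost  # charge the team only when the request is admitted
            return tier
    return None
```

Take a team with a $1.00 cap. The first 50K-token high-priority request runs on the large model. A second one would overshoot the cap on the large model, so it falls back to the small one. A 1M-token low-priority request is then refused outright. The design keeps the policy in one place, so adjusting caps or tiers as prices change means editing data, not call sites.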
Live from Code with Claude London: we're launching self-hosted sandboxes (public beta) and MCP tunnels (research preview) in Claude Managed Agents.
Run agents inside your own perimeter, with your security controls applied by default.
People are freaking out over my AI spend. What nobody sees: part of what excites me so much about working on OpenClaw is that I'm trying to answer the question:
How would we build software in the future if tokens don't matter?
We constantly run ~100 Codex instances in the cloud, reviewing every PR and every issue. When a fix lands on main, @clawsweeper will eventually find that 6-month-old issue and close it with an exact reference.
We run Codex on every commit to review it for security issues (they're far too easy to miss).
We run Codex to de-duplicate issues, find clusters, and send reports on the most pressing issues.
We have agents that can recreate complex setups, spin up ephemeral https://t.co/Q1NRXLemEy machines, log into, e.g., Telegram, record a video, and post a before/after of the fix on the PR.
There's a Codex instance that watches new issues and, if an issue fits our documented vision well, automatically creates a PR for it (which another Codex instance then reviews).
We have Codex scanning comments for spam and blocking spammers.
We have Codex instances that verify performance benchmarks and report regressions in Discord.
We have agents that listen in on our meetings and proactively start work, e.g., opening PRs for new features while we're still discussing them.
We built https://t.co/bmA1XnoB7P to split all our projects into functional units so we can review them and find bugs and regressions.
We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities.
All that automation allows us to run this project extremely lean.
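The issue de-duplication described above uses Codex itself. A much cheaper stand-in for the same idea clusters issue titles by word-overlap (Jaccard) similarity and merges matches with union-find. This is a minimal sketch. The 0.5 threshold, the function names, and the sample titles are assumptions. An LLM or embedding model would catch paraphrases that word overlap misses.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    # Lowercase word set; punctuation is ignored
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_issues(titles: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Group issue indices whose titles overlap at or above the threshold."""
    parent = list(range(len(titles)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    toks = [tokens(t) for t in titles]
    for i, j in combinations(range(len(titles)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(titles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


issues = [
    "Telegram login fails after update",
    "Login fails on Telegram after update",
    "Discord bot crashes on startup",
]
print(cluster_issues(issues))
```

Each cluster with more than one member is a candidate duplicate set, and cluster size is a rough signal of which problems are most pressing.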
Your ChatGPT subscription now powers an OpenClaw agent that genuinely feels magical to talk to.
Previous OpenClaw releases had OpenAI models running, but they never quite let the models reach their full potential. That changes today.
Personality is now deliberate, tool calls land exactly where they should, and your agent actually follows through on what it says it will do.
OpenClaw is now running on top of the Codex harness by default. In handing the inner loop to OpenAI's native Codex harness, we eliminated the conflicting instructions and duplicate tools that used to make the model hesitate.
What we stripped out under the hood:
- Duplicate tools (no more guessing between Codex native vs OpenClaw versions)
- Conflicting instructions (no more NO_REPLY vs message tool ambiguity)
- Leaked context (heartbeat logic only appears on actual heartbeat turns)
Less context bloat. More room for the agent to think.
And here's what we inherited for free, thanks to the Codex App Server:
- Searchable dynamic tools: roughly 5,500 fewer upfront tokens per turn, which makes turns faster and cheaper.
- Auto-Review mode using the built-in Codex guardian.
- OpenAI's native plugins (Calendar, Email, Drive) running in the same thread.
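The idea behind searchable dynamic tools can be sketched generically. Instead of putting every tool schema into the prompt on every turn, the model sees one small search tool. It pulls in full definitions only when a query matches. This is an illustrative sketch, not the Codex App Server's implementation. The catalog, the keyword scoring, and every name here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str
    schema: dict


# The only tool whose definition is sent upfront on every turn.
SEARCH_TOOL = Tool("search_tools", "Find tools by keyword", {"query": "string"})

# Hypothetical full catalog; its schemas stay out of the prompt until searched.
CATALOG = [
    Tool("calendar_create_event", "Create a calendar event", {"title": "string", "start": "string"}),
    Tool("email_send", "Send an email message", {"to": "string", "body": "string"}),
    Tool("drive_search_files", "Search files in Drive", {"query": "string"}),
]


def search_tools(query: str, limit: int = 3) -> list[Tool]:
    """Return catalog tools ranked by how many query words they mention."""
    words = set(query.lower().split())
    scored = []
    for tool in CATALOG:
        text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
        score = sum(1 for w in words if w in text)  # simple substring match per word
        if score:
            scored.append((score, tool.name, tool))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by name
    return [tool for _, _, tool in scored[:limit]]


print([t.name for t in search_tools("send email")])
```

The token saving comes from the prompt carrying one short definition instead of the whole catalog. The cost is an extra search step whenever the model needs a tool it hasn't loaded yet.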
For you, the result is a personal agent that actually feels personal. It picks up where you left off across any channel, handles things before they hit your radar, and only breaks your flow when it has something genuinely worth showing you.
For developers, the result is stability. Because the inner loop runs on OpenAI’s native Codex harness, every upstream improvement lands in your agent automatically.
To get started, paste this into your terminal:
> openclaw onboard
That is the whole setup.
As system-of-record incumbents shift to headless agents, they are making an implicit bet that the data layer will remain the source of value.
Startups will compete on a new set of factors, like proprietary data, owning the action layer, real-world execution, and selling to technical buyers.
The next generation of systems of record is already starting to look agentic: these systems capture the context, initiate the work, and record the data exhaust.
Full piece from a16z's Seema Amble: https://t.co/8hOj26bPuf
Starting today, you can run cloud agents inside fully configured development environments.
Set them up the same way you'd set up a laptop for an engineer: cloned repos, installed dependencies, and toolchain credentials.
self-verification (Outcomes) + self-learning (Dreaming) are two of the most interesting new features we shared at Code With Claude last week.
a few notes + video links to the talks ...
Between building this at @tryramp with Inspect and watching other great companies like @WorkOS, @stripe, and @Shopify build it, we're seeing some clear takeaways emerge:
1. AI adoption multiplies exponentially when it’s done in public. If you work with tooling that keeps learnings private, you’re doing a disservice to your entire business. When every knowledge role is rebuilding how it works, you need everyone to contribute to the corpus of knowledge by default.
2. Bespoke tooling that’s shaped to your business is easier than ever to build. Losing time trying to shape your processes around other products isn’t a trade-off you have to make anymore. Choose platforms that let you stay flexible enough to build what works for you and your business.
3. Cultures of experimentation are more important than ever with AI. We are still so early. Shape your business to take big bets and cut losses early. Whether for internal efforts like these or for the product you ship, now’s not the time to be risk averse. It’s a far greater risk to think that anyone has won the game.
Software for Agents
@aaron_epstein
The next trillion users on the internet won't be people. They'll be AI agents, and they're already doing real work on top of software that was designed for humans clicking buttons.
Every major category of software needs to be rebuilt for agents as first-class citizens, and that won't come from incumbents.
Company Brain
@t_blom
Every company has critical know-how scattered across people's heads, old Slack threads, support tickets, and databases, and AI agents can't operate like that.
We think every company in the world is going to need a new primitive: a living map of how the company works that turns its own artifacts into an executable skills file for AI.