Of my friends who use coding agents heavily, the happiest seem to fall into two distant camps:
a) Controlled fast loops: 1-2min cycles, mostly single-threaded, still totally in control of the code, using the agent "to type faster"
b) Delegated slow loops: Nudging agents along in the background a couple of times a day while something else (design, writing, etc.) is their primary focus; paying ~no attention to code; it's fine if agents get stuck for hours
Like many, I've been trying to make some middle ground work: trying to delegate more than the first camp, while giving the work more focus and technical oversight than the second. My role is planning, technical guidance, and code review. This leads to 10-30min cycles, which lead to parallelism (to avoid busy-waiting), which leads to context-switching and fragmentation, which leads to working memory churn and poor comprehension, which leads to situations where neither the agents nor I understand what's going on.
It sucks and I hate it. I'm moving faster, but the work is unpleasant and unrewarding. It seems awfully hard to exert *partial* technical control—much easier to exert ~full or ~none.
The ideal, maybe, is something like what @simonlast outlines in https://t.co/M6R3EF5Yy7: teams of agents making technical plans, reviewing each others' work, autonomously and adversarially testing, etc. You still get robust "technical oversight"—just by other agents. Unfortunately in my domain (mobile interface with heavy gestures and animation) that's not yet tractable, even with lots of homegrown scaffolds and probes. But things will probably look very different in a year.
I'm curious: have others found a happy middle ground between these poles?
Cursor CEO Michael Truell on the future of writing code: "Our goal with Cursor is to invent a new type of programming."
"It looks like a world where you have a representation of the logic of your software that does look more like English."
"You can imagine kind of an evolution of programming language towards pseudocode. You have written down the logic of the software, and you can edit that at a high level."
"It won't be the impenetrable millions of lines of code, it'll instead be something that's much terser and easier to understand and easier to navigate."
@mntruell with @lennysan on Lenny's Podcast
Cursor/Graphite’s @TomasReimers just announced Origin
@cursor_ai’s long-awaited Git competitor: scalable for agent workloads, extensible via API and MCP, with built-in agent resolution of merge conflicts and CI failures
Increasingly, I believe companies may need to be rebuilt from the ground up around a single timeline of all observability + product metrics + file changes, laid out in a retrievable system: something like Datadog + PostHog + Google Drive + Slack (really, a unified filesystem of Claude Code chats + Codex chats). This might be the new data foundation for any and all companies to maximize AI. It needs to be rebuilt because, on existing systems, keeping track of diffs well enough to produce longitudinal information on decisions and rollbacks is basically impossible. Coding-agent storage companies are actively trying to figure this out, but it should extend to businesses as a whole.
I'm highly skeptical that existing businesses will adopt this, though, because it means overhauling everything about their instrumentation and business data. But I think businesses built on this foundation can probably execute 100x better and faster.
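What a "single retrievable timeline" could mean at the data-structure level, as a minimal sketch: assume each system (agent chats, file changes, observability, product metrics) can export time-stamped events already sorted by time, and merge them into one ordered stream you can query around a decision or rollback. The `Event` type, source names, and sample events are all hypothetical; no real Datadog, PostHog, or agent-storage API is used.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True, order=True)
class Event:
    # Only the timestamp takes part in ordering; source and detail are payload.
    ts: datetime
    source: str = field(compare=False)
    detail: str = field(compare=False)


def unified_timeline(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-source streams, each already sorted by ts, into one timeline."""
    return list(heapq.merge(*streams))


def around(timeline: List[Event], anchor: datetime, radius: timedelta) -> List[Event]:
    """Every event within `radius` of `anchor`, e.g. the context around a rollback."""
    return [e for e in timeline if abs(e.ts - anchor) <= radius]


# Hypothetical sample data, one list per source system.
t0 = datetime(2025, 1, 1, 12, 0)
agent_chats = [Event(t0 - timedelta(minutes=15), "agent_chat", "Codex: refactor checkout flow")]
file_changes = [
    Event(t0, "files", "agent edits checkout.ts"),
    Event(t0 + timedelta(minutes=30), "files", "rollback of checkout.ts"),
]
observability = [Event(t0 + timedelta(minutes=5), "observability", "5xx spike on /checkout")]
product_metrics = [Event(t0 + timedelta(minutes=10), "metrics", "checkout conversion -8%")]

timeline = unified_timeline(agent_chats, file_changes, observability, product_metrics)
for e in around(timeline, anchor=t0 + timedelta(minutes=30), radius=timedelta(minutes=45)):
    print(e.ts.time(), e.source, e.detail)
```

The point of merging rather than keeping per-tool silos is that "why did we roll back checkout.ts?" becomes one window query over one ordered list instead of a manual cross-reference of four dashboards.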
You should basically never use Fable for coding, but instead use it as a planner/orchestrator.
Most of today's advanced models can implement a spec perfectly, and once done you can send the work to Fable to review.
This has been my most powerful flow so far.
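The plan → implement → review flow, sketched as a small loop. This is an illustration under assumptions, not the author's setup: `plan`, `implement`, and `review` are hypothetical callables standing in for real model calls (the planner/reviewer would be Fable, the implementer any strong coding model), and folding review comments back into the spec is one possible design choice.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    approved: bool
    comments: List[str] = field(default_factory=list)


def orchestrate(
    task: str,
    plan: Callable[[str], str],            # planner/orchestrator model (hypothetical)
    implement: Callable[[str], str],       # implementation model (hypothetical)
    review: Callable[[str, str], Review],  # planner model again, acting as reviewer
    max_rounds: int = 3,
) -> str:
    """Plan once, implement, then loop review -> feedback -> re-implement."""
    spec = plan(task)
    code = implement(spec)
    for round_no in range(1, max_rounds + 1):
        verdict = review(spec, code)
        if verdict.approved:
            return code
        if round_no == max_rounds:
            break
        # Fold the reviewer's comments into the spec for the next attempt.
        spec += "\n\nReviewer feedback:\n" + "\n".join(verdict.comments)
        code = implement(spec)
    raise RuntimeError(f"work not approved after {max_rounds} review rounds")


# Usage with stub callables in place of real model calls.
drafts = iter(["def add(a, b): return a - b", "def add(a, b): return a + b"])
result = orchestrate(
    "add two numbers",
    plan=lambda task: f"Spec: write add(a, b) to {task}.",
    implement=lambda spec: next(drafts),
    review=lambda spec, code: Review("a + b" in code, [] if "a + b" in code else ["add() subtracts"]),
)
print(result)  # def add(a, b): return a + b
```

Capping the rounds and raising keeps a stuck implementer from looping silently; the failure surfaces to you instead.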
And yet again, back to Claude with Fable 5.
And somehow, I don't think the constant turning of tables, the mind-boggling innovation, will ever end. How lucky we are to be alive during this time.
This is a super exciting release: Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin, but I'll add that *qualitatively*, too, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it far more ambitious tasks than you're used to; the model "gets it" and will just go, and it has never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into, and the safeguards are configured to be a little too trigger-happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes on tap. Jevons paradox kicks in, and my own demand for software is growing substantially. You can ask for anything: explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific to your project). You can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers.
Models write sloppy code that works but isn’t maintainable. Our eval is the first to measure: would you actually merge this code?
Mira Murati says human-AI collaboration needs models that can listen while they think:
"The types of models that we work with today, they're very turn-based. You talk, they talk, then they go off and think."
"While they're thinking, it's almost like they're deaf and blind. They cannot perceive anything else about what's going on."
"By contrast, our interactions with each other are very rich. There is a lot of information in our interactions when we are silent, when we're thinking, when we're interrupting one another."
"Interaction models are able to capture all of this nuance. They're not turn-based. They're more like time-based interaction, where they're continuously taking in audio, text, video, and continuously providing output."
"This enables you to catch things like interruptions and simultaneous speech, and really create a rich, high bandwidth interaction between humans and machines."
@miramurati at Bloomberg Tech live with @emilychangtv
Unclear if it's a durable trend, but CEOs and CTOs are back to coding with a fury, thanks to coding agents.
I have public company CEOs sliding into my DMs (and “InMail”) telling me about falling in love with shipping software again thanks to Claude Code and Vercel.
“Dream accounts” that we always wanted to work with, where in the past the C-suite would hardly understand the infrastructure until much later in the game.
Coding agents are the ultimate PLG-fication of the enterprise. Bad, legacy software can’t hide anymore. The stack that works is self-evident to the entire organization, from intern to CEO.
You can't sell to enterprise clients well without having been an enterprise buyer yourself.
When I was building my first company, I couldn't wrap my head around why anyone would pay $100,000 for annual software contracts, let alone $1,000,000.
Then I sat on the other side of the table after one of my cos was acquired. When I was the one approving those contracts, I immediately understood why companies pay that and what exactly they need to see before they do.
Agentic coding tip
Pay down tech debt immediately
With agents, there should be no such thing as “tech debt”. An agent should simply pay down every bit of tech debt before presenting you with the “finished work”. Unlike human time, agent time is not very valuable; an agent can and should keep working on something until it’s done, and shouldn’t make concessions that assume human constraints.
If an agent tells you that it’s “leaving something for later”, tell it to go finish it first before saying it’s done.
Example prompt (best if it is in your docs):
```
Do not leave any tech debt behind. If you have taken any shortcuts, go back and do them right. This is a hard acceptance criterion that must be completed.
```
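If you drive agents from a script, the "tell it to finish first" step can be automated with a crude check on the agent's final message. A minimal sketch: the regex patterns are hypothetical heuristics (they will miss some phrasings and flag some harmless ones), and `next_instruction` is an illustrative helper, not part of any agent tool's API.

```python
import re
from typing import List, Optional

# Crude, hypothetical heuristics for "I'll leave this for later" style replies.
DEFERRAL_PATTERNS = [
    r"\bfor later\b",
    r"\bTODO\b",
    r"\bfollow[- ]up\b",
    r"\bnot (yet )?implemented\b",
    r"\bout of scope for now\b",
]

NO_TECH_DEBT_PROMPT = (
    "Do not leave any tech debt behind. If you have taken any shortcuts, go back "
    "and do them right. This is a hard acceptance criterion that must be completed."
)


def deferrals(final_message: str) -> List[str]:
    """Return the deferral patterns that appear in the agent's 'done' message."""
    return [p for p in DEFERRAL_PATTERNS if re.search(p, final_message, re.IGNORECASE)]


def next_instruction(final_message: str) -> Optional[str]:
    """None means accept the work; otherwise, the prompt to send back to the agent."""
    found = deferrals(final_message)
    if not found:
        return None
    return (
        NO_TECH_DEBT_PROMPT
        + "\nYour summary defers work (matched: " + ", ".join(found) + "). "
        + "Finish it before saying you're done."
    )


print(next_instruction("Implemented retries; leaving the migration for later."))
```

This complements, rather than replaces, putting the prompt in your docs: the docs set the expectation up front, and the check catches the cases where the agent ignores it.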
For complicated agent work, it's amazing how much GPT5.5 has improved. I found 5.2 to be very far behind Opus. Now, going back to Opus 4.7 after 5.5 feels like a big step backwards. Gotta love this level of competition! Strong comeback for OpenAI.
Everyone is obsessed with AI making a 10x engineer a 1000x engineer.
The recent reductions at Cloudflare and Click have made me realize the plot is equally about the inverse: AI amplifies the *negative* impacts of poor performers.
If a person with poor taste, who makes mediocre judgement calls and doesn't properly build things customers love, is able to produce 10x more work, does a company want that?
Hell no! Productivity isn't just about as many people as possible tokenmaxxing. AI is a double-edged sword, especially when it's used to produce net new work.
If you give a bad artist a pen that can draw 100x as fast, you're going to end up with a pile of junky artwork very quickly.
And since it happens so quickly, leaders can now see who is Picasso and who is not, and adjust accordingly.
Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model."
41% mistake rate without a CLAUDE.md, 11% with the 4-rule baseline, 3% with the 12-rule version below.
here are the 12 rules senior engineers settled on:
1. think before coding: state assumptions, don't guess. the model can't read your mind, stop hoping it will
2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add "for future flexibility," you've added 200 lines you'll delete next quarter
3. surgical changes: touch only what you must. don't let it improve adjacent code, that's how PRs blow up
4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early
5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers
6. token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5
7. surface conflicts, don't average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice
8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read
9. tests verify intent, not just behavior: a test that can't fail when business logic changes is wrong. all 12 of Claude's tests can pass while the function returns a constant
10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour
11. match the codebase conventions: class components? don't silently fork to hooks. the tests assumed componentDidMount, and hooks broke them without anyone noticing
12. fail loud: "completed successfully" with 14% of records silently skipped is the worst class of bug. surface uncertainty, don't hide it
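Rules 6 and 12 are the most mechanical of the twelve, so here is a minimal sketch of both. Assumptions: the 4000/30000 caps come from rule 6, but token counts are supplied by the caller (no real tokenizer or model API is used), and `TokenBudget`, `process_all`, and the exception names are hypothetical. The shared idea is that both raise instead of quietly continuing.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


class BudgetExceeded(RuntimeError):
    """Raised instead of letting a task or session quietly run long (rule 6)."""


@dataclass
class TokenBudget:
    per_task: int = 4000
    per_session: int = 30000
    task_used: int = 0
    session_used: int = 0

    def start_task(self) -> None:
        self.task_used = 0

    def charge(self, tokens: int) -> None:
        """Record usage reported by the caller; refuse anything over either cap."""
        if self.task_used + tokens > self.per_task:
            raise BudgetExceeded(f"task budget {self.per_task} exceeded")
        if self.session_used + tokens > self.per_session:
            raise BudgetExceeded(f"session budget {self.per_session} exceeded")
        self.task_used += tokens
        self.session_used += tokens


class SkippedRecords(RuntimeError):
    """Raised when records fail, instead of reporting 'completed successfully' (rule 12)."""


def process_all(records: Sequence[Any], transform: Callable[[Any], Any],
                max_skip_ratio: float = 0.0) -> List[Any]:
    done, failed = [], []
    for record in records:
        try:
            done.append(transform(record))
        except (KeyError, ValueError) as exc:
            failed.append((record, exc))
    ratio = len(failed) / len(records) if records else 0.0
    if ratio > max_skip_ratio:
        raise SkippedRecords(
            f"{len(failed)}/{len(records)} records ({ratio:.0%}) failed; first: {failed[0][1]!r}"
        )
    return done
```

With the default `max_skip_ratio=0.0`, a single bad record fails the whole run; raising the threshold is an explicit, visible decision rather than a silent 14% loss.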
what actually compounds instead of the next framework:
- the CLAUDE.md file as institutional memory across sessions
- eval-driven changes, not vibe-driven
- checkpoints over speed
- explicit conflicts over silent blending
- discipline over framework, every time
- one repo, one rules file, no exceptions
be a few rules ahead of AI twitter before this becomes mass-opinion
study this
Exactly! The winning strategy is not betting on who has the best model this month. It is building the deployment layer where intelligence actually compounds.
That means serving the best possible agent tokens on durable infrastructure: route to any model, train your own when it makes sense, and own the context, harness, environment and interfaces around the agent.
Applied Compute is building this customer-first deployment layer. We help customers build intelligent systems where the value compounds on their side.
No one's talking about how sandbox forking is going to change how multi-agent handoffs work.
Right now, when one agent hands work to another, you destroy the VM and start fresh. The context, file state, environment, etc. are gone.
With forking, you can just instantly clone the running sandbox, and the next agent picks up from exactly where the last one left off.
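A toy model of the difference between today's handoff and a forked handoff. This is only an in-memory analogy under stated assumptions: `Sandbox`, `fork`, and `handoff_fresh` are hypothetical, and `copy.deepcopy` stands in for whatever snapshot or clone mechanism a real sandbox provider offers; no specific product API is used.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sandbox:
    files: Dict[str, str] = field(default_factory=dict)
    env: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def run(self, agent: str, action: str) -> None:
        self.history.append(f"{agent}: {action}")

    def fork(self) -> "Sandbox":
        # Stand-in for a VM snapshot / clone: full, independent copy of the state.
        return copy.deepcopy(self)


def handoff_fresh(_: Sandbox) -> Sandbox:
    """Today's pattern: discard the old sandbox and start the next agent empty."""
    return Sandbox()


builder = Sandbox()
builder.files["app.py"] = "print('hello')"
builder.env["DATABASE_URL"] = "sqlite:///dev.db"
builder.run("builder", "scaffolded app.py")

tester = builder.fork()  # picks up exactly where the builder left off
tester.run("tester", "ran app.py")
fresh = handoff_fresh(builder)  # has to rebuild everything from scratch

print(tester.files, len(tester.history))  # {'app.py': "print('hello')"} 2
print(fresh.files, len(fresh.history))    # {} 0
print(len(builder.history))               # 1
```

Because the fork is independent, the original sandbox stays untouched, so you could also fork it several times and hand each copy to a different agent.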