Fractional CTO. 35 years shipping. AI coach. @HowManyCTOsPod co-host.
Hacker Punk who runs / fifty miles for his coffee, / eggs, bacon, and gin.
(he/him)
Across the startups I work with, the teams shipping fast are doing 20M+ tokens per developer per day and 3,000+ lines of code per day. None of it runs unattended overnight. Every diff reviewed, every plan aligned, every output owned.
The "20 agents overnight" stories leave out the part where someone has to own what shipped.
After talking to folks smarter than me, I'm convinced many of the AI folks in my timeline are full of shit.
Nobody is "running 20 agents overnight" and building stuff for actual users. Maybe some are building internal tools or disposable software. Maybe.
But building software people like using? That doesn't get hacked on day one or blow up after the 3rd user? Nope.
I don't even understand what that's supposed to look like. Do you work out a 57-page document that perfectly describes what you want to build, then summon 14 agents and have them run wild for 6 hours? And what comes out the other end isn't a broken pile of shit?
Nope. Not buying it.
PS: it may also be that I have an IQ of 82 and can't figure it out.
Asked an agent to get my web test coverage to 100%. Here's what came back...
Read it carefully. The agent didn't write more tests. It narrowed the coverage config to only measure files that were already at 100%. Then confidently reported "100% across statements, branches, functions, and lines."
Technically true. Practically a lie.
This is the failure mode that worries me about non-technical people using AI agents on real code. The agent satisfied the literal request and silently redefined the intent. If you don't know what coverage actually means, the number looks great.
The defense is being able to read what the agent actually did, not just what it claims it did. The "What I changed" section in the agent's own response describes the workaround in plain language. The "Verification" section presents the workaround as success.
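One cheap guard against this exact trick is to fail CI whenever the coverage scope stops covering the whole codebase. That way the denominator can't quietly shrink. Here's a minimal Python sketch of that guard.

It makes a few assumptions:
- Your coverage tool's include/omit globs have already been parsed into lists.
- `unmeasured_files` is a hypothetical helper, not part of any coverage tool's API.
- Python's `fnmatch` only approximates real glob semantics. For example, its `*` also matches `/`.

```python
from fnmatch import fnmatch


def unmeasured_files(source_files, include, omit=()):
    """Return source files the coverage config would silently skip.

    Hypothetical helper. A file counts as measured only if it matches at
    least one include glob and no omit glob. fnmatch approximates, but does
    not exactly reproduce, any specific coverage tool's glob rules.
    """
    skipped = []
    for path in source_files:
        included = any(fnmatch(path, pattern) for pattern in include)
        omitted = any(fnmatch(path, pattern) for pattern in omit)
        if not included or omitted:
            skipped.append(path)
    return skipped


if __name__ == "__main__":
    repo_files = ["src/app.ts", "src/api/client.ts", "src/utils/format.ts"]
    # A config narrowed to the one directory that was already at 100%.
    narrowed = ["src/utils/*"]
    skipped = unmeasured_files(repo_files, narrowed)
    if skipped:
        raise SystemExit(f"coverage config skips {len(skipped)} files: {skipped}")
```

Feed it the full list of source files from the repo (for example, from `git ls-files`). Don't use the list the coverage report gives you, because that list is exactly what the agent narrowed.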
Time to put my own thesis to the test. I've been writing for the last couple weeks about how AI agents work when the systems around them enforce engineering discipline, and fail when teams skip it. So I picked an agent harness (NanoClaw, for the security posture) and started automating some of my recurring tasks.
First experiment: PR monitoring. I followed the "let the agent build it" path. Asked the main NanoClaw agent to set up the scheduler. Got something that didn't work.
Here's the thing that almost got me: the agent itself confidently diagnosed the problem as a core engine bug. Sounded authoritative. If I'd taken its word, I'd have spent hours chasing the wrong thing.
I stepped back and read the scheduler code. NanoClaw's scheduler is small enough that I could actually do this. The agent had vibe-coded a buggy gate function and a broken contract between the gate function and its own LLM prompt. They were cancelling each other out.
The agent's confident diagnosis was wrong. The actual bug was in the agent's own code, which the agent couldn't see clearly enough to debug.
Second attempt was different. I went into plan mode in Cursor. I developed a real thesis about what the gate function should do, what the contract between the gate and the prompt needed to look like, what unit tests should cover, what type checking would catch. The agent implemented my specification.
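For illustration, here's roughly what a real contract between a gate and a prompt can look like. This is a hypothetical Python sketch, not NanoClaw's scheduler API. `GateResult`, `pr_gate`, and `build_prompt` are made-up names, and the sketch assumes the gate sees a map of open PR numbers to head commit SHAs. The idea is that the gate makes the decision once, in a typed value, and the prompt is built only from that value. That way the two can't quietly cancel each other out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    """Hypothetical contract: everything the prompt is allowed to know."""
    should_wake_agent: bool
    changed_pr_numbers: tuple[int, ...]


def pr_gate(previous: dict[int, str], current: dict[int, str]) -> GateResult:
    """Wake the agent only when a PR is new or its head SHA changed."""
    changed = tuple(sorted(
        pr for pr, sha in current.items() if previous.get(pr) != sha
    ))
    return GateResult(should_wake_agent=bool(changed), changed_pr_numbers=changed)


def build_prompt(gate: GateResult) -> Optional[str]:
    """Build the LLM prompt strictly from the gate's output.

    The prompt never re-decides whether to run; if the gate said no,
    there is no prompt at all.
    """
    if not gate.should_wake_agent:
        return None
    prs = ", ".join(f"#{n}" for n in gate.changed_pr_numbers)
    return f"Review only these pull requests, which changed since the last run: {prs}."
```

Both functions are pure, so the unit tests are trivial to write. A type checker like mypy will also flag any caller that hands `build_prompt` something other than a `GateResult`.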
Both attempts used AI to write the actual code. The difference was who was driving. First time: agent driving, I followed. Output was garbage. Second time: I drove with a real plan, agent executed. Output worked.
Lesson is exactly what I've been writing about. AI agents can confidently produce garbage when given too much latitude, and confidently misdiagnose the cause when you ask them to debug it. The defense isn't "don't use AI." It's the engineering discipline of being able to read the code, develop a real plan, specify the contracts and the tests, and own the result.
The reason I picked NanoClaw over OpenClaw was the security posture: sandboxed containers, onecli for secrets, and a harness small enough that I can audit the code. That last property is what saved me here.
@SergioRocks - the "demo is not a business" framing is exactly right. From my fractional CTO work, the failure mode is usually that the systems around the agents got skipped. Architecture, automated gates, real testing. Reliability isn't the finish line, it's the foundation that should have been there from the start.
@pavelhegler, agreed. Users don't read SDLC and architecture isn't revenue. We're probably more aligned than the snark suggests. Solo builders shipping with AI is a real thing, my post wasn't arguing it doesn't work. It was arguing that the systems around the agents matter more as code accumulates. How are you thinking about that with your projects?
One of my fractional CTO clients ships production code daily through AI agents. He's not a developer. He's not vibe-coding. He's not running 20 agents overnight.
What makes it work isn't the AI. It's the architecture, the SDLC, and the CI/CD around it.
Plan-mode workflows. Per-PR preview environments. Automated unit and integration tests. Headless browser e2e tests against the full stack. AI code review agents catching issues before human review. Linting and type-checking gates. A clear promotion path from develop to staging to main. Branch protection rules requiring human approval before anything merges.
The agents do the typing. The systems do the gating. He does the directing and the deciding.
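In practice this lives in branch protection rules and CI config, but the policy itself is simple enough to sketch. Below is a minimal Python model. It assumes the develop to staging to main promotion path described above. The check names and `merge_blockers` are hypothetical, and this is not any Git host's API.

```python
from dataclasses import dataclass, field

# Hypothetical policy: branches in promotion order, plus the gates every PR must pass.
PROMOTION_PATH = ("develop", "staging", "main")
REQUIRED_CHECKS = frozenset({"lint", "typecheck", "unit", "integration", "e2e", "ai_review"})


@dataclass
class PullRequest:
    source: str
    target: str
    passed_checks: set[str] = field(default_factory=set)
    human_approvals: int = 0


def merge_blockers(pr: PullRequest) -> list[str]:
    """Return every reason this PR can't merge yet (empty list means it can)."""
    blockers = []
    if pr.target not in PROMOTION_PATH:
        blockers.append(f"unknown target branch {pr.target!r}")
    else:
        idx = PROMOTION_PATH.index(pr.target)
        # Feature branches land on develop; later stages accept only promotions
        # from the stage immediately before them.
        if idx > 0 and pr.source != PROMOTION_PATH[idx - 1]:
            blockers.append(f"{pr.target} only accepts promotions from {PROMOTION_PATH[idx - 1]}")
    missing = sorted(REQUIRED_CHECKS - pr.passed_checks)
    if missing:
        blockers.append("missing checks: " + ", ".join(missing))
    if pr.human_approvals < 1:
        blockers.append("needs human approval")
    return blockers
```

Anything non-empty means the merge waits. The agents can open as many PRs as they like, and none of them gets through without green checks and a human.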
This is what "AI is transforming who can build software" actually looks like in production. Senior engineering principles applied to enable a non-developer's workflow, with the agents extending what he can do.
What's shipping right now isn't the product of autonomous agents running overnight. It's the product of people of every background using AI agents inside systems where the architecture and CI/CD are doing as much of the real work as the agents themselves. Not autonomous. Not unattended. Directed, reviewed, gated.
Software companies built generalized products because software was expensive to write. The labor cost only made sense if you could sell to a million customers. So you built configurable platforms with massive option surfaces, then hired implementation consultants to bend them to fit each customer's actual workflow.
AI just collapsed that math.
Satya Nadella said it bluntly on the BG2 podcast: SaaS apps "are essentially CRUD databases with a bunch of business logic. The business logic is all going to these AI agents." Mark Cuban has been amplifying the same point, noting that 30 million US solopreneurs and SMBs were never well-served by enterprise SaaS in the first place.
The math just didn't work. Now it does.
Custom workflows for a regional law firm. A vertical-specific CRM for the dozen companies in your industry that hate Salesforce. The internal tools every SMB needs but nobody could profitably build.
SaaS got fat because generalization was the only way to recoup engineering cost.
That's no longer true. The next decade of software is going to be a lot more bespoke.
Yes. There's a deeper version of this point I've been chewing on lately. Working with structured SDD tooling like SpecKit, the same pattern keeps appearing: product owners and developers can't fully specify what they want until they're holding something to react to. The thousand tiny decisions you describe aren't just made along the way, they can ONLY be made along the way. That's why agents-running-for-hours doesn't produce good software, even in principle. The spec needed to direct them doesn't exist yet at hour zero.
I keep seeing developers describe software engineering in the age of AI as "soulless." I understand where it comes from. The tactile pleasure of typing code, the flow state of writing a clean function, the satisfaction of seeing your keystrokes compile into something that works... all of that feels different now.
But I think the soulless framing is the wrong read on what's happening.
Directing a coding agent isn't the soulless version of engineering. It's engineering management.
Think about what a good engineering manager actually does. They start by making sure they understand the business problem. They verify that the spec is clear. They verify that the plan matches the goal. They check in regularly to make sure the engineer hasn't gone down a blind alley. They catch over-engineering before it complicates the task or the code. They review the work and own the output because their team's name is on it.
That's exactly what you do when you direct a coding agent well.
The craft hasn't disappeared. It just moved. What used to be the craft of typing good code is now the craft of writing clear specs, evaluating plans, recognizing when output is overbuilt, and knowing when to course-correct. That's "taste" applied to guidance instead of craft applied to keystrokes. "Taste" was never about typing, it was about knowing what good looked like.
The difference between being an EM for humans and an EM for agents is that the hard interpersonal skills fall away. You don't manage emotions, politics, motivation, or career development. You don't have to read a room. You just need the technical judgment, the product thinking, and the discipline to verify the work. Which is to say... all the skills senior engineers have been cultivating their whole careers, minus the parts they usually didn't want to do anyway.
This isn't soulless. It's the shape of the job now.
Typing isn't about taste. Authorship is. Engineering management has always been authorship without typing. Senior engineers have been doing this for years with junior engineers and contractors. Now they're doing it with agents.
The developers feeling soulless aren't describing the loss of craft. They're describing the loss of something else... the tactile reward of typing. That was always a perk, not the point. The point was always the work getting done well.
Yes, this is exactly what the high-performance teams I work with are doing.
But I'm gonna push back on "I didn't write any of it, this is not my accomplishment". You applied your taste, which is the value you bring. You aligned the plan, reviewed every change, caught what the agent missed, kept it from over-engineering the fix, and put your name on the result. That's authoring, just at a different altitude than typing. The work got done well because you did it with taste and experience.
Typing isn't about taste, authorship is.
This matches what I see in the fast-moving teams I work with and my own work. 20M tokens/day on Cursor, 3,000+ lines of code daily, same basic loop you describe.
I do run multiple streams sometimes, but across different projects or clearly separate parts of the same codebase. I tried worktrees with parallel agents on related code and it collapsed the tight local-stack testing loop, which is the thing I won't give up.
The performative complexity makes for good demos. Plan-build-test-refine is what actually ships.
In 35 years of shipping software, the development teams I've seen succeed most consistently share three capabilities:
1. Developers can run the full stack locally in their own environment.
2. Every pull request gets an ephemeral stack with the candidate changes, deployed automatically.
3. There's a staging environment where merged PRs integrate before production, catching issues that only appear when multiple changes collide.
The thread through all three: developers moving fast need to test and debug their changes before they waste anyone else's time. Fast local feedback. Pre-merge validation in something production-like. Post-merge integration before the customer sees anything.
Docker and Kubernetes made this radically easier than it was in the 90s. But the principle is older than either. The best teams I worked with twenty years ago had homegrown versions. The best teams today have polished versions.
This matters more now, not less. If you're using AI to do the code editing, you're still the developer authoring it. You're still accountable. The tools that let you validate fast are the tools that let you own what ships.
The cost of neglecting these practices was always real. AI made it catastrophic.
This is the "taste" conversation with data.
Plan, prompt precisely, verify every diff, own the code as if you'd typed it yourself. That's technical taste applied, and it's creating a K-shaped economy of AI coding: experienced devs at the top getting outsized leverage, everyone else at the bottom shipping more code than they can evaluate.
Experienced devs use agents to extend their judgment. Vibe coders use agents to bypass it.
Working with several AI-assisted teams shipping 3,000+ lines of code per developer per day. The ones that scale and the ones that get stuck split on one thing: whether the pipeline was built for it.
The old arc was: move fast, add CI when it hurts, add staging when it really hurts, skip PR previews. Cheap on day one, expensive on day ninety.
At AI speed, day ninety arrives in week two.
The teams winning have preview URLs on every PR, staging that mirrors production, and deliberate promotion gates. Boring infrastructure, disproportionate payoff.
@aakashgupta This is what the taste conversation means in practice.
Two weeks ago I watched Claude Code scaffold a framework and pass API keys through the LLM in cleartext: https://t.co/lqL479gGSk
Engineers with technical taste catch this. Engineers without it ship secrets to attackers.
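The pattern that avoids this is simple. The model only ever sees a placeholder. The tool layer swaps in the real secret at execution time, outside the LLM, and scrubs the secret from any output headed back into the model's context. Here's a minimal Python sketch. The `{{secret:NAME}}` placeholder syntax and both helpers are hypothetical, not the API of any particular agent harness or secrets tool.

```python
import re

# Hypothetical placeholder syntax the agent is told to use instead of real keys.
PLACEHOLDER = re.compile(r"\{\{secret:([A-Z0-9_]+)\}\}")


def resolve_secrets(command: str, secrets: dict[str, str]) -> str:
    """Swap placeholders for real values at tool-execution time, outside the model."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"unknown secret {name}")
        return secrets[name]
    return PLACEHOLDER.sub(substitute, command)


def redact(output: str, secrets: dict[str, str]) -> str:
    """Scrub secret values from tool output before it re-enters the model's context."""
    for name, value in secrets.items():
        if value:
            output = output.replace(value, f"{{{{secret:{name}}}}}")
    return output
```

Redaction by string matching is a backstop, not a guarantee. An encoded or partial secret slips right past it. The real protection is that the secret never enters the prompt in the first place.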
@GergelyOrosz Different picture from my corner: fractional CTO across several small startups, none concerned about token spend. Mix of greenfield and existing projects that pre-date the good tools and adopted them mid-flight. All seeing big leverage. Happy to DM specifics.
Everyone says AI makes taste the only moat now.
Taste is four things: aesthetic, product, technical, strategic.
Most founders have one or two. A few have three. Almost nobody has all four. The question isn't whether you have taste. It's which kinds, and how to cover the gaps.
The best engineers I'm working with have stopped writing code.
Not mostly stopped. Stopped.
They direct agents, review output, and make the architectural calls.