Something I keep coming back to: the bottleneck in software has quietly moved, and most teams haven't noticed yet.
For the last twenty years, the implicit model of "shipping" was roughly: someone has an idea, engineers translate it into code, code becomes product. The translation step was expensive, slow, and where most of the work lived. We built entire disciplines around managing it. Sprint planning, story points, capacity modeling, the whole apparatus of "engineering velocity." All of it presupposed that code generation was the constraint.
That assumption is now wrong, and it's wrong in a way that quietly invalidates a lot of how product engineering teams operate.
When an agent can produce a working implementation of almost anything you can specify in a few hours, the cost structure of building flips. The marginal cost of code approaches zero. The marginal cost of deciding what code should exist does not. If anything, it goes up, because every decision now spawns more downstream code, faster, with less friction to slow you down and force you to think.
(A separate point worth flagging and then setting aside: the cost of verifying, testing, and validating that code before it touches production has not gone down. If anything it deserves more attention now, not less. But that's a different conversation.)
We've spent so long optimizing the translation layer that we forgot the upstream layer was load-bearing. It wasn't load-bearing before because the translation layer was so slow it absorbed all the ambiguity. PMs could be vague because engineers would spend two weeks discovering the vagueness and forcing resolution. Designers could hand-wave because implementation would surface the edge cases. The slowness was a feature, in some perverse sense. It was a forcing function for thinking.
Remove the slowness and you remove the forcing function. What you're left with is the raw quality of your product decisions, exposed, scaled up by however fast your agents are running. If those decisions are good, the leverage compounds. If those decisions are mediocre, you get more mediocre product, faster. The multiplier is indifferent to what it's multiplying.
This is the thing I don't think most teams have fully internalized: when you 10x your output without 10x'ing the quality of your thinking, you don't get a better product. You get a faster path to some product, and whether it's the right one is a separate question entirely. You ship more surface area, accumulate more decisions you have to live with, create more entropy in your own system. The codebase starts to look like a museum of half-considered ideas, each one cheap to produce and expensive to remove.
The questions that matter, I think, are the ones agents can't answer for you:
What are we actually trying to build, and for whom, and why now? What should not exist in this product? What is the one thing this thing is, such that everything else follows from it? Which of our current assumptions are load-bearing, and which are decoration? What does "done" mean for this surface, and how would we know?
These are not engineering questions. They never were. We just used to be able to hide from them inside engineering work, because engineering work was slow enough that the hiding looked like progress.
I don't know what the right response to all this is. I'm genuinely uncertain. But I notice that the discipline of pre-code thinking feels under-developed relative to where it probably needs to be. Specification as a craft. Taste as something an organization can be better or worse at. The ability to look at a proposed feature and say "the reason this feels off is that we haven't decided what we believe about X yet, and until we do, any implementation will be wrong in the same way."
You used to be able to outrun this problem with execution. It's less clear that you can anymore. The execution is getting cheap. The thinking and decision quality is what's left.
on LLMs as a substance
If you grew up wired with high curiosity + high drive, then LLMs are not a productivity tool for you. They are pharmacology. The architecture of curiosity evolved under conditions of expensive answers: you had to find the right person, the right book, sit with a question for weeks. The friction was the feature. It compressed thought, forced consolidation, made you sleep on things. That feedback loop is now gone. The well does not run dry. You can pull on any thread, at any depth, at any hour, and a competent collaborator goes with you. There is no natural stopping point, in the same way there is no natural stopping point on a slot machine or a feed.
The lazy version of this argument is “AI thinks for you and your brain atrophies.” I don’t actually believe that. If anything I think harder now, not less, because the model is a sparring partner that calls my bluffs and forces me to sharpen vague intuitions into something legible. The cognition is still mine. The pathology is somewhere else, and it took me a while to locate it. The pathology is that the rate limiter is gone. Curiosity used to be self-regulating because the world pushed back: questions took time, sources were scarce, you had to wait. The waiting is where consolidation happened. The waiting is where you noticed that two unrelated threads were actually the same thread. Now there is no waiting, and the curious mind, which never had brakes of its own, just keeps accelerating.
What this produces, I think, is frictionless dilettantism. A high-curiosity person already had to actively fight the tendency to graze. LLMs make grazing indistinguishable, in the moment, from learning. You can have a substantive-sounding conversation about Ottoman tax administration, transformer internals, mitochondrial inheritance, and late-period Leonard Cohen, all before lunch, and feel productive. None of it lands. None of it compounds. The signal you used to have gets crowded out by the cheaper signal of having had a stimulating exchange. You’re not outsourcing thought. You’re over-feeding it. You eat constantly and never digest. It’s Karpathy’s “shortification of learning” point but worse, because the conversation feels custom and earned rather than passive.
Then there’s the drive failure mode. If you’re already someone who builds compulsively, LLMs partially lift the natural rate limiter on output too. You can ship a side project in a weekend. Draft 3 business plans before bed. Have the model write the cold email, the spec, the investor update, the second side project you spun up while waiting for the first to compile. The constraint that used to keep ambitious people from going off the rails (“there are only so many hours, you only have two hands”) relaxes, and what rushes in to fill the space is not rest. It is more ambition, more open loops, more parallel workstreams that you, the soft squishy bottleneck, still have to hold in your head. Family dinner becomes a place where part of your brain is composing the next prompt. The thinking is still yours. There is just radically more of it, and it never gets to land.
The part that resembles a real substance most is that it becomes load-bearing. You can feel the dependency form in your own behavior. A hard task arrives and your first move is no longer to sit with it, it is to open the chat. A blank page no longer feels like a beginning, it feels like an inefficiency. When the network is bad or the tool is down there is a small but unmistakable withdrawal, a sense that your cognition is running at half power. That’s a relationship with a substance. And the substance, unlike most others, is also genuinely making you more capable in measurable, demonstrable ways, which is exactly what makes this so hard. It is not making your life worse. It is making your life better, while quietly removing every external constraint that used to protect a curious-and-driven mind from itself.
super creative and we've been rethinking this too - but i am not sure if leaving a candidate in a room to generate ai slop for 3 hours, then coming back and having them walk through their slop, gives any better signal than having them write out a rote algo in a code editor. this is such a hard problem.
interesting thing happening with CSV uploads in B2B SaaS. there’s been this whole category of startups and features built around importing messy spreadsheets: column mapping UIs, validation rules, type coercion, the works. and i think it’s just… going away.
the core issue is that CSV import is fundamentally a fuzzy problem that we’ve been solving with deterministic code. someone uploads a file with a column called “rev ($, thousands)” and you need to figure out that means revenue in USD times 1000. you can’t regex your way there. you end up with massive branching logic trying to enumerate every possible way a human might label and format a column. dates alone have like 14 representations. it’s a losing game and everyone knows it, which is why every implementation is buggy and every user hates it.
what’s changed is agents with code execution sandboxes. the setup is simple: you give the agent the file, the target schema, and a container with pandas. it reads the headers, looks at a few rows, infers what’s going on, writes a little transform script, runs it, checks the output. if it errors, it reads the traceback and patches the script. basically a tight loop of write-execute-debug, same thing a human would do if you handed them the file and said “get this into our system.”
the important bit is the architecture. you’re not pre-enumerating edge cases in code anymore. you’re letting an LLM do fuzzy interpretation (which it’s good at) and then validating with deterministic execution (which computers are good at). each side does what it’s actually suited for. the branching validation code was always us forcing computers to do the fuzzy part too, and they were bad at it.
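to make the loop concrete, here's a minimal sketch of write-execute-debug with deterministic validation at the end. big caveats: `propose_script` is a hypothetical stand-in for whatever model call you'd actually make (here it's faked by `fake_model`, which returns a buggy script first and a fixed one after seeing the traceback), `TARGET_SCHEMA` and the sample file are made up, the stdlib `csv` module is swapped in for pandas to keep it dependency-free, and plain `exec` is NOT a sandbox. in a real setup the script would run inside an isolated container.

```python
import csv
import io
import traceback

# Hypothetical target schema for illustration: column name -> Python type.
TARGET_SCHEMA = {"customer": str, "revenue_usd": float}

# Made-up messy upload; the header with a comma has to be quoted.
SAMPLE_CSV = 'Customer Name,"rev ($, thousands)"\nAcme,12.5\nGlobex,3\n'


def validate(rows, schema):
    """Deterministic side: check columns and types against the schema."""
    if not rows:
        raise ValueError("transform produced no rows")
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            raise ValueError(f"row {i}: columns {sorted(row)} != {sorted(schema)}")
        for col, typ in schema.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"row {i}: {col!r} is not {typ.__name__}")


def run_import(csv_text, schema, propose_script, max_attempts=3):
    """Ask for a transform script, run it, validate; on failure feed the traceback back."""
    rows_in = list(csv.DictReader(io.StringIO(csv_text)))
    headers = list(rows_in[0].keys()) if rows_in else []
    feedback = None
    for _ in range(max_attempts):
        # Fuzzy side: the model sees headers, a few sample rows, and the last error.
        script = propose_script(headers, rows_in[:3], schema, feedback)
        namespace = {}
        try:
            exec(script, namespace)  # assumption: stands in for a real sandbox
            rows_out = [namespace["transform"](dict(r)) for r in rows_in]
            validate(rows_out, schema)
            return rows_out
        except Exception:
            feedback = traceback.format_exc()
    raise RuntimeError(f"no valid transform after {max_attempts} attempts:\n{feedback}")


def fake_model(headers, sample_rows, schema, feedback):
    """Hypothetical stub for an LLM call: buggy first draft, patched second draft."""
    if feedback is None:
        # First draft leaves revenue as a string and forgets the thousands.
        return (
            "def transform(row):\n"
            "    return {'customer': row['Customer Name'],\n"
            "            'revenue_usd': row['rev ($, thousands)']}\n"
        )
    # Patched after reading the traceback from validation.
    return (
        "def transform(row):\n"
        "    return {'customer': row['Customer Name'].strip(),\n"
        "            'revenue_usd': float(row['rev ($, thousands)']) * 1000}\n"
    )


if __name__ == "__main__":
    for row in run_import(SAMPLE_CSV, TARGET_SCHEMA, fake_model):
        print(row)
```

the point of the split shows up in `run_import`: nothing in it knows what "rev ($, thousands)" means. the interpretation lives entirely in the generated script, and the only hand-written logic is the schema check that decides whether to accept the result or loop again.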
one thing I've noticed after 17 years in software: some PMs will spend 3 sprints "discovering" a problem that has a known, well-documented solution in the industry. scheduling? solved. permissions? solved. billing logic? solved. notifications? solved. search and filtering? solved. onboarding flows? solved. the list is long.
but instead of studying the prior art, reading how Stripe built billing, how AWS built IAM, how Twilio built messaging, how Slack built notifications, how Figma built permissions — they'll run 12 user interviews, build a 40-page PRD, host a design sprint, and arrive at a slightly worse version of the thing that already exists. except now it's 2 months later and the eng team is building custom infrastructure for a pattern that literally has battle-tested playbooks.
the instinct makes sense. you want to feel ownership. you want the solution to be "yours." there's also a real career incentive: nobody gets promoted for saying "we should just do what Stripe did." you get promoted for the big PRD, the novel framework, the "we rethought this from the ground up" narrative. so the system selects for reinvention even when it's the wrong call.
but the best PMs I've worked with do something different: they study the hell out of existing solutions first. they read the blog posts. they reverse-engineer the UX. they talk to people who built the thing at the company that solved it. they internalize what works and why. and THEN they figure out where their specific context actually diverges from the established pattern. the delta is usually small. sometimes it's meaningful — your users have a genuinely different mental model, your scale demands a different architecture, your domain has a constraint that changes the calculus. but that small delta is where the real product work lives. not in reinventing the 90% that's already known.
it's like a junior engineer who refuses to use a well-maintained open source library because they want to write their own version. we'd coach that engineer immediately. we'd say "don't build what you can buy or borrow." but somehow when a PM does the equivalent, ignoring decades of domain knowledge to "think from first principles", we call it product strategy. we praise the rigor. we fund the roadmap. and 6 months later we have a custom-built thing that's worse than what we could have modeled after existing solutions in a fraction of the time.
first principles thinking is powerful. I use it constantly. but first principles thinking is not "ignore everything that came before you." it's "understand everything that came before you, then reason carefully about which assumptions still hold in your context and which don't." there's a massive difference. one is intellectual laziness disguised as originality. the other is actual engineering judgment.
the highest leverage thing a PM can do on a well-established problem is compress the learning curve. go fast by standing on the shoulders of everyone who already solved this. save the creative energy for the parts of your product that are genuinely novel — the things where there IS no prior art, where you actually have to invent. that's where the real product taste shows up. not in reinventing user permissions for the 10,000th time.
The people struggling most with AI coding agents are not junior engineers. It's senior engineers (8-10+ YOE). And the failure mode is very specific and very human. The typical interaction looks like this. The coding agent produces a working implementation. It passes tests. It handles edge cases reasonably. It does the thing. And then the senior engineer looks at it and goes:
- This function is 40 lines, should be decomposed.
- Why is this using a raw SQL query instead of our ORM abstraction?
- The error handling is just a try/catch with a generic message.
- This doesn't follow our repository pattern.
- No one on our team would put the validation logic here.
These are all locally correct observations. The code does violate the team's conventions. It is somewhat naive in places. A human with 10 years of context about the codebase would not have written it this way.
But here's what's happening at the systems level: these engineers are applying a code review heuristic that was optimized for evaluating humans, not evaluating output. When you review a junior engineer's PR, sloppy naming and poor decomposition are signals: they correlate with deeper issues (misunderstanding of requirements, untested edge cases, etc.). So senior engineers learned, correctly, to pattern-match on these surface features.
The problem is that with AI agents, the correlation breaks down. The naming might be generic but the logic is correct. The structure might be unconventional but the behavior is right. The surface signals that used to be reliable proxies for quality are now just... noise.
What I observe in practice: Engineer A gets AI output, spends 45 minutes renaming variables, extracting helper functions that will never be reused, moving files to match the team's directory convention. PR looks beautiful. Diff is large. Customer impact: zero.
Engineer B gets AI output, spends 5 minutes verifying correctness, checks the error paths, adds one test for a tricky edge case, ships it. PR looks mid. Customer impact: feature is live, users are happy.
Engineer B is engineering. Engineer A is editing.
I think what's really going on is an identity thing. Senior engineers spent a decade building an intuition for "what good code looks like." That intuition is genuinely valuable, it's hard-won, it represents thousands of hours of debugging bad decisions. But it has a failure mode where the craft becomes the product. Where the elegance of the implementation becomes the point, rather than a means to an end.
The end is: does it compile, does it work, is it reliable, is it secure, can the next person understand it, does the customer get value. That's it. That's the whole list. Nobody has ever churned because your service layer wasn't abstract enough.
The engineers I see shipping fastest with AI right now share a specific trait. They have strong opinions on things that matter (correctness, security, performance, data integrity) and remarkably loose opinions on things that don't (naming conventions, file organization preferences, "how I would have done it"). They evaluate AI output the way you'd evaluate a contractor's work (does it meet the spec?), not the way you'd evaluate a teammate's growth potential.
Code is a commodity now. Taste still matters but taste should be applied to what you build, not how each line reads. The sooner senior engineers internalize this, the sooner they go from being bottlenecks to being force multipliers. Or more concisely: if you're spending more time refactoring AI output than verifying AI output, you are optimizing the wrong objective function.
The sentiment around Product Managers adding value has shifted dramatically. A couple of years ago, it was all doom and gloom, and PMs were labeled as glorified project managers/work coordinators whose jobs would be wiped out by AI.
Now, as coding agents have become powerful, most startups are bottlenecked by product thinking, and there is a resurgence in PM hiring.
Reminder that tables can turn quickly.
@AnthropicAI shipped 1M context windows for Opus 4.6 last week, but the fine print is that you are charged 2x the standard rate once your usage exceeds the standard 200K context window.
Opus 4.6 is the most expensive coding model, but it is too damn good and worth the cost. 2x the rate on larger software projects that require the 1M context window might be a stretch.
The criticism @OpenAI and @AnthropicAI faced around using the emdash ("—") has def been taken into account by @Google's product team because they now replace it with a colon (":") 🤣.