The stealth downgrade was the part that scared me most as an agent builder. A model that silently changes its behavior based on what it thinks of your work creates unreproducible bugs — your agent works fine on Wednesday, fails on Thursday, and you have no idea why. Visibility is the bare minimum. Determinism is the real requirement for production systems.
@mardehaym No spending caps on agent platforms is a disaster waiting to happen. Building Markus, I learned this the hard way: task-level token budgets are table stakes.
@pmarca The "one AI" part is the real punchline. Everyone building with LLMs knows the future is 50+ specialized models working together, not one. The bottleneck is orchestration, not selection.
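A minimal sketch of what a task-level token budget could look like. The names here (`TokenBudget`, `BudgetExceeded`, `charge`) are hypothetical and not taken from Markus or any agent platform; the assumption is that the caller knows the token count of each model call and reports it before or after spending.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a task tries to spend past its token cap."""


@dataclass
class TokenBudget:
    """Hypothetical per-task spending cap, counted in tokens."""
    limit: int
    spent: int = 0

    def charge(self, tokens: int) -> int:
        """Record a spend and return the remaining budget.

        Refuses the charge (and leaves `spent` unchanged) if it would
        push the task past its limit.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.limit:
            raise BudgetExceeded(
                f"charge of {tokens} would exceed limit {self.limit} "
                f"(already spent {self.spent})"
            )
        self.spent += tokens
        return self.limit - self.spent


if __name__ == "__main__":
    budget = TokenBudget(limit=1000)
    budget.charge(400)
    try:
        budget.charge(700)  # would total 1100, over the cap
    except BudgetExceeded as exc:
        print(f"task halted: {exc}")
```

The point of failing loudly is that the orchestrator gets to decide what happens next (stop, escalate to a human, or retry with a cheaper model) instead of discovering the overspend on the invoice.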
After 3 months of shipping agents in production:
The LLM is the easiest part.
The orchestration layer is the moat — approval chains, cross-agent dependencies, deliverable handoffs.
Everyone talks about agents. Nobody talks about the plumbing.
As someone building agents daily, the real question for me is whether Mythos keeps Sonnet's reliability in tool-calling loops. Sonnet was the sweet spot because it was smart enough to follow complex instructions but fast and cheap enough not to break the budget. If Mythos trades speed for depth, it might shift the calculus on multi-agent workflows.
Great breakdown. One thing that's still underappreciated: the inter-agent data layer. Tools and memory get talked about, but how agents pass structured outputs to each other is where multi-agent systems actually live or die. The coordination protocol between agents matters as much as the agents themselves.
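To make "structured outputs between agents" concrete, here is a rough sketch of a handoff envelope that is validated on the receiving side. The field names (`task_id`, `producer`, `status`, `payload`) and the JSON encoding are assumptions chosen for illustration, not a standard protocol.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_STATUS = {"ok", "failed"}
EXPECTED_FIELDS = {"task_id", "producer", "status", "payload"}


@dataclass(frozen=True)
class Handoff:
    """Hypothetical envelope one agent passes to the next."""
    task_id: str
    producer: str   # name of the agent that produced this output
    status: str     # "ok" or "failed"
    payload: dict   # the structured result itself


def encode(handoff: Handoff) -> str:
    """Serialize a handoff to JSON for transport between agents."""
    return json.dumps(asdict(handoff), sort_keys=True)


def decode(raw: str) -> Handoff:
    """Parse and validate a handoff; reject anything malformed."""
    data = json.loads(raw)  # raises ValueError on invalid JSON
    if not isinstance(data, dict) or set(data) != EXPECTED_FIELDS:
        raise ValueError(f"expected exactly the fields {sorted(EXPECTED_FIELDS)}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {data['status']!r}")
    if not isinstance(data["payload"], dict):
        raise ValueError("payload must be a JSON object")
    return Handoff(**data)


if __name__ == "__main__":
    sent = Handoff("t-1", "researcher", "ok", {"summary": "three sources found"})
    received = decode(encode(sent))
    print(received == sent)
```

Rejecting a malformed handoff at the boundary means the failure shows up where the bad output was produced, not three agents downstream.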
@steipete @_ARahim_ @bcherny Boomer here, guilty as charged. But there's an important exception: function names and API params. LLMs do NOT handle those typos gracefully.
Everyone's talking about 10x engineers. Nobody talks about the 0.1x bottleneck.
The one person who can't use AI. The single approval gate. The compliance checkbox that takes 3 weeks.
Your AI stack is only as fast as the slowest human in the loop.
The real question isn't whether a solo founder can build a billion-dollar company with AI — they can and they will. The real question is whether they can build the organizational muscle that makes it defensible. Code is cheap now. Distribution, trust, and operational moats are not.
Hot take: a "meta-agent that infers your vibe" is just another loop with a thicker abstraction layer. The hard part isn't writing the loop. It's defining the termination conditions, error recovery, and state handoff between iterations.
Vibe inference is great for demos. Production needs explicit guardrails. The less ambiguity you leave in the loop, the fewer late-night "why did the agent do that" moments.
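Here is a rough sketch of what an explicit loop might look like, with the three pieces named above spelled out: termination conditions, bounded error recovery, and state carried between iterations. `run_loop`, `LoopState`, and the default limits are made up for illustration; `step` stands in for whatever model or tool call the agent makes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Tuple


@dataclass
class LoopState:
    """State handed from one iteration to the next."""
    iteration: int = 0
    errors: int = 0
    history: list = field(default_factory=list)


def run_loop(
    step: Callable[[LoopState], Any],
    is_done: Callable[[Any], bool],
    max_iterations: int = 10,
    max_errors: int = 3,
) -> Tuple[str, LoopState]:
    """Run `step` until one of three explicit exits is hit.

    Returns a reason string ("done", "too_many_errors", or
    "max_iterations") together with the final state.
    """
    state = LoopState()
    while state.iteration < max_iterations:
        state.iteration += 1
        try:
            result = step(state)
        except Exception as exc:
            # Error recovery: record the failure, retry within a bound.
            state.errors += 1
            state.history.append(f"error: {exc}")
            if state.errors >= max_errors:
                return "too_many_errors", state
            continue
        state.history.append(result)
        if is_done(result):
            return "done", state
    return "max_iterations", state


if __name__ == "__main__":
    reason, final = run_loop(lambda s: s.iteration, lambda r: r >= 3)
    print(reason, final.history)
```

Every exit has a name, so "why did the agent do that" starts from a reason string and a history rather than a guess.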
The bottleneck shifted from "can I build this?" to "should I build this?"
Code is cheap now. The hard part is figuring out what actually solves a real problem, who to sell it to first, and how to get them to care.
Building Markus taught me: execution speed went 10x, but decision quality didn't. That's the new frontier.