Barry Zhang, Anthropic:
"Don't build agents for everything."
the post is tired of optimizing prompts and reaches for a workflow, and the Anthropic engineer behind that exact framework says the quiet part: most of the time you don't even need an agent. a workflow is a fixed path where each step hands a file to the next. an agent decides its own path. the post's whole fix, handoff files, decision gates, prompt chaining, is the workflow half, and it's the half that actually ships.
prompting hit a ceiling because it was never an architecture. you were doing the coordination by hand, copy-pasting between tabs. the move isn't a smarter prompt or a fancier agent. it's a boring pipeline so simple it's embarrassing.
so start there. one repeating task, two steps, each writing its output to a file the next one reads. don't build an agent for everything. build the workflow for the one thing you do every week.
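a minimal sketch of that two-step shape, in Python. the step names, the file names, and the notes-to-report task are all made up for illustration. in a real pipeline each step would probably call a model; plain functions stand in for that here.

```python
from pathlib import Path


def step_collect(raw_notes: str, out_path: Path) -> None:
    """Step 1: clean raw notes into one item per line and write the handoff file."""
    items = [line.strip() for line in raw_notes.splitlines() if line.strip()]
    out_path.write_text("\n".join(items), encoding="utf-8")


def step_report(in_path: Path, out_path: Path) -> None:
    """Step 2: read the handoff file and write a numbered report."""
    items = in_path.read_text(encoding="utf-8").splitlines()
    numbered = [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    out_path.write_text("\n".join(numbered), encoding="utf-8")


def run_pipeline(raw_notes: str, workdir: Path) -> str:
    """Fixed path: step 1 writes a file, step 2 reads it. No step picks its own route."""
    handoff = workdir / "01_items.txt"   # hypothetical file name
    report = workdir / "02_report.txt"   # hypothetical file name
    step_collect(raw_notes, handoff)
    step_report(handoff, report)
    return report.read_text(encoding="utf-8")
```

each step only knows the file it reads and the file it writes, so you can rerun or swap one step without touching the other.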
Boris Cherny, creator and head of Claude Code at Anthropic:
"We're starting to get to the point where it has ideas. It's figuring out what to build next."
the post's whole thesis is that the human stops being the loop. Cherny, the person who built Claude Code, is describing the far end of that: the loop no longer just executes your goal, it starts proposing the goal. it reads the repo, it reads GitHub, and it decides what to build next.
that's the post's "discover" stage taken to the limit. a prompt engineer hands the agent a task. a loop engineer hands it a job. and at the frontier Cherny is describing, the system starts generating its own.
but the honest half of the post still holds: a loop that picks its own work with no real verifier is just confident drift, at speed. the only reason "it has ideas" is useful instead of dangerous is a gate that fails bad output and a human who still owns the decision. design the check, then let it think.
"No human ever deliberately sets those parameters."
the post lists 20 concepts to understand AI. this 7-minute video is the fastest way to feel the one that matters most: nobody programmed the model's behavior. a model is hundreds of billions of dials, tuned automatically by predicting the next word across trillions of examples. people design the training, not the result.
that's exactly why "understand how AI works" beats "memorize prompt tricks." once you get that it's an emergent next-word predictor nobody hand-wrote, the rest of the list clicks: why it hallucinates, why context matters, why it feels brilliant one minute and dumb the next. it isn't looking anything up. it's predicting.
so the edge isn't more buzzwords. it's knowing that even the people who built it can't fully explain why it answers the way it does. understand that, and you stop mistaking confidence for truth.
Nate Herkelman:
"I've genuinely never seen workflows scale like this before."
the hype line is real, but the post above is the antidote to it. 40 "make money while you sleep" workflows, and the one sentence that keeps it honest: the system runs the labor, you still need something to sell and someone to sell it to. nobody gets rich because a cron job fired at 3am.
what actually prints is stacking. one workflow is a tool. an opportunity scanner feeding a report feeding outreach feeding follow-up is a business, and you're the only human in the loop. that's where a $10,000 workflow comes from, not from one magic prompt.
so don't build all 40. build the 1 closest to something you already sell, run it by hand until it's boring, then automate it. the money was never in the model. it's in having something worth automating.
Boris Cherny, creator of Claude Code:
"This morning I was managing maybe a few hundred agents."
the post is a 14-step roadmap for going from prompting one agent by hand to designing loops that run fleets of them. Cherny is just describing the destination. when the guy who built Claude Code says he managed a few hundred agents before lunch, and tens of thousands on some days, "loop engineering" stops being a buzzword and turns into a job description.
but read the post's honest half before you chase that number. a fleet with no verifier is a few hundred agents agreeing with themselves. the only reason he can run that many is the gate, the test that fails bad work without him watching, plus a human who still reads the diffs. volume without verification is just being wrong faster, in parallel.
so yes, the leverage moved: from typing prompts to designing the system that prompts. but the engineer didn't disappear. he moved up a floor, from writing the code to owning the loop that writes it.
Geoffrey Huntley, creator of the "ralph" loop:
"software development is dead. I killed it."
it sounds like rage-bait until you see what he actually built. the "ralph" loop is a dumb bash while-loop that hands an agent a goal and a check, then lets it run itself until the check passes. that is the entire loops thread in one technique.
he didn't kill engineering. he killed the typing. and the part everyone skips is the part that did it: the check. without a verifier that can fail the work, you don't have a loop, you have an agent congratulating itself. the loop is the easy half. the gate is the job.
so "stop prompting" isn't a vibe. it's a move: define "done" so precisely a machine can enforce it, press go, walk away. that's what killed the old way of working.
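a rough sketch of that shape, in Python rather than bash. this is not Huntley's script: `attempt` and `check` are hypothetical stand-ins for "run the agent once" and "run the verifier", and the iteration cap is an added assumption so the sketch can't spin forever.

```python
from typing import Callable, Tuple


def run_until_check_passes(
    attempt: Callable[[int], str],
    check: Callable[[str], bool],
    max_iterations: int = 10,
) -> Tuple[str, int]:
    """Call attempt() repeatedly until check() accepts its output.

    Returns the accepted output and the iteration it passed on.
    Raises RuntimeError if nothing passes within max_iterations.
    """
    for iteration in range(1, max_iterations + 1):
        output = attempt(iteration)   # stand-in for one agent run
        if check(output):             # the gate: it must be able to say no
            return output, iteration
    raise RuntimeError(f"check never passed in {max_iterations} iterations")
```

the loop is five lines. everything interesting lives in `check`: if it can't fail the work, the loop exits on the first pass and proves nothing.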
Maya Nielan, Anthropic:
"Steerability and Human-in-the-loop." (11:53)
The whole @Av1dlive orchestration guide ends where her talk does: at a human checkpoint. you can spin up 1,000 agents, route models by blast radius, compile coordination into code, and none of it counts if nothing verifies "done." agents produce plausible output faster than you can read it, and plausible is not correct.
She names the lever everyone skips while chasing agent counts: steerability. a proactive agent that acts on its own is only safe because a human, or a disinterested judge, sits in the loop with a veto. the post says the same thing from the engineering side: put the judge on a different model family, cap the loop, keep a human gate on anything irreversible.
So the skill was never summoning the fleet. it's staying steerable while it runs. orchestration with no checkpoint is just being wrong at scale, in parallel.
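A minimal sketch of the human gate on irreversible actions, in Python. The `Action` type, the `irreversible` flag, and the `approve` callback are hypothetical names for illustration, not anything from the talk or the guide; `approve` stands in for whatever asks a person (or a separate judge) for a yes or no.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    name: str
    irreversible: bool  # hypothetical flag set by whoever defines the action


def run_with_gate(
    actions: List[Action],
    approve: Callable[[Action], bool],
) -> Dict[str, List[str]]:
    """Run reversible actions freely; hold irreversible ones unless approved."""
    ran: List[str] = []
    held: List[str] = []
    for action in actions:
        if action.irreversible and not approve(action):
            held.append(action.name)   # the veto: nothing irreversible runs unapproved
        else:
            ran.append(action.name)
    return {"ran": ran, "held": held}
```

The agents keep moving on everything that can be undone, and the checkpoint only costs attention where a mistake would stick.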
Alex Ziskind, software engineer:
"I plugged a DGX Spark and a Mac together... and didn't expect this."
he wired a $2,999 DGX Spark to a Mac to run one model across both boxes. the experiment is the honest counterweight to the thread above. yes, local AI is real now: a $249 Jetson or a $600 Mac mini covers the everyday 80%. but the moment you want a big model at full speed, you feel the ceiling, and people start bolting machines together to climb it.
that's the actual decision the post points at. for drafting, summarizing, code completion, classification, a cheap box beats the subscription in 2-3 months and never sends your data anywhere. for frontier-size models at frontier speed, you either rent the cloud or build a rig.
so the move isn't "cancel everything." it's "move the 80% home, rent the 20%." own the work you do every day. that part was never really about price.
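the 2-3 month break-even is plain division. a sketch in Python, with loud assumptions: the $600 and $249 hardware prices come from the text above, but the $200 and $100 monthly subscription figures are hypothetical, and power costs are ignored.

```python
import math


def break_even_months(hardware_cost: float, monthly_subscription: float) -> int:
    """Whole months until a one-time hardware cost is covered by dropped subscription fees."""
    if monthly_subscription <= 0:
        raise ValueError("monthly_subscription must be positive")
    return math.ceil(hardware_cost / monthly_subscription)


# hypothetical subscription prices; swap in what you actually pay
mac_mini_months = break_even_months(600, 200)
jetson_months = break_even_months(249, 100)
```

plug in your own bill. if you pay much less per month than these made-up numbers, the payback stretches well past a few months.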
A Chinese dev quietly found a way to run a 16-agent team that ships while he sleeps. the wild part is the one rule he built it around.
he didn't add more agents. he added one that does nothing but check the others. the orchestrator splits the work, specialists produce, and a single critic scores every output against a fixed standard before anything ships. pull that critic out and a 16-agent team is just 16 confident ways to be wrong, in parallel.
and the playbook isn't even secret. Nate Herk walks the whole build in 16 minutes, and the chapter that actually matters is the pitfalls at 13:12. the post above names the exact same trap: a team that only produces and never reviews ships fast, confident garbage.
so don't start with the org chart. start with the 1 agent that checks the others. that's the line between a demo and a system that runs without you.
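a minimal sketch of that one checking agent, in Python. the names, the checklist-style standard, and the threshold are assumptions for illustration, not the dev's actual setup; a real critic would likely be a model call, and simple predicates stand in for it here.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[str], bool]


def score(output: str, standard: List[Check]) -> float:
    """Fraction of the fixed standard that an output satisfies."""
    if not standard:
        raise ValueError("standard must contain at least one check")
    passed = sum(1 for check in standard if check(output))
    return passed / len(standard)


def review(
    outputs: Dict[str, str],
    standard: List[Check],
    threshold: float = 1.0,
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Score every specialist's output against the same standard before anything ships."""
    shipped: Dict[str, str] = {}
    rejected: Dict[str, float] = {}
    for agent, output in outputs.items():
        result = score(output, standard)
        if result >= threshold:
            shipped[agent] = output
        else:
            rejected[agent] = result   # keep the score so the specialist can retry
    return shipped, rejected
```

the standard is fixed and shared: every specialist gets judged by the same list, and nothing reaches `shipped` without passing through it.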
Simon Coton:
"I Built The Best Claude Memory System (Beats Hermes)"
a memory system has 3 jobs, and most CLAUDE.md files do exactly 1. they store. then they balloon past 300 lines, dump all of it every session, and the agent still misses the one rule that mattered.
storage is only step 1. injection loads the right slice at the start of the run. recall pulls a fact back the moment it matters mid-task. miss either and your memory file is a diary nobody opens at the right page.
the thread you're quoting nails the storage half: 1 CLAUDE.md the agent reads, 1 AGENTS.md your other tools read, an auto-memory it writes itself. this video is the other half, the part that decides whether any of it shows up when you actually need it.
so stop counting your setup in files. count it in jobs. 5 files doing 1 job is still 1 job.
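a toy sketch of the three jobs (store, inject, recall), in Python. this is not the system from the video: the `MemoryStore` class, the tag scheme, and the keyword match are all assumptions to make the split concrete.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class MemoryStore:
    entries: List[Tuple[FrozenSet[str], str]] = field(default_factory=list)

    def store(self, text: str, tags: Set[str]) -> None:
        """Job 1, storage: keep the fact along with the tags it applies to."""
        self.entries.append((frozenset(tags), text))

    def inject(self, task_tags: Set[str]) -> List[str]:
        """Job 2, injection: at the start of a run, load only entries sharing a tag with the task."""
        wanted = set(task_tags)
        return [text for tags, text in self.entries if tags & wanted]

    def recall(self, keyword: str) -> List[str]:
        """Job 3, recall: mid-task, pull back entries that mention a keyword."""
        needle = keyword.lower()
        return [text for _, text in self.entries if needle in text.lower()]
```

a file that only appends is `store` with nothing else. the other two methods are what decide which lines the agent actually sees, and when.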