about loop engineering.
everyone's saying the same thing this week. you don't prompt agents anymore, you design loops that prompt them.
here's the job that loop hands right back to you.
a loop running unattended is also a loop failing unattended.
loop engineering takes you off prompting. it takes you off curating context. it takes you off babysitting a single run. it does not take you off debugging. it just moves the debugging somewhere worse, into runs you were never watching, with far too much of it to read through by hand.
even the loop engineering posts admit this themselves, usually somewhere near the end. you can only walk away from a loop if you trust the thing checking it. a checker you don't trust drops you right back into reading every output by hand, which is the exact work the loop was supposed to take off you.
so stack the layers up (prompt, context, harness, loop) and one job survives all of them: closing the loop on failure. the leverage point moved. the debugging job did not go away.
i was writing about this exact gap yesterday, before the loop talk picked up today. the idea was simple. make debugging its own loop. a failure leads to a root cause, a proposed fix, a rerun against the exact inputs that broke, and a test that locks it out for good. the checker gets built from your real failures instead of guessed at up front.
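a minimal sketch of that debugging loop in plain python, just to make the four steps concrete. every name here (`Failure`, `RegressionSuite`, `debug_loop`, `diagnose`, `propose_fix`) is hypothetical and made up for illustration. it is not any tool's API, and the toy agent stands in for a real one.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Failure:
    inputs: dict  # the exact inputs that broke the run
    error: str    # whatever the trace recorded


@dataclass
class RegressionSuite:
    cases: list = field(default_factory=list)

    def add(self, inputs: dict, check: Callable) -> None:
        self.cases.append((inputs, check))

    def passes(self, agent: Callable) -> bool:
        # every locked-in failure must stay fixed
        return all(check(agent(inputs)) for inputs, check in self.cases)


def debug_loop(agent, failure, diagnose, propose_fix, check, suite):
    root_cause = diagnose(failure)               # failure -> root cause
    candidate = propose_fix(agent, root_cause)   # root cause -> proposed fix
    try:
        fixed = check(candidate(failure.inputs))  # rerun on the inputs that broke
    except Exception:
        fixed = False
    if not fixed:
        return agent, False                      # keep the old agent, hand back to a human
    suite.add(failure.inputs, check)             # lock the failure out for good
    return candidate, True


# toy example (assumption): an agent that crashes on empty text
def agent_v1(inputs):
    return inputs["text"].split()[0]


def diagnose(failure):
    return "empty text is not handled"


def propose_fix(agent, root_cause):
    def patched(inputs):
        return agent(inputs) if inputs["text"].strip() else ""
    return patched


suite = RegressionSuite()
failure = Failure(inputs={"text": ""}, error="IndexError")
agent_v2, accepted = debug_loop(
    agent_v1, failure, diagnose, propose_fix,
    check=lambda out: isinstance(out, str), suite=suite,
)
```

the design choice worth noticing: the regression case is only added after the rerun on the original inputs passes, so the suite grows from failures that were actually fixed, not from guesses.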
Opik, the tool i was writing about, does exactly this. a built-in agent reads the trace, finds the root cause, proposes a diff, you approve it, and that failure becomes a permanent regression test. every break you debug makes the loop a little harder to break next time, which is the kind of checker the loop engineering crowd keeps saying you need before you walk away.
if you're designing loops you actually plan to walk away from, it's worth a look.
Opik is 100% open-source under the Apache-2.0 license.
GitHub repo: https://t.co/MEC26owCdo
(don't forget to star 🌟)
loop engineering moved the leverage point. it didn't remove the engineer who still has to close the loop when something breaks.
the full article, Your Agent Harness Should Repair Itself, is quoted below.
what is agent looping
for the last two years we prompted agents one task at a time. that is starting to change
instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met
looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up
at its simplest, looping is one agent working on itself:
> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements
you are not prompting each step anymore. the agent repeats the cycle for you
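the cycle above can be sketched in a few lines of python. this is a toy, assuming the four steps are plain functions you pass in. `run_single_agent_loop` and the `research`, `draft`, `check`, `fix` callables are hypothetical names, and a real setup would call a model inside each one.

```python
def run_single_agent_loop(research, draft, check, fix, max_passes=5):
    notes = research()
    work = draft(notes)
    for passes in range(1, max_passes + 1):
        weak_points = check(work)          # compare the draft against the goal
        if not weak_points:
            return work, passes, True      # work clears the requirements
        if passes < max_passes:
            work = fix(work, weak_points)  # fix what is weak, then go again
    return work, max_passes, False         # budget spent, goal not met


# toy goal (assumption): a landing page needs these three sections
REQUIRED = ["headline", "price", "cta"]

loop_result = run_single_agent_loop(
    research=lambda: ["headline"],
    draft=lambda notes: list(notes),
    check=lambda work: [s for s in REQUIRED if s not in work],
    fix=lambda work, weak: work + [weak[0]],
)
```

the `max_passes` cap is the part people skip. without it, a check that can never pass turns the loop into a token furnace.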
the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents
the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met
one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end
you create a goal, and the system runs the loop until it finishes within the requirements you set
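a rough sketch of the fleet version, assuming the tree of agents is fixed up front and each leaf is just a function. `Node`, `run_tree`, and the toy roles are hypothetical names for illustration, not any harness's API. each node reruns its own piece until verification passes or it runs out of passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    role: str
    work: Optional[Callable[[str], str]] = None   # set on leaf subagents
    children: dict = field(default_factory=dict)  # piece name -> Node


def run_tree(node, goal, verify, max_passes=3):
    for _ in range(max_passes):
        if node.children:
            # break the goal into pieces and hand each one down the tree
            output = {
                piece: run_tree(child, f"{goal}/{piece}", verify, max_passes)
                for piece, child in node.children.items()
            }
        else:
            output = node.work(goal)              # narrow work at the leaf
        if verify(node.role, output):
            return output
    raise RuntimeError(f"{node.role} could not clear verification for {goal!r}")


# toy fleet (assumption): the design leaf returns nothing on its first try
calls = {"design": 0}


def flaky_design(goal):
    calls["design"] += 1
    return "" if calls["design"] == 1 else f"{goal} done"


def done(goal):
    return f"{goal} done"


fleet = Node("orchestrator", children={
    "copy": Node("copy specialist", children={
        "headline": Node("headline subagent", work=done),
        "body": Node("body subagent", work=done),
    }),
    "design": Node("design specialist", work=flaky_design),
})

fleet_result = run_tree(fleet, "landing page", verify=lambda role, out: bool(out))
```

note that when a parent fails verification in this sketch, its whole subtree reruns. that is the simplest thing to write and also the most expensive, which is part of why fleet loops burn tokens.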
open and closed looping:
OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out
this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time
the catch is cost. an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine
CLOSED LOOPING is bounded. a human designs the end-to-end path first:
> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)
the agents still loop, but inside a framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.
for most marketing work, closed is the one that pays off today.
> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate makes sure it's not slop
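a closed loop is easy to sketch because the path is fixed. the version below assumes each step is a `(name, run, eval_gate)` tuple and that a step gets a bounded number of attempts before the loop stops and hands back. `run_closed_loop` and the toy steps are hypothetical, written only to show the shape.

```python
def run_closed_loop(steps, payload, max_attempts=2):
    report = []  # performance data fed back to the human
    for name, run, eval_gate in steps:
        for attempt in range(1, max_attempts + 1):
            candidate = run(payload, attempt)
            passed = eval_gate(candidate)  # an eval at each step
            report.append({"step": name, "attempt": attempt, "passed": passed})
            if passed:
                payload = candidate
                break
        else:
            # no attempt cleared the gate: stop and hand back
            return {"status": "handed_back", "failed_step": name,
                    "payload": payload, "report": report}
    return {"status": "done", "payload": payload, "report": report}


# toy path (assumption): the draft step needs a second attempt,
# and the legal gate never passes, so the loop hands back there
steps = [
    ("outline", lambda p, attempt: p + ["outline"], lambda c: "outline" in c),
    ("draft", lambda p, attempt: p + (["draft"] if attempt >= 2 else []),
     lambda c: "draft" in c),
    ("legal", lambda p, attempt: p, lambda c: "approved" in c),
]

closed_result = run_closed_loop(steps, payload=[])
```

the `report` list is the "feeds back performance data" part: every attempt at every gate is recorded, so a handoff comes with the history of what passed and what did not.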
In this, you'll find a guide and an open-source repo to build your own self-improving agent loops
As well as a standardized system to build benchmarks and evals around the loop, for any workflow, so you can actually see the growth happening in real time
And put your agent to work even while you sleep.
The repo for building the system took months of work; it's not a weekend job, so save it and utilize it.
🚨a 22-year-old makes $8,217/month from an anime channel he built in one weekend
→ Claude: script and scene description. 10 minutes.
→ Midjourney: every frame. 20 minutes.
→ Runway: movement, breathing, camera. 15 minutes.
→ ElevenLabs: character voiced with emotional direction. 10 minutes.
→ Suno: score. 5 minutes.
→ Make: published Tuesday 9am. automatically.
$8,217 last month. 3 hours of work total.
the studios haven't figured out what to do about this.
full build with every prompt in the article above👇
Workflows are the biggest upgrade to Claude Code’s capabilities since skills and subagents.
I dove deep into it with @sidbid to figure out best practices, examples and more. I’m particularly excited about the non-technical tasks it enables for Claude Code.
2 months ago, I wrote "The Harness Is Everything" (1.3M views).
Last week's Life-Harness paper: 116 of 126 model-environment setups improved by patching the harness alone.
Model frozen. 88.5% mean lift across 18 backbones.
↓ how Claude Code and Codex actually work under the hood
KARPATHY WAS RIGHT. THIS 40-MINUTE Y COMBINATOR LECTURE PROVES IT
Karpathy said we're in the 1960s of AI - most people using Claude Opus 4.8 are still acting like it's just a search engine
> software 3.0 - LLMs as operating systems, not chatbots
> autonomous agents that run entire workflows without you watching
the 32 skills in this article are how you actually cross that line
bookmark this 👇
Claude Opus 4.8 dropped and Anthropic released Boris Cherny's prompt workshop
One of the people actually building Claude explains how to use it properly
Free
No signup
No paywall
The first 8 minutes are better than most $300 AI courses
Context
Task structure
Cleaner outputs
Fewer wasted prompts
Watch it and bookmark it before this gets sold back to you as a course
Anthropic CEO Dario Amodei:
"The cheapest way to use Claude is also the smartest. Most devs do the exact opposite."
In 36 minutes, he breaks down the real economics behind every Claude model, and why running them all the same way is a mistake.
Watch the full interview, then save the config below 👇
I use this to do my biweekly engineering updates for @HeyGen. It pulls my github activity, uses my avatar 5 with the heygen CLI (it can be created too), and renders the video with @HyperFrames_ CLI.
The whole thing runs from Claude Code/codex/hermes agent with one skill in the reply below 👇
🚨BREAKING: ChatGPT for marketing is here.
In one prompt, Fastlane can deploy hundreds of social media accounts, create viral content, and post it all automatically.
This is insane.
Anthropic just officially released the blueprint for creating a company with Claude Code and it's mind-blowing😭
CEO: 1 human (who sleeps)
Employees: several AIs
Activities: the AIs divide up the tasks and move forward on their own
Work is literally dying... I've summarized the full guide below, read it when you've got 5 min ⤵️
If you want the AI to work while you sleep → save this as a bookmark 🔖