🚨 A SENIOR ANTHROPIC ENGINEER JUST DROPPED AN 11-PAGE PDF ON LOOP ENGINEERING.
The core shift: stop prompting the agent. Build the system that prompts it.
Inside the autonomous loop:
- Discover → Finds its own work (failing CI, open issues).
- Isolate → Uses separate git worktrees to prevent collisions.
- Verify → A second agent reviews the work. (Never let agents self-grade).
- Persist → Writes to disk, not temporary context windows.
- Schedule → Runs automatically on a timer.
This is a great framework for building more reliable agentic systems
link to the guide below.
Read it, then check out this ace article on Loop Engineering by @akshay_pachaar 👇
An MIT team just dropped a 24-page PDF on "Self-Evolving Skills" for Claude Code agents.
Anthropic's own skill-creator hits 34% pass rate. This framework hits 71%.
Generate → Test → Verify → Co-Evolve
> Generate: after every task failure, the agent writes a candidate skill for what just broke.
> Test: the new skill runs on a held-out set with the same frozen Claude model.
> Verify: if it scores higher than the current best, it gets promoted. If not, it's rejected and the failure is logged.
> Co-Evolve: a second agent learns from rejected attempts and evolves alongside the generator, so the loop keeps improving.
The result: 71.1% pass rate on Claude Opus 4.6, beating Anthropic's own skill-creator by 37 points across SkillsBench and Codex.
This is exactly why engineers stopped writing skills by hand and let the agent evolve them.
Read the paper, then grab the setup below.
A Stanford team just published the 16-page PDF on “How to structure an AI agent”
Structure matters more than how you prompt it, and it's backed by hard numbers.
Build → Reflect → Curate → Reuse
• Build: the agent starts with a structured context, not a clever one-off prompt.
• Reflect: it watches what actually worked during execution, no labels needed.
• Curate: it folds those wins into an evolving playbook instead of a static prompt.
• Reuse: the next run starts from that refined structure, getting stronger each time.
This is exactly why senior engineers build the structure first in Claude Code, then let the agent run.
Read the paper, then grab the setup below 👇
A senior Anthropic engineer just published the clearest blueprint on "How to give your AI agent a real memory" and it's a 15-page PDF.
Write → Consolidate → Recall → Apply
• Write: after every attempt, the agent records what it tried and what happened.
• Consolidate: it distills those raw attempts into a few reusable lessons, not a transcript dump.
• Recall: before the next task, it reads those lessons first.
• Apply: it skips the dead ends it already learned, even on a brand new problem.
This is exactly how engineers now build agent loops in Claude Code.
Read the paper, then grab the setup below 👇
Anthropic posted the best prompting lecture I've ever seen... and deleted it two days later.
I watched the recording last night and kept pausing it. Each time I opened Claude to test what they showed.
Two Anthropic engineers showed in 24 minutes how the Claude team actually uses it.
Not tips. Not hacks. The way they actually talk to Claude. Every day. For real work.
After 3 minutes you'll want to rewrite every prompt you've ever sent.
I recently spent a month in Asia, including 10 days in China, where I met with senior policy makers in several countries, and I found that over the past few months, there has been a big shift in the world order. I share my perspective in my latest article.
As always, I welcome your questions and thoughts.
A senior Google engineer just dropped a 19-page PDF on "Loop Engineering" for LLM and agentic systems.
Act → Observe → Learn → Repeat
• Act: the LLM proposes a code transformation (tile this loop, parallelize that one).
• Observe: a compiler runs it and reports back - is it valid? faster? slower? by how much?
• Learn: the LLM reads that feedback and adjusts its next move.
• Repeat until it stops finding improvements.
The agent gets smarter purely from grounded feedback inside its own context window.
This 19-page PDF totally changed the way I’m building agentic systems today.
Read it now, then explore the article below.
the four pillars of loop engineering.
the loop itself is six lines, and nobody competes on it. every serious agent framework lands on the same tiny while-loop. model reads context, calls a tool, you feed the result back, repeat until it stops asking.
so if that part is solved, what is everyone actually engineering?
the answer is everything around the model. Boris Cherny, who built Claude Code, put it plainly. he doesn't prompt Claude anymore, he writes loops and lets them run.
that shift has a name now, and it rests on four pillars that are harder than the six lines make them look. these are the parts that actually break:
→ knowing when to stop. a terminal message ends the turn, not the task. an agent will write failing code, glance around, and declare victory. "done" has to mean the tests pass, not the agent feeling good about its work.
→ keeping the context clean. long loops rot from the inside as old outputs and dead ends pile up. a worse context produces a worse decision, which adds more noise, and the agent gets dumber the longer it runs. you fight it by treating context as a budget, not a bucket.
→ tools the agent can actually use. pile on a hundred tools and it loses track of which one to reach for. writes have to be safe to repeat, because loops retry, and a retried "create customer" call leaves you with duplicate records.
→ something that can say no. left alone, an agent agrees with itself. the fix is to separate the maker from the checker so the worker never grades its own homework.
put those four together and your job changes. you stop steering the agent move by move and start designing the system that steers it.
Karpathy runs research loops overnight that tweak a script, test it, keep what works, and throw away what doesn't, with himself nowhere in the loop. he arranges it once and hits go.
the model is becoming a commodity. the loop around it is where the real engineering lives now.
the best builders stopped asking what they should tell the agent to do. they started asking what system would do this without them.
I wrote the full breakdown. the article is quoted below.
stay tuned for more on this!