Ex-Google engineer reveals how to build AI agent loops, harnesses, LLM ops, and evals in 19 minutes.
Trace → evaluate → diagnose → fix → ship → repeat.
That loop is how agents self-improve over time.
Agentic loops + harness + memory + evals - that’s the senior engineer stack.
This is better than $500 paid courses on the same topic, explained in under 20 minutes.
Watch it, then save the framework below.
KARPATHY JUST HANDED EVERY DEVELOPER THE EXACT FILE CLAUDE CODE NEEDED FROM DAY ONE.
65 lines. 110K stars. the cheat code for every broken workflow you've been blaming on the model.
if I had this a year ago, I would've shipped twice as fast.
make sure to bookmark it before it gets lost in your feed.
I was losing 2 hours a day to Claude rewriting code I didn't ask it to touch.
then I found CLAUDE.md.
90 seconds to set up. changed everything.
Karpathy identified 4 failure patterns Claude Code repeats constantly, in his own words:
→ silent assumptions: Claude makes decisions without checking with you
→ code bloat: 1000 lines written when 100 would do
→ collateral damage: Claude edits code unrelated to the task
→ no success criteria: Claude loops with no finish line
these aren't model failures. they're missing instructions.
CLAUDE.md gives Claude the 4 rules it needed from day one:
→ think before coding, state assumptions. ask before assuming.
→ simplicity first, minimum code. nothing speculative.
→ surgical changes, touch only what is required. nothing adjacent.
→ goal-driven execution, define success before starting. loop until verified.
65 lines. no build step. no framework. no dependencies.
just the 4 principles every developer already knew, but needed Karpathy to write down.
ClaudeKit is the only team you need to build something like this (https://t.co/aW94IVKzGN)
the guide on how to learn Claude below (every resource you need)
https://t.co/8pon0x7bcI
Claude Code can now upload and edit HTML artifacts that you can share with your team or other Claudes!
Starting with teams so you can share internally with your team, coming to Pro and MAX plans soon!
what is agent looping
for the last two years we prompted agents one task at a time. that is starting to change
instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met
looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up
at its simplest, looping is one agent working on itself:
> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements
you are not prompting each step anymore. the agent repeats the cycle for you
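a minimal sketch of that single-agent cycle, assuming the agent is three plain functions. `draft`, `check`, and `fix` are hypothetical stand-ins for model calls, and `max_rounds` is an added safety cap so the loop cannot run forever:

```python
from typing import Callable, List, Tuple


def run_loop(goal: str,
             draft: Callable[[str], str],
             check: Callable[[str], List[str]],
             fix: Callable[[str, List[str]], str],
             max_rounds: int = 5) -> Tuple[str, bool, int]:
    """Draft once, then check and fix until no issues remain or rounds run out."""
    work = draft(goal)
    for round_no in range(1, max_rounds + 1):
        issues = check(work)           # compare the work against the goal
        if not issues:
            return work, True, round_no
        work = fix(work, issues)       # repair what is weak, then go around again
    return work, False, max_rounds


# Hypothetical stand-ins for model calls, so the sketch runs on its own.
def draft(goal: str) -> str:
    return f"draft for {goal}"


def check(work: str) -> List[str]:
    return [part for part in ("headline", "cta") if part not in work]


def fix(work: str, issues: List[str]) -> str:
    return work + " " + issues[0]      # fix one issue per round


result = run_loop("landing page", draft, check, fix)
print(result)  # ('draft for landing page headline cta', True, 3)
```

the checker decides when the loop ends, so the loop is only as good as the check you give it.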
the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents
the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met
one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end
you create a goal, and the system runs the loop until it finishes within the reqs you set
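a rough sketch of the delegation tree, assuming every agent is the same small class. the `Agent` class and the agent names are made up for illustration, and only the hand-off is shown. the verification pass that sends work back around is left out:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Hypothetical agent: leaves do the work, everyone else delegates."""
    name: str
    children: List["Agent"] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        if not self.children:
            # a leaf subagent does the narrow work (a model call in real life)
            return [f"{self.name} did {task}"]
        results: List[str] = []
        for child in self.children:
            # split the task and hand each piece one level down
            results.extend(child.run(f"{task} > {child.name}"))
        return results


fleet = Agent("orchestrator", [
    Agent("copy", [Agent("headline"), Agent("body")]),
    Agent("design"),
])

results = fleet.run("landing page")
for line in results:
    print(line)
```

each specialist only sees its own slice of the goal, which is what keeps the subagents' jobs narrow.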
open and closed looping:
OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out
this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time
the catch is cost. an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine
CLOSED LOOPING is bounded. a human designs the end-to-end path first:
> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)
the agents still loop, but inside a framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.
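a small sketch of a closed loop, assuming each step is a (name, do, eval) triple. the step functions and the `max_retries` cap are hypothetical. a real setup would feed the eval's feedback into the retry instead of rerunning the same call:

```python
from typing import Callable, Dict, List, Tuple

# (step name, function that does the work, eval that gates the result)
Step = Tuple[str, Callable[[str], str], Callable[[str], bool]]


def run_closed_loop(goal: str, steps: List[Step], max_retries: int = 2) -> Dict:
    """Run fixed steps in order; a step that never passes its eval hands back."""
    work = goal
    history = []                       # performance data fed back to the human
    for name, do, passes in steps:
        for attempt in range(1, max_retries + 2):
            candidate = do(work)
            ok = passes(candidate)
            history.append((name, attempt, ok))
            if ok:
                work = candidate
                break
        else:
            # every attempt failed the eval: stop and hand back to the human
            return {"status": "handed_back", "failed_step": name,
                    "work": work, "history": history}
    return {"status": "done", "failed_step": None,
            "work": work, "history": history}


steps: List[Step] = [
    ("outline", lambda w: w + " | outline", lambda c: "outline" in c),
    ("draft", lambda w: w + " | draft", lambda c: len(c) > 1000),  # too strict on purpose
]

result = run_closed_loop("launch email", steps)
print(result["status"], result["failed_step"])  # handed_back draft
```

the loop keeps the last version that passed an eval, so a failed step never overwrites good work.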
for most marketing work, closed is the one that pays off today.
> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate makes sure it's not slop
AI coding agents can now extract any website's design system directly from the terminal.
I love small projects like this that Hyperbrowser creates. Their repo is full of them.
Folks: when you write skills, ask your agent to be token efficient, relax grammar. I see too many skills that write books in the skill description, and all that crap is loaded into every context.
I wrote a skill that finds the worst offenders. https://t.co/kfaaJpxMXE
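For illustration only, here is a rough sketch of what such a check could look like. It is not the linked skill. It assumes each skill file carries a single-line `description:` field in its frontmatter, uses word count as a crude stand-in for tokens, and the `word_budget` of 30 is an arbitrary number:

```python
import re
from typing import Dict, List, Tuple


def description_of(skill_text: str) -> str:
    """Pull a single-line `description:` value out of the skill text, if any."""
    match = re.search(r"^description:\s*(.+)$", skill_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else ""


def worst_offenders(skills: Dict[str, str], word_budget: int = 30) -> List[Tuple[str, int]]:
    """Return (skill name, word count) for descriptions over budget, longest first."""
    counts = [(name, len(description_of(text).split())) for name, text in skills.items()]
    over = [(name, n) for name, n in counts if n > word_budget]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Two made-up skill files, keyed by skill name.
skills = {
    "terse": "---\nname: terse\ndescription: Format SQL queries.\n---\n",
    "chatty": "---\nname: chatty\ndescription: " + "very " * 40 + "long\n---\n",
}

offenders = worst_offenders(skills)
print(offenders)  # [('chatty', 41)]
```

To run it on real files, read each skill file into the `skills` dict first.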
Anthropic just paid millions to hire Andrej Karpathy.
He gave you the same knowledge for $0 the same week.
Co-founder of OpenAI. Former head of AI at Tesla. The man who coined vibe coding.
No recruitment fee. No exclusive access. Just a link and 29 minutes.
LLMs are ghosts not animals.
Vibe coding is dead.
Software 3.0 is here.
Watch it.
Then read this.
Because Karpathy tells you what Software 3.0 is.
This shows you how to build one - a software factory with Claude Code that ships features while you sleep.
The full build guide is below.
karpathy's CLAUDE.md hit #1 on github trending.
220,000 stars. most devs still haven't read it.
it's 65 lines.
it took AI coding accuracy from 65% to 94%.
the 4 rules inside:
→ think before coding
state your assumptions. ask when unsure. never guess.
→ simplicity first
write the minimum code that solves the problem.
no abstractions nobody asked for.
→ surgical changes
don't touch code unrelated to the request.
every changed line must trace back to what was asked.
→ goal-driven execution
turn vague instructions into verifiable success criteria
before writing a single line.
that's it.
65 lines. 4 rules. 94% accuracy.
save this before everyone else does.
Live from Code with Claude London: we're launching self-hosted sandboxes (public beta) and MCP tunnels (research preview) in Claude Managed Agents.
Run agents inside your own perimeter, with your security controls applied by default.
Boris Cherny, the creator of Claude Code at Anthropic, just explained how to write prompts that actually work
CLAUDE.md files, memory shortcuts, parallel sessions, and prompting patterns all in one video and completely free
ANTHROPIC JUST KILLED THE DEMO AGENT ERA.
Their Agents team showed exactly what production grade looks like.
Not theory. Not a tutorial. A four layer framework for multi agent systems built to actually work in the real world.
30 minutes.
This is the video I wish existed 6 months ago.
GitHub has just solved the biggest problem with vibe coding.
They just dropped Spec Kit and it exploded to 95K+ stars within days.
The reason is simple:
Most AI coding agents jump straight into writing code before actually understanding the project.
That’s why you get random architecture decisions, broken flows, inconsistent files, and hours wasted debugging things you never asked for.
Spec Kit changes the workflow completely.
Instead of coding first…
the AI is forced to think first.
It creates a full project specification before touching a single file.
So the agent understands:
* what you're building
* the constraints
* missing details
* architecture decisions
* implementation order
before execution even starts.
The workflow looks like this:
→ /constitution = coding rules + standards
→ /specify = define the product
→ /clarify = resolve ambiguities
→ /plan = architecture + tech stack
→ /tasks = execution breakdown
→ /implement = build phase
Result?
Far fewer hallucinations.
Cleaner codebases.
More reliable outputs.
Way better collaboration between humans and AI agents.
Works with:
Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, and 25+ other agents.
Open source.
95K+ stars.
8K+ forks.
Built by GitHub.
Repository 👇
https://t.co/b9g5ojqsua
Instead of watching Netflix tonight, watch this 2-hour Stanford lecture.
It will teach you more about how LLMs like ChatGPT and Claude are actually built than most people learn in years working in AI.
Stanford released it for free.
Save this.
RAG vs. CAG, clearly explained!
RAG is great, but it has a major problem:
Every query hits the vector DB. Even for static information that hasn't changed in months.
This is expensive, slow, and unnecessary.
Cache-Augmented Generation (CAG) addresses this issue by enabling the model to "remember" static information directly in its key-value (KV) memory.
In fact, you can combine RAG and CAG for the best of both worlds.
Here's how it works:
RAG + CAG splits your knowledge into two layers:
↳ Static data (policies, documentation) gets cached once in the model's KV memory
↳ Dynamic data (recent updates, live documents) gets fetched via retrieval
This gives faster inference, lower costs, and less redundancy.
The trick is being selective about what you cache.
Only cache static, high-value knowledge that rarely changes. If you cache everything, you'll hit context limits. Separating "cold" (cacheable) and "hot" (retrievable) data keeps this system reliable.
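A minimal sketch of the cold/hot split, assuming a hosted prompt cache that reuses an unchanged prompt prefix. The documents, the `retrieve` keyword lookup (a stand-in for a vector DB query), and the prompt layout are all made up for illustration:

```python
from typing import Dict, List

# "Cold" knowledge: rarely changes, so it belongs in the cacheable prefix.
STATIC_DOCS = ["Refund policy: 30 days.", "Support hours: 9-5 UTC."]

# "Hot" knowledge: changes often, so it is fetched per query.
DYNAMIC_DOCS = {
    "outage": "Status: API outage resolved at 14:00.",
    "pricing": "Pricing update: Pro is now $20.",
}


def cached_prefix(static_docs: List[str]) -> str:
    """Static knowledge goes first and never varies between queries."""
    return "KNOWLEDGE\n" + "\n".join(static_docs) + "\n"


def retrieve(query: str, dynamic_docs: Dict[str, str]) -> List[str]:
    """Toy keyword lookup standing in for a vector DB query."""
    return [doc for key, doc in dynamic_docs.items() if key in query.lower()]


def build_prompt(query: str) -> str:
    fresh = retrieve(query, DYNAMIC_DOCS)
    # Anything that varies per query comes after the stable prefix.
    return cached_prefix(STATIC_DOCS) + "FRESH\n" + "\n".join(fresh) + "\nQUESTION\n" + query


p1 = build_prompt("Was there an outage today?")
p2 = build_prompt("What is the refund window?")
prefix = cached_prefix(STATIC_DOCS)
print(p1.startswith(prefix) and p2.startswith(prefix))  # True
```

The order is the design choice here: putting per-query text ahead of the static block would change the prefix on every request and defeat the cache.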
You can start today. OpenAI and Anthropic already support prompt caching in their APIs.
I have shared my recent article on prompt caching below if you want to dive deeper.
Have you tried CAG in production yet?
Below, I have quoted an article that I wrote on prompt caching and how Claude Code achieves a 92% cache hit-rate. Give it a read.
Spotify's Chief Architect just showed how they ship 4.5K deployments/day with Claude, on Anthropic's stage
27 minutes. free. By the #1 music app dev
"More than 99% of our engineers use AI coding tools. Adoption took off after Opus 4.5"
Worth more than any $500 vibe-coding course.
Google is firing shots at Canva. 😭
Google just launched Google Pics, a new AI image creation + editing tool.
> create posters, infographics & social posts
> understands every element inside an image
> erase, move & resize objects with prompts
> translate text inside images instantly
Design software is becoming AI-native FAST.
Karpathy's "second brain" concept in 60 seconds:
1. Three folders (raw, wiki, outputs). That's the whole architecture.
2. One CLAUDE.md schema file tells the AI how to organize everything.
3. Dump your bookmarks, notes, and articles into raw/. Don't organize them.
4. One prompt: "Compile a wiki from raw/ following CLAUDE.md." Walk away.
5. Ask questions against your wiki. Save answers back. It compounds.
6. Monthly health check catches errors before they stack.
No Obsidian or complex plugins. Just desktop folders and a schema file.
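A minimal sketch of the folder setup from steps 1 and 2. The schema text is a hypothetical placeholder, not Karpathy's actual file, and the demo writes into a temporary directory so nothing lands on your desktop:

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical schema contents, for illustration only.
SCHEMA = """# CLAUDE.md
- raw/: unorganized source material. Never edit.
- wiki/: one page per topic, compiled from raw/.
- outputs/: saved answers to questions asked against wiki/.
"""


def scaffold(root: Path) -> List[str]:
    """Create the three folders and a schema file; return what now exists in root."""
    for folder in ("raw", "wiki", "outputs"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    schema = root / "CLAUDE.md"
    if not schema.exists():            # never overwrite a schema you already tuned
        schema.write_text(SCHEMA, encoding="utf-8")
    return sorted(p.name for p in root.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp))

print(created)  # ['CLAUDE.md', 'outputs', 'raw', 'wiki']
```

Point `scaffold` at a real folder and start dumping files into raw/.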
Full walkthrough + free skill that builds it for you in 60 seconds in the article below.
My biggest takeaways from @AnthropicAI's Head of Growth Amol Avasare:
1. Engineering is getting the most AI leverage—and it’s squeezing PMs and designers. With Claude Code, a five-engineer team now produces the output of 15 to 20 engineers. But PM and design productivity haven’t scaled proportionally. The result is a compressed ratio where one PM is effectively managing the output of a much larger engineering team. Anthropic's growth team is responding in two ways: hiring even more PMs (!), and formally deputizing product-minded engineers to act as mini-PMs for any project with less than two weeks of engineering time.
2. Anthropic is using Claude to automate its own growth. The internal initiative is called CASH (Claude Accelerates Sustainable Hypergrowth). It works across four stages: identifying opportunities, building features, testing quality, and analyzing results. Right now it handles copy changes and minor UI tweaks. The win rate is comparable to a junior PM with two to three years of experience, and improving rapidly.
3. The one part of PM work that AI can’t automate yet: getting six people in a room to agree. Amol and his head of design joke that even with AGI, it’ll still be impossible to align six stakeholders. Cross-functional coordination—managing opinions, navigating politics, mediating tradeoffs—remains the bottleneck that AI doesn’t touch for larger projects. This is why Amol believes PM roles aren’t going away, and may actually grow.
4. 60-80% of Anthropic’s growth team's projects have no PRD. For smaller work, kickoffs happen on Slack—messages back and forth with product-minded engineers who can push back and ask the right questions. For larger projects, Amol believes in a proper 30-minute cross-functional kickoff (legal, safeguards, stakeholders) to surface concerns early.
5. Adding friction to onboarding drives growth, if the friction helps users understand why the product is for them. Across his work at Mercury, MasterClass, Calm, and now Anthropic, adding steps to onboarding flows consistently improved conversion. The key: cut annoying friction that doesn’t add value, but add friction that helps users understand why the product is for them.
6. AI companies need to focus on bigger bets, not better A/B tests. Amol’s argument: if your core product value is driven by AI, then the future value is orders of magnitude higher than today’s value, because model capabilities grow exponentially. In that world, micro-optimizations capture a shrinking share of a growing pie. Traditional growth teams do 60% to 70% small optimizations and 20% to 30% big swings. At Anthropic, they flip this ratio.
7. Amol built a weekly AI agent that scans Slack for cross-functional misalignment. Using Cowork with the Slack MCP, he has a scheduled task that looks across his projects and conversations and surfaces areas where teams are about to do overlapping work or pull in different directions. A colleague on the enterprise team already caught major misalignment that would have caused weeks of wasted effort.
8. A traumatic brain injury taught Amol the principle that now drives his work: freedom through constraints. In early 2022, a kick to the head during a Muay Thai sparring session caused a traumatic brain injury. Amol spent nine months off work and months relearning to walk, unable to look at screens or listen to music for more than 20 seconds. He was re-injured a month after joining Mercury and had to take two more months off. He’s still not fully healed. But the constraints—no alcohol, no caffeine, mandatory breaks, daily meditation—have become the habits that let him operate at the intensity Anthropic demands. “The true freedom in life is learning how to be content when you don’t get what you want.”