the four pillars of loop engineering.
the loop itself is six lines, and nobody competes on it. every serious agent framework lands on the same tiny while-loop. model reads context, calls a tool, you feed the result back, repeat until it stops asking.
so if that part is solved, what is everyone actually engineering?
the answer is everything around the model. Boris Cherny, who built Claude Code, put it plainly. he doesn't prompt Claude anymore, he writes loops and lets them run.
that shift has a name now, and it rests on four pillars that are harder than the six lines make them look. these are the parts that actually break:
โ knowing when to stop. a terminal message ends the turn, not the task. an agent will write failing code, glance around, and declare victory. "done" has to mean the tests pass, not the agent feeling good about its work.
โ keeping the context clean. long loops rot from the inside as old outputs and dead ends pile up. a worse context produces a worse decision, which adds more noise, and the agent gets dumber the longer it runs. you fight it by treating context as a budget, not a bucket.
โ tools the agent can actually use. pile on a hundred tools and it loses track of which one to reach for. writes have to be safe to repeat, because loops retry, and a retried "create customer" call leaves you with duplicate records.
โ something that can say no. left alone, an agent agrees with itself. the fix is to separate the maker from the checker so the worker never grades its own homework.
put those four together and your job changes. you stop steering the agent move by move and start designing the system that steers it.
Karpathy runs research loops overnight that tweak a script, test it, keep what works, and throw away what doesn't, with himself nowhere in the loop. he arranges it once and hits go.
the model is becoming a commodity. the loop around it is where the real engineering lives now.
the best builders stopped asking what they should tell the agent to do. they started asking what system would do this without them.
I wrote the full breakdown. the article is quoted below.
stay tuned for more on this!
The guy who kicked off the entire "loop engineering" wave Peter Steinberger:
"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."
One post. 6.5M views in a week.
In this talk he walks the real stack: the agent loop, a verifier that fails its own work and retries, and a loop that rewrites the agent while he sleeps.
Worth more than any $500 vibe-coding course.
Watch it, then read the full breakdown of the 4 loops below.
this is f*cking gold
How to build your first AI agent (Full guide)
if I had this a year ago, I would've shipped my first app in a day instead of 2 weeks
in the right hands, this changes everything:
A few months ago my kids started vibecoding little web games with Cursor and wanted their friends to play them. GitHub Pages was fine until the games needed real backends, so I hacked together a setup where each game was a folder in one repo that deployed to a Hetzner box on every push.
That held up until we shipped FULL SEND for Vibe Jam 2026 and it took off with 38,000+ players. The duct tape needed to become something real, so I rebuilt it properly and pulled it out into its own project.
It turns one Linux server into a push-to-deploy host for many apps. The whole thing is a single Go binary that installs and drives Docker, Kamal, Cloudflare, Tailscale, and GitHub for you. After that:
- Each app is a GitHub repo.
- A git push is live in <5 seconds.
- Deploys are zero-downtime.
- Each app runs in its own container.
- Automatic Cloudflare DNS and TLS tunnels.
- SQLite-aware backup and restore.
It's deliberately single server using convention over configuration, so for a typical app there's no YAML or Dockerfile to write. The idea is that one decent VPS can reliably run all your projects without per-app bills or piles of infra config.
It's built on top of Kamal, so it's basically a Kamal wrapper for the "lots of apps on one server" case, with the Cloudflare, Tailscale, DNS, and backup glue wired up by convention.
Setup is one interactive command on a fresh Linux box, which walks you through connecting everything.
If you also have a bunch of projects you want to run on a single server, tell your Claude Code, Codex, Cursor, or favorite AI agent to grab a VPS and try it for you. It's fully open source and you can customize it to your liking: https://t.co/ZvHZp55zso
Claude Code + YouTube = $62,000/Month
He leaked the exact system. Most people will scroll past it.
Nothing complicated.
Bookmark this so you donโt lose it.
If you want to get dangerously good at system design, learn these concepts:
1 Scalability
2 Availability
3 Reliability
4 Latency
5 Throughput
6 Database
7 SQL vs NoSQL
8 Load Balancing
9 Caching
10 Cache Invalidation
11 API Design
12 REST
13 GraphQL
14 gRPC
15 Authentication
16 Fault Tolerance
17 High Availability
18 CAP Theorem
19 Consistency Models
20 Replication
21 Erasure Coding
22 Consensus
23 Leader Election
24 Secrets Management
25 RBAC
26 Sharding
27 Indexing
28 Denormalization
29 ACID
30 BASE
31 Event-Driven
32 Message Queue
33 Pub/Sub
34 Sync vs Async
35 Idempotency
36 Bulkhead
37 Retry Logic
38 Timeout
39 Service Discovery
40 API Gateway
41 Blue-Green Deployment
42 Canary Release
43 Feature Flags
44 Observability
45 Logging
46 Correlation ID
47 Monitoring
48 Alerting
49 Full-Text Search
50 Time Series
(...and many more!)
What else should make this list?
===
๐ PS - Want a simple breakdown of each concept?
Read right now in my newsletter:
โ Part 1: https://t.co/u7BsUK307i
โ Part 2: https://t.co/CJAwmrUXdI
โ Part 3: https://t.co/DOQpnNOnjc
===
๐พ Save & RT to help others get good at system design.
๐ค Follow @systemdesignone + turn on notifications.
Andrej Karpathy spent 2h showing how he actually uses AI day to day
he's a co-founder of OpenAI and led AI at Tesla, so when he shows how he works, itโs worth watching
and the whole session is just him telling the machine what he wants in simple terms, like he's briefing a coworker
watch what's actually happening the entire time:
> he describes the task in normal words
> it goes off and does the work
> he glances at the result and nudges it with one more sentence
that's the whole skill, and you've had it since you learned to talk
the only gap between that and a worker that runs on its own is handing that sentence a schedule and the tools to act
check his work, then build the version that keeps working when you stop
Anthropic shipped 125 settings for Claude
The official docs cover 40.
One developer found the other 85.
His API bill dropped from $340 to $87.
Not by using a cheaper model.
Not by writing shorter prompts.
By moving one line in a config file to the right place.
> memory scoped per project โ past clients never bleed into new work
> Extended Thinking on Light by default โ 18โ25% fewer Opus tokens in week one
> cache_control moved to the right line โ the fix that turned a $340 bill into $87
> plugins and MCP servers toggled off when idle โ saved 25โ40K tokens per session
> per-project model override โ Haiku for docs, Sonnet for infra, Opus only where it matters
Same model. Same prompts. Same work.
Most Claude users are running a $100/month tool at 30% of its actual capability.
Here are 25 features, workflows, and tricks that close that gap โ
Bookmark this.
I can't believe how much I'm telling AI to output HTML reports and docs.
The results are fantastic.
Examples:
- Onboarding: "Generate an HTML overview of all the repositories in this folder and document how they relate to each other."
- Velocity report: "Generate a report on commits, PRs, and code churn over the last 8 weeks. Give me filters for repo and committer."