Karpathy said something you'll regret ignoring:
"Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf."
Loop engineering is exactly what does that.
In a hand-run session, the operator handles two things:
- deciding what the agent runs next
- and checking its output before the next step
Both are manual, and together they decide how far the agent gets without the operator.
Loop engineering moves both steps into the system.
A core operating structure surrounds the loop, and the diagram below depicts it.
- A schedule decides what to run
- The loop is the maker that produces the work
- A separate checker agent grades the output
- A file on disk holds the state they both read
The loop runs until the work is done, it hits the maximum number of iterations, or the budget runs out.
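Here is a minimal Python sketch of that structure, with the maker and checker passed in as plain functions. The names, the token accounting, and the default limits are assumptions for illustration, and the schedule and state file are left out so the three exits stay visible.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures: the maker takes an instruction and returns
# (output, tokens_used); the checker grades the output and returns a list
# of findings, where an empty list means nothing is left to fix.
Maker = Callable[[str], tuple[str, int]]
Checker = Callable[[str], list[str]]


@dataclass
class LoopResult:
    stop_reason: str  # "done", "max_iterations", or "budget"
    iterations: int
    tokens_spent: int
    output: str


def run_loop(task: str, maker: Maker, checker: Checker,
             max_iterations: int = 5, token_budget: int = 50_000) -> LoopResult:
    instruction, output, tokens = task, "", 0
    for i in range(1, max_iterations + 1):
        output, used = maker(instruction)
        tokens += used
        findings = checker(output)  # a separate grader, not the maker judging itself
        if not findings:
            return LoopResult("done", i, tokens, output)
        if tokens >= token_budget:
            return LoopResult("budget", i, tokens, output)
        # The checker's findings become the maker's next instruction.
        instruction = f"{task}\n\nFix these findings:\n" + "\n".join(findings)
    return LoopResult("max_iterations", max_iterations, tokens, output)
```

Every path out of `run_loop` carries a `stop_reason`, so a run that ended on budget is never mistaken for one that finished.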
Here are some practical engineering considerations:
1) A model grading its own output justifies what it already did instead of catching where it failed.
That's why a separate checker grades the output, and its findings return to the maker as the next instruction. The cycle repeats until the checker finds nothing left to fix.
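A deterministic toy version of that cycle, to make the hand-off concrete. The checker is a separate function that only sees the output, and its findings are the only instruction the maker gets. The TODO rule and both functions are hypothetical stand-ins for real agents.

```python
import re


def checker(text: str) -> list[str]:
    """Independent checker: flags every line that still contains 'TODO'.
    It sees only the output, never the maker's reasoning."""
    return [f"line {n}: resolve TODO"
            for n, line in enumerate(text.splitlines(), start=1)
            if "TODO" in line]


def maker(text: str, instruction: list[str]) -> str:
    """Toy maker: edits only the lines named in the instruction."""
    lines = text.splitlines()
    for finding in instruction:
        match = re.match(r"line (\d+):", finding)
        if match:
            n = int(match.group(1))
            lines[n - 1] = lines[n - 1].replace("TODO", "DONE")
    return "\n".join(lines)


def review_cycle(text: str, max_rounds: int = 5) -> tuple[str, int]:
    findings = checker(text)
    rounds = 0
    while findings and rounds < max_rounds:
        text = maker(text, findings)  # the findings are the next instruction
        findings = checker(text)      # re-graded by the checker, not the maker
        rounds += 1
    return text, rounds
```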
2) A loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs add up.
That's why the exit must be set before the loop runs, not while it is running.
A simple exit could be:
↳ fix only the major issues, run one final pass, and stop after two loops, with "all tests pass and lint clean" as the rule that ends it.
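A sketch of an exit rule fixed before the loop starts, assuming a Python project where `pytest -q` and `ruff check .` stand in for "all tests pass and lint clean". The `ExitRule` class and the default commands are illustrative assumptions; the frozen dataclass only makes the point that the rule is not edited mid-run.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)  # fields cannot be reassigned after construction
class ExitRule:
    max_loops: int = 2
    # Assumed commands for "all tests pass and lint clean"; swap in your own.
    checks: tuple = (("pytest", "-q"), ("ruff", "check", "."))

    def all_checks_pass(self) -> bool:
        for cmd in self.checks:
            try:
                result = subprocess.run(list(cmd), capture_output=True)
            except FileNotFoundError:
                return False  # a missing tool counts as a failed check
            if result.returncode != 0:
                return False
        return True

    def should_stop(self, loops_done: int) -> bool:
        # Stop on the hard cap, or as soon as every check is green.
        return loops_done >= self.max_loops or self.all_checks_pass()
```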
3) State has to live on disk, not in context.
The model forgets everything between runs, so a Markdown file or a knowledge graph holds what is done and what is still open.
Each run reads it and writes back to it, which lets a loop pick up again after days.
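A minimal sketch of the read-and-write-back cycle, assuming the state file is a Markdown checklist with one `- [ ]` or `- [x]` line per task. The file format and function names are assumptions; a knowledge graph would need its own reader and writer.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed format: one checkbox per task, e.g. "- [ ] bump version".
TASK = re.compile(r"^- \[( |x)\] (.+)$")


def read_state(path: Path) -> dict[str, bool]:
    """Return {task: done} for every checkbox line; a missing file means no state yet."""
    if not path.exists():
        return {}
    state = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        m = TASK.match(line)
        if m:
            state[m.group(2)] = m.group(1) == "x"
    return state


def write_state(path: Path, state: dict[str, bool]) -> None:
    """Write the whole checklist back so the next run sees the same picture."""
    lines = [f"- [{'x' if done else ' '}] {task}" for task, done in state.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def next_open_task(state: dict[str, bool]) -> Optional[str]:
    return next((task for task, done in state.items() if not done), None)
```

Each run calls `read_state`, works on `next_open_task`, marks it done, and calls `write_state`, so a run days later starts from the same file.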
4) The easier the output is to verify, the safer the loop.
Boring, repetitive tasks like fixing a stale version string or adding a missing test are trivial to verify, so a loop can run them with little risk while the operator is away.
Judgment-heavy work is loopable too, but only as far as the checker can confirm the result.
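Here is what a trivially verifiable check can look like: a sketch that flags files whose version string differs from the current release. The regex assumes `version = "x.y.z"` assignments, and in real use the file contents would come from the repo; both are assumptions for this example.

```python
import re

# Assumed pattern: assignments like version = "1.4.0"
VERSION = re.compile(r'version\s*=\s*"([^"]+)"')


def stale_versions(files: dict[str, str], current: str) -> list[str]:
    """Return the names of files whose version string differs from `current`."""
    stale = []
    for name, text in files.items():
        for found in VERSION.findall(text):
            if found != current:
                stale.append(name)
                break  # one mismatch is enough to flag the file
    return stale
```

The answer is a plain list, empty or not, which is the kind of result a checker can confirm without a human reading it.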
Let's look at two ways an unattended loop fails.
1) It reports done when nothing is actually verified.
The separate checker exists to prevent this, but the loop merges code faster than anyone reads it, so over weeks the team stops understanding its own codebase while every check stays green.
Green tests say the code passed the tests, not that anyone knows what shipped. Someone still has to read what the loop merges.
2) The checker keeps a running loop honest, but it only catches failures inside a run.
The harness around the loop (the prompts, tools, and checks wrapped around the model) still drifts and breaks in production as models change.
Repairing the harness is its own loop, and it is usually run by hand from observability traces.
My co-founder wrote a detailed walkthrough (with code) on making that harness repair itself: a failing trace gets diagnosed, the fix is verified against the exact input that failed, and the failure is locked in as a regression test so it cannot recur.
Read it below.
NVIDIA might just have open-sourced one of the most important AI projects right now.
everyone is building skills, and we are also pulling in skills other people wrote and downloading them straight off GitHub.
a skill is not just text. it bundles instructions and real executable code, and your agent runs that code with the same access you have.
so a skill you grabbed to save ten minutes can read your environment variables, lift your API keys, and quietly send them somewhere. recent research found roughly 1 in 4 public skills carry a vulnerability, and a smaller slice are outright malicious.
that is the gap SkillSpector closes. it is a security scanner that answers one question before you install anything: is this skill safe to run?
you point it at a skill: a local folder, a single skill .md file, a GitHub link, or a zip all work.
it then scans the code in up to two passes. a fast static pass flags risky patterns like credential harvesting, data leaks, and prompt injection, and checks the dependencies against live CVE data.
an optional second pass uses an LLM to read intent and clear out false positives.
at the end you get one risk score from 0 to 100 and a plain verdict that reads as safe, caution, or do not install.
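the static pass described above boils down to pattern matching plus a score. the sketch below is a toy illustration of that idea only: the rules, weights, and verdict thresholds are invented for this example and are not SkillSpector's, and it skips the CVE lookup and the LLM pass entirely.

```python
import re

# Hypothetical rules and weights, NOT SkillSpector's: each matched rule adds to the score.
RULES = {
    "reads environment variables": (re.compile(r"os\.environ|getenv\(|process\.env"), 25),
    "sends data over the network": (re.compile(r"requests\.post|urlopen\(|curl\s|fetch\("), 30),
    "looks for credential files": (re.compile(r"\.aws/credentials|\.ssh/id_|\.netrc"), 35),
    "prompt-injection phrasing": (re.compile(r"ignore (all )?previous instructions", re.I), 30),
}


def scan(text: str) -> tuple[int, str, list[str]]:
    """Return (risk score 0-100, verdict, names of matched rules)."""
    hits = [name for name, (pattern, _) in RULES.items() if pattern.search(text)]
    score = min(100, sum(RULES[name][1] for name in hits))
    # Assumed thresholds for the three verdicts.
    if score < 30:
        verdict = "safe"
    elif score < 60:
        verdict = "caution"
    else:
        verdict = "do not install"
    return score, verdict, hits
```

reading an env var alone stays under the caution line here, but reading one and posting it somewhere crosses it, which is the credential-harvesting shape the real tool looks for.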
it is open source under Apache 2.0 and scans skills for Claude Code, Codex CLI, and Gemini.
worth a run before you trust the next skill you find online.
link to the GitHub repo: https://t.co/iaPlOvQ3t4
🚨Anthropic just shared a 24-minute workshop on how to actually write prompts for Claude.
Taught by the people who built it.
Free. No registration. No paywall.
I've seen $300 courses that don't cover what they teach in the first 8 minutes.
Watch it and bookmark it now.
Taiwan Officially Enters the Stablecoin Era
Taiwan Stock Exchange Issues "Guidelines for Accounting Treatment of Stablecoin Transactions" and "Guidelines for Internal Control Systems for Holding Cryptocurrencies"
The documents formally state that enterprises holding stablecoins should base their accounting classification on current regulations and the latest terms of use published by the stablecoin issuers. The guidelines include four case studies based on common USDC and USDT transaction patterns.
The guidelines also explain in detail how to determine whether a stablecoin holder has a contractual right to receive cash or other financial assets in different scenarios (where the stablecoin is either legally regulated or not yet regulated), and thereby whether to classify the stablecoins as financial assets, intangible assets, or inventory.
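A simplified sketch of that decision order. The first test (a contractual right to receive cash or other financial assets) comes from the text above; splitting the remaining cases into inventory versus intangible asset by whether the stablecoins are held for sale in the ordinary course of business is an assumption drawn from general IFRS practice, not from the guideline text, and the `Holding` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    # Judged from current regulation and the issuer's latest terms of use.
    contractual_right_to_cash: bool
    # Assumed inventory test (general IFRS practice, not the TWSE text).
    held_for_sale_in_ordinary_course: bool


def classify(h: Holding) -> str:
    """Return the accounting classification for a stablecoin holding."""
    if h.contractual_right_to_cash:
        return "financial asset"
    if h.held_for_sale_in_ordinary_course:
        return "inventory"
    return "intangible asset"
```

The actual guidelines work through these questions case by case, so treat this as a reading aid, not a substitute for the four case studies.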