Prompt, context, harness & loop engineering, clearly explained!
An agent is a while loop with four layers of engineering wrapped around it:
- Prompt engineering
- Context engineering
- Harness engineering
- Loop engineering
Each one wraps the last, with the model in the middle, so the layers don't compete with each other. Each one just zooms one level further out.
> Prompt engineering:
This defines the input the model sees on one call, often composed of a role, instructions, examples, and an output format.
The techniques here change how the model reasons and responds by changing the wording it sees:
- Chain-of-thought makes it work in steps before answering
- Few-shot examples define the format and the edge cases
- A JSON schema or XML tags make the output parseable by code
- Self-consistency samples a few chains and takes the majority
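To make the list above concrete, here is a minimal Python sketch of composing a prompt from those parts and taking a self-consistency majority vote. Everything named here is an assumption for illustration: `sample` stands in for whatever function calls your model, and the `ANSWER:` line is a made-up output convention, not any vendor's format.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def build_prompt(role: str, instructions: str, examples: list[tuple[str, str]],
                 output_format: str, question: str) -> str:
    """Compose one call's input: role, instructions, few-shot examples, format."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"{role}\n\n{instructions}\n"
        # Chain-of-thought nudge plus a parseable final line (hypothetical convention).
        "Think step by step, then end with a line starting with 'ANSWER:'.\n\n"
        f"{shots}\n\nOutput format: {output_format}\n\nQ: {question}\nA:"
    )


def extract_answer(completion: str) -> str | None:
    """Return the text after the last 'ANSWER:' line, or None if there is none."""
    for line in reversed(completion.splitlines()):
        if line.strip().startswith("ANSWER:"):
            return line.split("ANSWER:", 1)[1].strip()
    return None


def self_consistent_answer(sample: Callable[[str], str], prompt: str,
                           n: int = 5) -> str | None:
    """Sample n reasoning chains and return the most common final answer."""
    answers = []
    for _ in range(n):
        answer = extract_answer(sample(prompt))
        if answer is not None:  # drop chains that never produced a final line
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

In practice `sample` would call the model with a non-zero temperature so the chains actually differ. With deterministic decoding the vote adds nothing.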
> Context engineering:
It's everything the model sees on a turn, not just the prompt. That includes the query, retrieved docs, memory, prior turns, and tool outputs from earlier steps.
The window is finite and fills up fast, so the engineering work is to rank inputs and cut everything that isn't pulling weight.
You do this by:
- Retrieving only the chunks relevant to the query, then reranking them
- Keeping key facts out of the middle of the window, where accuracy drops
- Summarizing old turns, evicting stale outputs, and pushing big blobs to files
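The steps above can be sketched in a few small functions. This is an illustration under loud assumptions: relevance scores are taken as already computed by some retriever or reranker, tokens are approximated as whitespace-separated words, and `summarize` is a hypothetical callable you would back with a model call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    text: str
    score: float  # relevance from a retriever/reranker (assumed to be given)


def rough_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def select_chunks(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that fit in the token budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = rough_tokens(chunk.text)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    return kept  # still in descending score order


def order_for_edges(ranked: list[Chunk]) -> list[Chunk]:
    """Put the strongest chunks at the start and end, the weakest in the middle."""
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def compact_history(turns: list[str], keep_last: int,
                    summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace everything but the last keep_last turns with one summary."""
    if len(turns) <= keep_last:
        return list(turns)
    cut = len(turns) - keep_last
    return [summarize(turns[:cut])] + turns[cut:]
```

The greedy budget fill is the simplest choice, not the best one. It can skip a large relevant chunk in favor of several small weak ones, so a real system might cap chunk size first.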
> Harness engineering:
It's the code around the model that defines the tools, parses the calls, retries on failure, and can route work to sub-agents so one handles retrieval and another handles code.
A verifier then grades the result by running tests, validating a schema, etc.
Prompt and context engineering are about getting one call right. Harness engineering covers everything that has to happen around that call for it to run in a real system.
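A minimal sketch of that harness in Python, under stated assumptions: the model is any callable that returns a JSON string shaped like `{"name": ..., "arguments": {...}}` (a hypothetical format, not a specific vendor's tool-call schema), `TOOLS` holds one toy tool, and `verify` is whatever check you trust. Sub-agent routing is left out to keep it short.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Tool registry: name -> Python callable (a single hypothetical tool here).
TOOLS: dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}


def parse_call(raw: str) -> tuple[str, dict]:
    """Parse the model's raw output into a (tool name, arguments) pair."""
    call = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    name = call["name"]     # raises KeyError if the field is missing
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return name, args


def run_with_retries(model: Callable[[str], str], prompt: str,
                     verify: Callable[[Any], bool], max_attempts: int = 3) -> Any:
    """Call the model, run the tool it asks for, retry until the verifier passes."""
    feedback = ""
    for _ in range(max_attempts):
        raw = model(prompt + feedback)
        try:
            name, args = parse_call(raw)
            result = TOOLS[name](**args)  # TypeError if the arguments don't fit
        except (KeyError, TypeError, ValueError) as exc:
            # JSONDecodeError is a ValueError subclass, so bad JSON lands here too.
            feedback = f"\nPrevious attempt failed: {exc!r}. Reply with one valid JSON tool call."
            continue
        if verify(result):
            return result
        feedback = f"\nPrevious result {result!r} failed verification. Try again."
    raise RuntimeError(f"no verified result after {max_attempts} attempts")
```

Feeding the error back into the next prompt is one design choice among several. A stricter harness might retry silently or fail fast on unknown tools.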
> Loop engineering:
In the usual setup, you manage the outer loop yourself: you write a prompt, read the turns the agent runs, write the next prompt, and repeat, catching failures along the way.
This layer hands that job to the agent itself. It kicks off on a schedule or an event, and runs many turns with no prompt in between.
A loop doesn't inherently know when it's finished. An agent can report that it's done and halt while the tests still fail. So the stop condition can't be the agent's word. It has to be a real signal, like:
- A turn and token cap to stop stuck runs
- A no-progress detector to catch repeated calls
- A completion check to verify the goal with a separate model or a deterministic test
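Those three stop signals fit in one small loop. The sketch below is an illustration, not the breakdown's actual code: `step` is a hypothetical callable that runs one agent turn, `goal_met` is the independent check (a test run or a separate grader), and the default limits are arbitrary placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    action: str        # the call the agent made this turn, as a string
    tokens: int        # tokens spent on this turn
    claims_done: bool  # the agent's own word; it triggers a check, never a stop


def run_loop(step: Callable[[], Turn], goal_met: Callable[[], bool],
             max_turns: int = 20, max_tokens: int = 50_000,
             repeat_limit: int = 3) -> str:
    """Run agent turns until a real stop signal fires; return which one."""
    used = 0
    recent: list[str] = []
    for _ in range(max_turns):
        turn = step()
        used += turn.tokens
        # Completion check: the agent's claim only prompts verification.
        if turn.claims_done and goal_met():
            return "done"
        # Token cap: stop runs that burn budget without finishing.
        if used >= max_tokens:
            return "token_cap"
        # No-progress detector: the same action repeat_limit times in a row.
        recent = (recent + [turn.action])[-repeat_limit:]
        if len(recent) == repeat_limit and len(set(recent)) == 1:
            return "no_progress"
    return "turn_cap"  # turn cap: the loop ran out of allowed turns
```

Returning the reason as a string makes it easy to log why each run ended. Comparing raw action strings is a blunt no-progress test, so a real detector might also compare tool outputs or file diffs.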
By this layer, you're operating on the whole run, so the engineering moves from writing each prompt to setting the goal and the stop conditions up front and letting it run.
If you want to dive deeper into loop engineering, my co-founder wrote a full breakdown of that outer loop.
It goes from the basic while loop to a run that finishes on its own, with the code behind each part, and the parts that are hard to get right, like knowing when to stop, context rot over a long run, and keeping the checker separate from the maker.
Read it below.
met an anthropic engineer making $1.2M a year.
asked him how he ships alone at the pace of a full team.
he didn't answer. sent me his .claude/. one folder.
SAME MODEL - DIFFERENT RESULT.
everyone's still picking between opus and sonnet like the model is the ceiling. it isn't.
the real lever is what the model wakes up into:
CLAUDE.md → hooks → verifier subagent → skills → mcp → memory → shift notes.
you stop chatting with the model.
you write the folder once. the folder runs the model.
- CLAUDE.md - the contract
- settings.json - the permissions
- hooks/ - the reflexes
- agents/verifier - the shift-notes cop
- skills/ - 33 muscle memories
- .mcp.json - the tools
- MEMORY.md - the shift log
that's the stack.
full breakdown in the article below. bookmark before he realizes i posted it.
Before Fable 5 goes live again, you need to read this.
Anthropic recently published its full guide to prompting Fable 5 for the best outputs.
Most people have no clue it exists, but it's a game-changer.
When Fable was originally launched, these are the principles I used: