Everyone's talking about harnesses. None of them solve the core problem. I built one 9 months ago that does.
Artifacts > Vibes. Progress only on Proof of Work
LLM inference is unreliable, inconsistent, and degrades the longer it runs. You cannot fix that with a loop that doesn't address the underlying problem. You fix it by orchestrating from the kernel of work, the task. Atomic tasks. Intra-task QA with correction cycles. Audit by default. Trust nothing. Verify everything.
I call it the PABLOV method. Programmatic Artifact-Based LLM Output Verification.
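A rough sketch of what "atomic tasks, intra-task QA with correction cycles, audit by default" could look like in code. This is not the actual orchestrator, which isn't shown here. Every name (Task, AuditEntry, run_task, run_pipeline, fake_llm, is_integer) is hypothetical, and fake_llm is a stand-in for a real model call. The point is only the shape: progress happens when a programmatic check on the artifact passes, never on the model's say-so.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Task:
    """One atomic unit of work plus a programmatic check on its artifact."""
    name: str
    produce: Callable[[Optional[str]], str]  # feedback -> artifact (stand-in for an LLM call)
    verify: Callable[[str], Optional[str]]   # artifact -> None if valid, else an error message


@dataclass
class AuditEntry:
    """Record of a single attempt; written whether the attempt passed or failed."""
    task: str
    attempt: int
    passed: bool
    detail: str


def run_task(task: Task, audit: List[AuditEntry], max_attempts: int = 3) -> str:
    """Produce, verify, and retry with the verifier's error as feedback."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        artifact = task.produce(feedback)
        error = task.verify(artifact)
        audit.append(AuditEntry(task.name, attempt, error is None, error or "ok"))
        if error is None:
            return artifact
        feedback = error  # correction cycle: the next attempt sees why this one failed
    raise RuntimeError(f"task {task.name!r} failed verification {max_attempts} times")


def run_pipeline(
    tasks: List[Task], max_attempts: int = 3
) -> Tuple[Dict[str, str], List[AuditEntry]]:
    """Run tasks in order; a task's artifact is kept only after it verifies."""
    audit: List[AuditEntry] = []
    artifacts: Dict[str, str] = {}
    for task in tasks:
        artifacts[task.name] = run_task(task, audit, max_attempts)
    return artifacts, audit


def fake_llm(feedback: Optional[str]) -> str:
    """Hypothetical model stub: wrong on the first try, corrected once it gets feedback."""
    return "4" if feedback else "four"


def is_integer(artifact: str) -> Optional[str]:
    """Verifier: the artifact must be a string of digits."""
    return None if artifact.isdigit() else f"expected digits, got {artifact!r}"


artifacts, audit = run_pipeline([Task("add_2_plus_2", fake_llm, is_integer)])
for entry in audit:
    print(entry)
```

The first attempt fails the check, the error goes back in as feedback, the second attempt passes, and both attempts land in the audit log. A task that never verifies raises instead of letting the run drift forward.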
Claude Code, Codex, Cursor, Temporal, n8n, Aider, Devin, Lovable, Replit, OpenHands, Kiro, etc. all fall over on long-haul lifts. I built a programmatic orchestrator last year on the PABLOV method that runs at high accuracy for 100+ hours straight.
So many talking farms posting about LLM providers' profitability, bubbles, and yada yada. It's all b8
Uber founded 2009
Uber >$1B rev 2014
Uber profitable 2023
Your harness is your Agent OS, and it matters more than the OS your machine runs on.
The lock-in on your Agent OS will be far harder to break than any OS lock-in before it. The cause is the ability and freedom to customize. Every prompt you tune, tool you wire, workflow you encode raises the cost of leaving. Every tweak welds you in tighter.
Unlike the OS era where the whole world ran on three options, there will be thousands of harnesses tuned to teams, roles, individuals.
New market. Wide open. Most are arguing about which model is more capable. The opportunity is one layer up.
@pmarca Def changes what makes a good software engineer and what they spend their time doing. Most trad software devs are not good in areas required to orchestrate unlimited agent instances
@LeakerApple I remember 9 months ago trying to tell some PC bois that their GPUs cannot compete with Apple’s unified memory for local inference. Engrained