Harness engineering partially takes humans out of the loop. Loop engineering aims to free humans from the agent work loop entirely.
After Harness launched this February, it sparked a wave of experimentation. At a relatively low cost, we were able to raise the quality of agent task execution from around 60% to 90%.
But one bottleneck remains: humans still need to stay in the loop to guide and adjust the agent’s work.
Loop engineering addresses this by introducing five functions: automations, subagents, worktrees, state, and skills.
When I read Addy’s work, I saw 2 trends beyond technology:
1. From context, to harness, to loop, the amount of code humans need to write keeps shrinking. Agents are destined to surpass every individual human programmer, and coding ability will become increasingly commoditized.
“Code is cheap. Give me the talk.” The ability to tell a compelling story about your product is what will distinguish your work.
2. Once humans are fully freed from the work loop, higher-level, “softer” capabilities such as strategic thinking and creativity will become even more important.
#harness #loop #coding
🖱️ CVPR Day 2 | Video Generation, Explainable AI, and Agent Security
Bookmarking the highlights of the day:
•Agent self-evolution might be a false premise - self-play is the bottleneck.
•Caught an oral on safety, which reminded me of a recent post by an OpenAI researcher on X: the benchmarks we currently rely on capture only a drop in the ocean of an agent’s or model’s true potential. This also matters for AI safety - genuinely dangerous capabilities may only surface under high compute budgets, yet we’re still probing them in low-compute regimes. With fully autonomous agents on the verge of becoming real, security only grows more critical. The bar for safety evaluation needs to be raised.
•Research feels more like an infinite game - its purpose is to keep progress going, indefinitely.
Also genuinely happy to have met so many new people at the @physianlab event.
#cvpr2026 #worldmodel #cvpr #robot
📟 CVPR Day 1 | Posters, Booths, and NVIDIA Researcher Inception
A few things stood out from the poster session:
>> VLM hallucination may not only be a model problem. One interesting angle is to treat it as an agent problem: add an evaluator-optimizer loop that can check and critique throughout model processing.
>> Knowledge graphs for autonomous driving seem promising as a structured layer between perception and decision-making. “Contextual capabilities” matter.
>> World models kept coming up in conversations. The question is not only how systems see the world, but how they represent it internally, predict what may happen next, and update when reality disagrees.
More than the research, I really enjoyed meeting fellow researchers at the NVIDIA Researcher Celebration and exchanging thoughts around world models.
Also glad to have run into fellow UMich alumni here.
#cvpr #worldmodel #cvpr2026
If language is an abstraction of the world, and pixels a mapping of it, then matter and geometry are maybe the world itself.
The simulator has been underrated, but of the three world model functions (renderer, simulator, planner), it may be the one that deserves more attention. It can be the bridge: it underlies the renderer’s visual appearance and carries the planner’s consequences, the structural backbone underneath both. A truly unified world model has to do all three, fluently.
The piece also discusses the renderer: it has the technology, and it has the market. Reading this, I cannot help but think: does mature mean most worth researching? Research and industry should move together, sure, but sometimes research has to dare to diverge.
We need a unified world model, and the world model needs the world. But maybe the world model was never meant to be built only for humans to look at.
Calling SF female founders
This event is just for you✨
On June 11, we are inviting early-stage VCs for a pitch session.
If you are cooking up something big, come share your story!
Link: https://t.co/yg1mubLE0S
Just got back from EF’s @OpenAI business hackathon - a ton of sick builders and some real signals:
•Agents for business are about seamless embedding into everyday production
•Robotic data curation is a big trend, but only a few startups are making real progress
•A sense of “timing” matters - it's a good time to get skin in the game on hardware, or on data for hardware
Also, what a coincidence to bump into builders from UMich 🔵🟡
#SanFrancisco #Hackathon #OpenAI #builders