“Loop engineering” is a hot buzzphrase after mentions of it by Boris Cherny (Claude Code’s creator) and Peter Steinberger (OpenClaw's creator) went viral on social media. Loops are now a key part of how we get AI agents to iterate at length to build software. In this letter, I’d like to share my 3 key loops, shown in the image below, for building 0-to-1 products. These loops guide not just how I build software, but also how I decide what software to build.
Agentic coding loop: Given a product specification and optionally a set of evals (that is, a dataset against which to measure performance), we can have an AI agent write code, test its work, and keep iterating until the code is bug-free and meets its specification. This idea of closing the loop took off around the end of last year, and it has been a game changer in enabling coding agents to work longer productively without human intervention. For example, over the weekend, I was building an app for my daughter to practice typing, and my coding agent could easily work for around an hour, using a web browser to check what it had built multiple times before getting back to me, without needing my intervention.
The agentic coding loop executes quickly. Every few minutes, the coding agent might build and test a new version of the software. I hear frequently from developers who are finding new ways to engineer more effective coding loops. This is an active area of invention!
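As a rough illustration of the write, test, iterate cycle described above, here is a minimal Python sketch. It is not any particular agent's implementation: `write_code` and `run_tests` are hypothetical callables standing in for the coding agent and its test harness, and `max_iterations` is an assumed budget after which the loop hands control back to the developer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestReport:
    passed: bool
    failures: List[str]


@dataclass
class LoopResult:
    code: str
    iterations: int
    passed: bool


def agentic_coding_loop(
    spec: str,
    write_code: Callable[[str, List[str]], str],  # hypothetical: agent call
    run_tests: Callable[[str], TestReport],       # hypothetical: test/eval harness
    max_iterations: int = 20,                     # assumed budget
) -> LoopResult:
    """Write code, test it, and feed failures back until tests pass."""
    feedback: List[str] = []
    code = ""
    for iteration in range(1, max_iterations + 1):
        code = write_code(spec, feedback)
        report = run_tests(code)
        if report.passed:
            return LoopResult(code, iteration, True)
        # Failures from this round become context for the next attempt.
        feedback = report.failures
    # Budget exhausted: return to the developer without a passing build.
    return LoopResult(code, max_iterations, False)


# Toy stand-ins so the sketch runs end to end.
def fake_agent(spec: str, feedback: List[str]) -> str:
    return "v2" if feedback else "v1"


def fake_tests(code: str) -> TestReport:
    if code == "v2":
        return TestReport(True, [])
    return TestReport(False, ["typing screen does not render"])


result = agentic_coding_loop("typing practice app", fake_agent, fake_tests)
print(result)
```

In this toy run the first version fails, the failure is fed back, and the second version passes, so the loop reports back after two iterations. A real harness might drive a web browser or run an eval set instead of `fake_tests`.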
Developer feedback loop: In this loop, a developer examines the current product and steers the coding agent to improve it. Last year, a lot of developers (including me) were acting as the QA (quality assurance) function for our coding agents, manually finding bugs and then asking the agent to fix them. But with coding agents much more able to test their own code, the amount of time we need to spend on this function has decreased significantly. This allows us to make higher-level product decisions, such as what key features to offer, where the UI needs improvement, and so on.
The developer feedback loop operates over time intervals between tens of minutes and hours; that's how frequently a developer might review a product and give feedback. In the case of the typing app, I changed my mind a few times about the visual design, what cat costumes my daughter can unlock as she learns (she loves cats), and the user flow for a grown-up to log in and steer the child's learning experience.
When a developer has a clear vision for what to build, it is still a lot of work to translate that vision into a specification for a coding agent to implement. Further, after the developer has seen an implementation, they might update (or perhaps clarify) the spec to steer it toward what they want. If you find that the system repeatedly runs into certain problems, building a set of evals for the agent becomes useful.
AI-native teams are increasingly using AI to help shape product direction, for example, automating the gathering and analysis of usage data, summarizing written and verbal customer feedback, or carrying out competitive analysis. However, for pretty much all the products I’m involved in, I see humans as having a significant context advantage over current AI systems (we know a lot more than the AI system about the users and the context the product has to operate in), and thus humans play a critical role. Many people describe this human contribution as “taste,” but I prefer to think of it as humans having a context advantage, since that gives us a clearer path to helping AI systems get better. This also speaks to why this step can’t be automated: So long as the human knows something the AI does not, a human in the loop is needed to inject that knowledge into the system.
External feedback loop: This includes a wide range of tactics like asking a few friends for feedback, launching to alpha testers, or putting the code into production with A/B testing. These tactics are usually slow, rarely taking less than hours and sometimes taking days or even weeks. This data informs the developer vision, which in turn continues to drive the detailed product spec, which in turn drives the coding agent.
With coding agents speeding up software development, more engineers are starting to play a partial product management role. For many engineers who are growing into this role, the hardest part is shaping the product vision and striking a balance between building (bridging the gap between vision and spec) and getting user feedback to evolve the vision. It is important to do both!
I will write more about how to do this in future posts, but for now, I find it encouraging that engineers are playing an expanded role (just as product managers and designers now do more engineering).
[Original text: The Batch]
AI won't invent, but it will de-invent management of bureaucracy and re-invent management for AI agents. 14 years ago I called this the upcoming "shepherd" role, a human directing flocks of AI agents toward more fertile fields of research, business efficiency, etc.
Listening to Postecoglou suggesting that all Ancelotti did tonight was put on his players and wait for them to do their thing, I’m reminded of the words attributed to Lao Tzu 1700 years ago:
“The best of all rulers is but a shadowy presence to his subjects. Hesitant, he…
My entire AI stack is now Chinese 🇨🇳
87% cheaper. same revenue
swaps by task:
1. reasoning / backend brain
Opus 4.8 → Kimi K2.7
benchmark gap: ~8% · price: ~11x cheaper
2. code generation
GPT-5.5 → Qwen 3.7 Max
benchmark gap: ~18% · price: ~7x cheaper
3. agent loops + tool calling
Sonnet 4.7 → GLM 5.2
benchmark gap: ~3% · price: ~5x cheaper on input
4. cheap volume / bulk processing
GPT-5.5 mini → MiMo V2.5
benchmark gap: ~6% · price: ~12x cheaper
5. image generation
GPT-Image-2 → Wan 2.5
benchmark gap: ~5% · price: ~8x cheaper
6. video generation
Sora 2 → Kling 3.0
benchmark gap: roughly equal · price: ~6x cheaper
[ result after 30 days: ]
operating costs dropped 87%, output quality dropped 4% on average, revenue unchanged
most important: these models won't be banned in a month and i can run them locally
nobody will steal my data and i can train them as i need
full article drops tomorrow with:
> exact routing logic per task type
> the 2 cases where I still pay for American
> the migration playbook anyone can copy in a weekend
VERY IMPORTANT to get migrated now, while it's not too late
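A quick way to sanity-check how the per-task "Nx cheaper" figures in the list above could combine into a single cost reduction is sketched below in Python. The price ratios are copied from the post; the equal spend share per task is purely a hypothetical assumption (the post does not say how the bill was split), and the agent-loop figure is quoted for input tokens only, so treat the result as a rough illustration rather than a reproduction of the post's numbers.

```python
# "Nx cheaper" figures quoted in the post, keyed by task.
price_ratio = {
    "reasoning": 11,
    "code generation": 7,
    "agent loops": 5,        # quoted for input tokens only
    "bulk processing": 12,
    "image generation": 8,
    "video generation": 6,
}


def savings_pct(times_cheaper: float) -> float:
    """Convert 'N times cheaper' into a percentage cost reduction."""
    return (1 - 1 / times_cheaper) * 100


def blended_savings_pct(ratios: dict, spend_share: dict) -> float:
    """Overall reduction given each task's share of the original bill."""
    assert abs(sum(spend_share.values()) - 1.0) < 1e-9, "shares must sum to 1"
    new_cost = sum(spend_share[task] / ratios[task] for task in ratios)
    return (1 - new_cost) * 100


# Hypothetical assumption: every task was an equal share of the old bill.
equal_share = {task: 1 / len(price_ratio) for task in price_ratio}

for task, ratio in price_ratio.items():
    print(f"{task}: {savings_pct(ratio):.1f}% cheaper")

blended = blended_savings_pct(price_ratio, equal_share)
print(f"blended (equal shares): {blended:.1f}%")
```

Under the equal-share assumption the blended reduction comes out at roughly 86.5%, in the same range as the 87% quoted above. A different spend mix would shift the figure, which is why the shares are passed in explicitly.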
If you’ve actually solved real problems, package your experience and spin up a coaching/consulting product.
This might be something simple, like 6 months or 6 sessions, with a fixed outcome and a fixed fee (e.g., $10k).
Or offer a non-executive director seat – a formal board role for strategy (not day-to-day ops).
Think 1-3 days/month, with a retainer (maybe equity) and a clear scope.
Write the promise, set the fee and publish your offer.
My friend Jodie Cook is running a free live workshop on packaging your expertise into authority that attracts perfect-fit clients. Last date, don't miss it: https://t.co/fsi9JCnVar
10 GitHub repositories for scraping the entire internet
Save them all. Each one extracts clean data from any website. That level of access normally requires sales calls and contracts.
Erling Haaland’s simple secrets to peak performance:
Sleep is the most important thing. He wears blue-blocking glasses 3 hours before bed, shuts out all signals, and avoids sleep trackers because they make you overthink.
“Small things every single day for a longer period really pays off.”
What’s your take? Do you think simple habits like blue light blocking and consistent sleep are the real keys to high performance?
Working in finance is realizing that the job is often an escape from real life for a lot of people in the industry
The MDs and VPs with children who are staying in the office past 10 PM instead of spending time with their kids
The PE and HF portfolio managers who continue to stick around despite having a net worth that should have made them leave a decade ago
Some of them are there because they truly love the grind. Others because they are deeply unhappy with their personal lives and view the job as the only avenue where they have full control over the outcomes
The irony is that the unhappiness stems from all the sacrifices they made for this job in the first place, whether that is in their relationships or friendships
But the Stockholm syndrome is real, and their careers are so core to their personality that it is impossible to leave behind
Go on until morning about nutrition this, protein that, micro-whatever.
This is the biggest proof that what matters is training intensity. Do you think this man counts calories? This man probably eats at least 3 loaves of bread a day and drinks 2 liters of regular cola on top.
If the training itself is done intensely, what you eat doesn't matter that much. I say this as someone who has trained for years and built a physique in my own way.
Lift hard, the rest doesn't matter.
- DeepSeek V4 Flash - Native Precision (FP4 + FP8)
- Fits on 2x RTX Pro 6000 GPUs + 256 GB DDR5 RAM
- Using KTransformers: KVCache-AI fork of SGLang for GPU/CPU memory inference
I'm somewhat obsessed with running applications on resource-constrained systems to squeeze out the maximum performance possible. Part of that comes from a past life working as a systems engineer, building & upgrading nationwide (USA) Video-On-Demand streaming backends, while navigating headless *nix servers around the time "cloud" was becoming a buzzword.
KTransformers gets less mention across the LLM inference-sphere despite being among the engines listed for many of the popular models on HuggingFace (alongside vLLM, SGLang, & llama.cpp). The KVCache-AI team is best known for providing a forked SGLang for hybrid GPU / CPU memory inference, benefitting MoE models. I expect these hybrid setups to gain in popularity, especially on the consumer side as hardware prices continue soaring.
"Necessity is the mother of invention" as they say, and local AI runners will continue finding more creative ways to run intelligence, whether that involves GPU/CPU memory offload, distributed training / inference, model weight / KV Cache quants, or REAPs.
Here I have DeepSeek V4 Flash running at a 1M context length on 2x RTX Pro 6000 GPUs, using its native mixed precision of FP4 + FP8. KTransformers lets you reduce GPU memory use by keeping only a subset of the experts in each MoE layer in GPU VRAM, with the remaining experts held in system RAM. KTransformers can also update GPU expert placement during inference from routing statistics collected during the prefill phase. There's also a lot of trial and error involved given the limited amount of kernel support for RTX Pro 6000s.
Two of the prompt load stress-test benchmarks I like to run are from the local-inference-lab/llm-inference-bench GitHub repo & the AlienKevin/SWE-ZERO-12M-trajectories HuggingFace dataset.
Here are the main KTransformers SGLang optimized flags:
- Context Length: 1048576
- Total Number of Tokens: 1048576
- Chunked Prefill Size: 16384
- Max Prefill Tokens: 16384
- GPU Prefill Token Threshold: 1024
- GPU Memory Utilization: 87%
- Number of Experts per MoE Layer on GPU: 134 / 256
- Max Running Requests: 256
- CUDA Graph Max Batch Size: 256
- CUDA Graph Batch Sizes: 1 2 4 8 16 32 64 128 256
- Available GPU Memory: 20.81GB (anything less was too tight for agentic coding)
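To make a few of the settings above more concrete, the short Python sketch below derives some quantities they imply. It only does arithmetic on the numbers in the list. The variable names are illustrative and are not KTransformers or SGLang flag names, and the chunk count assumes each prefill pass processes at most one chunk of the Chunked Prefill Size.

```python
import math

# Values copied from the settings list above (names are illustrative only).
context_length = 1_048_576
chunked_prefill_size = 16_384
experts_per_layer = 256
experts_on_gpu = 134
cuda_graph_max_batch_size = 256

# Assumption: one prefill pass handles at most one chunk of tokens,
# so a full-length prompt needs this many passes.
prefill_chunks = math.ceil(context_length / chunked_prefill_size)

# Experts that do not fit in VRAM stay in system RAM.
experts_on_cpu = experts_per_layer - experts_on_gpu
gpu_expert_fraction = experts_on_gpu / experts_per_layer

# The listed CUDA graph batch sizes are the powers of two up to the max.
cuda_graph_batch_sizes = [
    2**i for i in range(int(math.log2(cuda_graph_max_batch_size)) + 1)
]

print(f"prefill chunks for a full context: {prefill_chunks}")
print(f"experts per layer on CPU: {experts_on_cpu}")
print(f"share of experts on GPU: {gpu_expert_fraction:.1%}")
print(f"CUDA graph batch sizes: {cuda_graph_batch_sizes}")
```

Under these assumptions a full 1M-token prompt is prefilled in 64 chunks, and a little over half of each layer's experts (134 of 256) sit on the GPU, with the other 122 in system RAM.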
Below are the AlienKevin/SWE-ZERO-12M-trajectories benchmark results for 100 prompts with 10 concurrent, ~8k input tokens, & ~1k output tokens. Both Radix & Chunked Prefix Cache were disabled for the absolute worst-case scenario:
- Prefill Mean Batch Tokens: 35756.93 tok/sec
- Prefill Median Batch Tokens: 652.90 tok/sec
- TTFT Mean: 20.698s
- TTFT Median: 12.714s
- Decode Mean Batch Output Tokens: 27.39 tok/sec
- Decode Median Batch Output Tokens: 20.63 tok/sec
- Utilized CPU memory: ~200 GB
A more detailed write-up will follow, which'll include the methodology of calculating the number of experts per MoE layer on GPU, maximum number of tokens, and GPU memory utilization for a healthy balance for running tool calls & benchmarks in this hybrid setup.
Hopefully this'll be reproducible for you and on alternative GPUs, as well as current & future models. Let me know how it works for you! My future plans involve GPU/CPU memory inference tests for MiniMax M3, GLM-5.2, and Kimi K2.7-Code.
Links to all of the resources for getting DeepSeek V4 Flash running at native mixed precision on 2x RTX Pro 6000 GPUs + 256 GB RAM can be found in the follow-up post.