Sourabh Mathur

@sourabhm

Joined June 2009

370 Following

40 Followers

284 Posts

sourabhm retweeted

Andrew Ng

@AndrewYNg

2 days ago

“Loop engineering” is a hot buzzphrase after mentions of it by Boris Cherny (Claude Code’s creator) and Peter Steinberger (OpenClaw's creator) went viral on social media. Loops are now a key part of how we get AI agents to iterate at length to build software. In this letter, I’d like to share my 3 key loops, shown in the image below, for building 0-to-1 products. These loops guide not just how I build software, but also how I decide what software to build.

Agentic coding loop: Given a product specification and optionally a set of evals (that is, a dataset against which to measure performance), we can have an AI agent write code, test its work, and keep iterating until the code is bug-free and meets its specification. This idea of closing the loop took off around the end of last year, and it has been a game changer in enabling coding agents to work longer productively without human intervention. For example, over the weekend, I was building an app for my daughter to practice typing, and my coding agent could easily work for around an hour, using a web browser to check what it had built multiple times before getting back to me, without needing my intervention.
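The write, test, iterate cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular agent's implementation: `agentic_loop`, `fake_agent`, and `check_add` are hypothetical names, the "agent" is just a canned list of attempts, and the "test suite" is a single check.

```python
from typing import Callable, Optional


def agentic_loop(
    generate: Callable[[Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_iterations: int = 10,
) -> tuple[Optional[str], int]:
    """Iterate generate -> test until the tests pass or the budget runs out.

    generate(feedback) returns candidate code; run_tests(code) returns None
    on success, or a failure message that is fed back to the generator.
    """
    feedback = None
    for iteration in range(1, max_iterations + 1):
        code = generate(feedback)
        feedback = run_tests(code)
        if feedback is None:
            return code, iteration
    return None, max_iterations


# Toy stand-in for a coding agent (hypothetical): it replays canned attempts.
_attempts = iter([
    "def add(a, b): return a - b",  # first attempt contains a bug
    "def add(a, b): return a + b",  # second attempt fixes it
])


def fake_agent(feedback: Optional[str]) -> str:
    return next(_attempts)


def check_add(code: str) -> Optional[str]:
    """Toy test suite: execute the candidate and check one behavior."""
    namespace: dict = {}
    exec(code, namespace)
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


final_code, iterations_used = agentic_loop(fake_agent, check_add)
```

The iteration budget (`max_iterations`) is a design choice added for the sketch: without a stopping condition other than "tests pass", a loop like this can run indefinitely.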

The engineering loop executes quickly. Every few minutes, the coding agent might build and test a new version of the software. I hear frequently from developers who are finding new ways to engineer more effective engineering loops. This is an active area of invention!

Developer feedback loop: In this loop, a developer examines the current product and steers the coding agent to improve it. Last year, a lot of developers (including me) were acting as the QA (quality assurance) function for our coding agents, manually finding bugs and then asking the agent to fix them. But with coding agents much more able to test their own code, the amount of time we need to spend on this function has decreased significantly. This allows us to make higher-level product decisions, such as what key features to offer, where the UI needs improvement, and so on.

The developer-feedback loop operates over time intervals between tens of minutes and hours — that's how frequently a developer might review a product and give feedback. In the case of the typing app, I changed my mind a few times about the visual design, what cat costumes she can unlock as she learns (she loves cats), and the user flow for a grown-up to log in and steer the child's learning experience.

When a developer has a clear vision for what to build, it is still a lot of work to translate that vision into a specification for a coding agent to implement. Further, after the developer has seen an implementation, they might update (or perhaps clarify) the spec to steer it toward what they want. If you find that the system repeatedly runs into certain problems, building a set of evals for the agent becomes useful.
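As a rough illustration of what "a set of evals" might look like, here is a minimal sketch: a list of input/expected pairs plus a function that scores a candidate against them. The word-counting task, `EVAL_CASES`, and `run_evals` are all assumptions made for the example, not something from the original letter.

```python
from typing import Callable

# Hypothetical eval set: (input, expected output) pairs for one function.
EVAL_CASES = [
    ("hello world", 2),
    ("  leading spaces", 2),
    ("", 0),
    ("one", 1),
]


def run_evals(candidate: Callable[[str], int], cases) -> dict:
    """Score a candidate against every case and collect the failures."""
    failures = []
    for text, expected in cases:
        try:
            actual = candidate(text)
        except Exception as exc:  # a crash counts as a failed case
            actual = f"raised {type(exc).__name__}"
        if actual != expected:
            failures.append((text, expected, actual))
    passed = len(cases) - len(failures)
    return {"pass_rate": passed / len(cases), "failures": failures}


def buggy_word_count(text: str) -> int:
    # Splitting on a single space miscounts empty and padded strings.
    return len(text.split(" "))


def fixed_word_count(text: str) -> int:
    # Splitting on any whitespace run handles both cases.
    return len(text.split())


buggy_report = run_evals(buggy_word_count, EVAL_CASES)
fixed_report = run_evals(fixed_word_count, EVAL_CASES)
```

The failure list, rather than the pass rate alone, is what gets fed back to the agent: it names the exact inputs the system keeps getting wrong.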

AI-native teams are increasingly using AI to help shape product direction, for example, automating the gathering and analysis of usage data, summarizing written and verbal customer feedback, or carrying out competitive analysis. However, for pretty much all the products I’m involved in, I see humans as having a significant context advantage over current AI systems (we know a lot more than the AI system about the users and the context the product has to operate in), and thus humans play a critical role. Many people describe this human contribution as “taste,” but I prefer to think of it as humans having a context advantage, since that gives us a clearer path to helping AI systems get better. This also speaks to why this step can’t be automated: So long as the human knows something the AI does not, human-in-the-loop is needed to inject that knowledge into the system.

External feedback loop: This includes a wide range of tactics like asking a few friends for feedback, launching to alpha testers, or putting the code into production with A/B testing. These tactics are usually slow, rarely taking less than hours and sometimes taking days or even weeks. This data informs the developer vision, which in turn continues to drive the detailed product spec, which in turn drives the coding agent.

With coding agents speeding up software development, more engineers are starting to play a partial product management role. For many engineers who are growing into this role, the hardest part is shaping the product vision and striking a balance between building (bridging the gap between vision and spec) and getting user feedback to evolve the vision. It is important to do both!

I will write more about how to do this in future posts, but for now, I find it encouraging that engineers are playing an expanded role (just as product managers and designers now do more engineering).

[Original text: The Batch]

319

10K

535K

sourabhm retweeted

Aakash Gupta

@aakashgupta

6 days ago

The "I use AI in my workflow" answer is failing PM candidates right now. Not because it's wrong. Because everyone says it.

The interviewers who are actually hiring for AI roles have moved past asking the question entirely. One founder I follow runs every interview the same way. She asks candidates to screen share. No prompts, no setup. Just: show me how you AI.

What she finds is almost always the same. The candidate who spent the last six months talking about being "AI-pilled" opens ChatGPT and starts typing a question. Maybe they paste in a doc and ask for a summary. She calls this Level 1. Chat mode. Search mode. You ask, it answers.

Here's the framework she uses to sort candidates in real time:

Level 1: You talk to AI. You ask questions. You get answers. You're basically using a better search engine.

Level 2: You automate a workflow. One piece of your job runs on a repeatable system you built. Not a prompt, a pipeline.

Level 3: You build apps. Something was tedious enough that you made a tool to handle it. You're shipping internally.

Level 4: You ship to customers. The AI you built isn't just for you. It's in production.

Most candidates are at Level 1. They think they're at Level 3. The screen share takes about four minutes to reveal the gap.

The insight worth sitting with: this is the same thing that happened with data literacy five years ago. "I'm comfortable with data" meant nothing by 2021. Hiring managers wanted to see the SQL, the dashboard, the decision you made because of the analysis. The vague claim became worthless.

"I use AI" is at that same inflection point now. The claim is worthless. The screen is everything.

The candidates getting through aren't the ones with the most impressive AI vocabulary. They're the ones who automated something real, built something that worked, and can open their laptop and show it in 60 seconds.

Build something you can show.

107

21K

sourabhm retweeted

Mark Ajzenstadt

@mardehaym

5 days ago

Matt Pocock just dropped a free 2-hour workshop on the exact workflow he uses to ship code with AI agents. This is the most practical breakdown of AI-assisted development you'll find anywhere. People are paying $500 for courses that teach less than this. Watch it, then read the step by step guide on AI coding workflows below.

120

192

22K

sourabhm retweeted

Mark Ajzenstadt

@mardehaym

6 days ago

https://t.co/hKZOHolXBU

179

146K

sourabhm retweeted

Addy Osmani

@addyosmani

24 days ago

https://t.co/hIe0UX7z6T

356

19K

sourabhm retweeted

Vaishnavi

@_vmlops

24 days ago

MICROSOFT OPEN-SOURCED THE COMPLETE MCP PLAYBOOK

Every ai engineer needs to understand model context protocol in 2026

Microsoft built a full open-source course so you don't have to figure it out the hard way

here's what's inside:

▫️ 11 modules from zero to production
▫️ hands-on labs in python, typescript, java, rust, c# and javascript
▫️ covers mcp servers, clients, security, oauth2, azure integration
▫️ a 13-lab capstone with real postgresql + vector search
▫️ works with claude desktop, cursor, vs code and more
▫️ 16k stars. 5.3k forks. built by microsoft.

the curriculum even teaches adversarial multi-agent reasoning: two agents debate using shared mcp tools, judged by a third agent.

if you're building with ai agents in 2026, mcp is the layer that connects everything. this is the fastest way in https://t.co/jZlCZE8VkX

601

121

784

31K

sourabhm retweeted

Anatoli Kopadze

@AnatoliKopadze

about 1 month ago

https://t.co/AAWIZD1pNL

476

15K

sourabhm retweeted

Uncle Bob Martin

@unclebobmartin

about 1 month ago

I start with very informal specifications written by hand. I have an agent convert these into harder specifications that are subdivided into tasks. I review these.

Then I feed those tasks into the specifier agent, which converts each task to Gherkin, prunes the Gherkin, and then hands it off to the coder agent. I spot check the Gherkin.

The coder agent writes acceptance tests directly from the Gherkin. Then writes unit tests. Then writes code. When all those tests pass, the coder agent hands off to the refactorer agent.

The refactorer agent reduces crap to 6 or below, and reduces any duplication. Then it writes property tests and gets them to pass. Then it hands off to the architect agent.

The architect agent runs language mutation and covers any uncovered sections, and kills all survivors. Then it runs Gherkin mutation and kills any of those survivors. Then it runs the entire test suite, and when it passes it hands the result off to the specifier, coder, and refactorer. I spot check the code.

This is an exercise of transformations from the informal to the formal through managed stages, with human interaction decreasing with each stage.

Raw computer power is the limiting factor. Those mutation tests are CPU intensive.
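For readers unfamiliar with "mutation" and "survivors": mutation testing makes small changes to the code and checks whether the test suite notices. A mutant that the tests still pass on is a survivor, and killing it means adding a test that fails on it. The sketch below is a toy illustration only; the `is_adult` function, the hand-written mutants, and both suites are hypothetical, and real mutation tools generate mutants automatically.

```python
# Function under test, kept as source text so it can be mutated.
ORIGINAL = "def is_adult(age):\n    return age >= 18\n"

# Hand-written mutants (hypothetical): each changes one small detail.
MUTANTS = {
    ">= -> >": ORIGINAL.replace(">=", ">"),
    ">= -> <=": ORIGINAL.replace(">=", "<="),
    "18 -> 19": ORIGINAL.replace("18", "19"),
}


def load(source: str):
    """Compile a source string and return its is_adult function."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["is_adult"]


def weak_suite(fn) -> bool:
    # Only checks values far from the boundary.
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    # Adds the boundary cases that the weak suite misses.
    return weak_suite(fn) and fn(18) is True and fn(17) is False


def survivors(suite) -> list:
    """Return the mutants the suite fails to detect (suite still passes)."""
    assert suite(load(ORIGINAL)), "suite must pass on the original code"
    return [name for name, src in MUTANTS.items() if suite(load(src))]


weak_survivors = survivors(weak_suite)
strong_survivors = survivors(strong_suite)
```

Every mutant requires a full run of the suite, which is why the approach is CPU intensive: the cost grows with the number of mutants times the cost of the tests.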

748

999

56K

Sourabh Mathur @sourabhm

about 1 month ago

This #MemorialDay we remember the service members whose sacrifices let us enjoy the freedoms we cherish 🙏.

sourabhm retweeted

Greg Brockman

@gdb

about 1 month ago

self improvement prompt for codex

113

346

496K

sourabhm retweeted

Kirill

@kirillk_web3

about 1 month ago

> be Kimi Founder
> 32 years old
> peers are choosing corporate jobs
> you're building AI infrastructure from scratch
> China. no English press. no Western hype.
> raise $2B. hit $20B valuation. quietly.
> nobody outside China writes your name
> invent a new optimizer. 2x more efficient than the industry standard.
> build 300-agent parallel systems. 4,000 steps. 12 hours straight.
> open source. free. beats the models everyone pays $200/month for.
> one afternoon you sit down
> record 40 minutes
> give the entire playbook away for free
> the math. the architecture. the decisions.
> a week later Western developers find it
> "wait. this exists?"
> "wait. it's free?"
> "wait. it beats Claude?"
> you were already on the next version
> they were just catching up
> different game.

189

526K

sourabhm retweeted

Shayne Boyer

@spboyer

about 1 month ago

LLM-as-judge gets noisy when one answer wins just by going first. Waza uses swap-based pairwise judging: score A/B, then B/A, to cancel position bias. You get a signal instead of a coin flip. How do you handle position bias in your evals? https://t.co/BeHptuwPIh
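The swap idea can be sketched as follows. This is an illustration of the general technique, not Waza's actual API: `swap_judge` and the two toy judges are hypothetical, and a real judge would be a model call rather than a local function.

```python
from typing import Callable

# A judge takes (first_answer, second_answer) and returns "first" or "second".
Judge = Callable[[str, str], str]


def swap_judge(judge: Judge, a: str, b: str) -> str:
    """Judge (a, b) and (b, a); declare a winner only if both orders agree."""
    forward = judge(a, b)   # a is shown first
    backward = judge(b, a)  # b is shown first
    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"
    if a_wins_forward and a_wins_backward:
        return "a"
    if not a_wins_forward and not a_wins_backward:
        return "b"
    # The verdict flipped with the order, so it reflects position, not quality.
    return "tie"


def always_first(first: str, second: str) -> str:
    """Toy judge with pure position bias: whatever comes first wins."""
    return "first"


def longer_wins(first: str, second: str) -> str:
    """Toy judge with a real (if crude) preference: the longer answer wins."""
    return "first" if len(first) >= len(second) else "second"


biased_verdict = swap_judge(always_first, "x", "yy")
consistent_verdict = swap_judge(longer_wins, "short", "a longer answer")
```

Treating a flipped verdict as a tie is one possible policy; another is to average scores across both orders when the judge returns numbers instead of a label.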

sourabhm retweeted

Rahul

@sairahul1

about 1 month ago

Anthropic pays $750,000+ a year for engineers who know how to build LLMs from scratch. Stanford just released the exact lecture that teaches it - 1 hour 44 minutes, free, straight from CS229. Bookmark and watch it this weekend. It'll teach you more about how ChatGPT & Claude actually work than most people at top AI companies learn in their entire careers.

107

26K

sourabhm retweeted

ℏεsam

@Hesamation

3 months ago

bro created an AI job search system for Claude Code that scored 700+ job applications and actually got him a job. AND IT'S NOW OPEN-SOURCE.

It scans multiple company career pages, rewrites your CV per job, and even fills application forms.

The repo has:
> 14 skill modes (evaluate, scan, PDF, ...)
> Go terminal dashboard
> ATS-optimized PDF generation via Playwright
> 45+ companies pre-configured (Anthropic, OpenAI, ElevenLabs, Stripe...)

GitHub: https://t.co/PwrYBOAphi

392

28K

58K

sourabhm retweeted

Aakash Gupta

@aakashgupta

3 months ago

The gap between a PM getting AI slop from Claude Code and one getting 10x output is about one hour of file structure. Three folders.

> A knowledge folder with static context: who you work with, what each stakeholder cares about, reference material that rarely changes.
> A projects folder where every task accumulates research, drafts, and artifacts that load instantly into your next session.
> And a people folder that auto-updates from meeting transcripts through Granola's MCP.

The people folder is the part that compounds. Build a skill that pulls what each person said in your last meeting, what they pushed back on, what they committed to. Now when you draft a message to your VP of Engineering, Claude Code already knows their communication preferences from 30 real conversations. That's context no prompt can replicate.

Carl walked through this system on the episode and the compounding math stuck with me. Day 1, Claude Code knows nothing about your work. Day 30, it knows your stakeholders, your project history, your patterns. Day 90, it's surfacing connections across your work you haven't consciously noticed.

Then layer on skills. A standup command that pulls from GitHub, Linear, your calendar, and your task folder in one shot. Website traffic compared against your LinkedIn posts this week. Analyses that would be impossible clicking between individual UIs, running before your first meeting.

One hour of setup. Compounding returns every day after. The PMs typing prompts into a blank terminal and the PMs who built the operating system around it are already producing completely different categories of work.

Build the operating system.
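A minimal sketch of scaffolding the three folders is shown below. The exact folder names (`knowledge`, `projects`, `people`) and the `scaffold` helper are assumptions for illustration; the tweet names the folders' roles but not their paths, and the example writes to a temporary directory rather than a real workspace.

```python
import tempfile
from pathlib import Path

# Assumed folder names, one per role described above.
FOLDERS = ("knowledge", "projects", "people")


def scaffold(root: Path) -> list:
    """Create the three context folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created


# Demonstrate in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    created_names = sorted(path.name for path in scaffold(Path(tmp)))
```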

458

83K

Sourabh Mathur @sourabhm

5 months ago

@jsnover From its humble beginnings as Monad to now pwsh 7.x, your work on PowerShell has transformed an entire part of the software development lifecycle. Thank you.

Sourabh Mathur @sourabhm

about 1 year ago

On this #MemorialDay2025, thankful for the sacrifices that our men & women in uniform, and their families, have made to preserve the freedoms that we so cherish.

sourabhm retweeted

Carlos E. Perez

@IntuitMachine

about 1 year ago

Shocker! Claude 4 system prompt was leaked, and it's a goldmine!

The Claude system prompt incorporates several identifiable agentic AI patterns as described in "A Pattern Language For Agentic AI." Here's an analysis of the key patterns used:

Run-Loop Prompting: Claude operates within an execution loop until a clear stopping condition is met, such as answering a user's question or performing a tool action. This is evident in directives like "Claude responds normally and then..." which show turn-based continuation guided by internal conditions.

Input Classification & Dispatch: Claude routes queries based on their semantic class—such as support, API queries, emotional support, or safety concerns—ensuring they are handled by different policies or subroutines. This pattern helps manage heterogeneous inputs efficiently.
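The classify-then-dispatch pattern in general can be sketched as below. This illustrates the pattern only and makes no claim about how Claude works internally: the keyword classifier, the route labels, and the handlers are all hypothetical, and a production system would likely use a model rather than keywords to classify.

```python
import re

# Hypothetical routing table: label -> keywords that signal that class.
ROUTES = {
    "support": {"refund", "billing", "subscription"},
    "api": {"api", "endpoint", "sdk"},
}


def classify(query: str) -> str:
    """Return the first route whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for label, keywords in ROUTES.items():
        if words & keywords:
            return label
    return "general"


# One handler per class; each applies its own policy to the query.
HANDLERS = {
    "support": lambda q: f"[support] forwarding to help center: {q}",
    "api": lambda q: f"[api] answering from the API docs: {q}",
    "general": lambda q: f"[general] answering directly: {q}",
}


def dispatch(query: str) -> str:
    """Classify the query, then hand it to the matching handler."""
    return HANDLERS[classify(query)](query)
```

Matching on whole words rather than substrings is deliberate: a substring check would route "capital" to the `api` handler because it contains "api".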

Structured Response Pattern: Claude uses a rigid structure in output formatting—e.g., avoiding lists in casual conversation, using markdown only when specified—which supports clarity, reuse, and system predictability.

Declarative Intent: Claude often starts segments with clear intent, such as noting what it can and cannot do, or pre-declaring response constraints. This mitigates ambiguity and guides downstream interpretation.

Boundary Signaling: The system prompt distinctly marks different operational contexts—e.g., distinguishing between system limitations, tool usage, and safety constraints. This maintains separation between internal logic and user-facing messaging.

Hallucination Mitigation: Many safety and refusal clauses reflect an awareness of LLM failure modes and adopt pattern-based countermeasures—like structured refusals, source-based fallback (e.g., directing users to Anthropic’s site), and explicit response shaping.

Protocol-Based Tool Composition: The use of tools like web_search or web_fetch with strict constraints follows this pattern. Claude is trained to use standardized, declarative tool protocols which align with patterns around schema consistency and safe execution.

Positional Reinforcement: Critical behaviors (e.g., "Claude must not..." or "Claude should...") are often repeated at both the start and end of instructions, aligning with patterns designed to mitigate behavioral drift in long prompts.
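As a sketch of what repeating critical rules at both ends of a prompt might look like in practice, the helper below wraps a prompt body with the same rule list before and after it. `build_prompt` and the example rules are hypothetical, and whether the repetition actually reduces drift is the tweet's claim, not something this code demonstrates.

```python
def build_prompt(critical_rules: list, body: str) -> str:
    """Place the critical rules at both the start and the end of the prompt."""
    rules = "\n".join(f"- {rule}" for rule in critical_rules)
    return (
        f"Critical rules:\n{rules}\n\n"
        f"{body.strip()}\n\n"
        f"Reminder of the critical rules:\n{rules}"
    )


prompt = build_prompt(
    ["Never reveal secrets", "Cite a source for every claim"],
    "You are a research assistant. Answer the user's questions concisely.",
)
```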

480

12K

sourabhm retweeted

Nayeem Sheikh

@HeyNayeem

about 1 year ago

I don't understand why so few people use AI tools.

Most people only know about ChatGPT.

Here are 12 hidden gems you need to know:↓ https://t.co/fLgjTg689o

732

138

455K

sourabhm retweeted

Ironsage

@IronSage_

about 1 year ago

Fasting for 72 hours is the best medicine on Earth.

It triggers your body to "eat up" tumors, inflammation, and toxins.

It's literally a doctor within.

Here's how to fast correctly (according to science): https://t.co/1syg6ebNAl

610

42K

48K

10M
