We shipped our 1000th user story last week with our internal agent orchestration tool. To celebrate, I'm open-sourcing it.
For the past 4 months, 100% of our production code has been written with this tool
Remote first, Multi-user, Multi-Harness (Claude, Codex, OpenCode, etc)
@dadadaistt Yes, that's our biggest lesson :)
Where should we invest our time now that code is produced in the background?
Reviewing PRs was the bottleneck.
We shifted our focus to the plan and designed a process that minimizes human intervention between the plan and production.
@maxfaber_Om 2) Models are still sycophantic :)
I don't mind. The main issue I wanted to solve is what comes after the plan: merging PRs fast (the PR matches the plan) and reducing bugs in production (the agent must perform manual QA on the PR).
@maxfaber_Om 1) Yes. The initial plan is rarely perfect.
The planning phase is super interactive: I treat the agent as a co-worker, and we discuss the plan until we come to an agreement.
Once the plan is solid and I click "start", changes are deployed to production with minimal human intervention.
@maxfaber_Om Haha, that was a fun experiment!
August 2025 was around the time we switched to 100% of the code being written by agents.
At the time it was a lot of pain... Sounds like ancient history now.
This led us to build vdaubry/bottega ;)
@maxfaber_Om I checked MaksimZinovev/docfence, this is definitely something that could help with the planning phase!
I spend 50%+ of my time on getting the plan right; this is where most of the gains lie.
The PR quality issue is more or less solved for us now (>80% approval after review)
Fork it, break it, build your own version.
If you're building something similar, or you disagree with any of this, I'd love to hear from you.
Repo: https://t.co/cxeVfHq2hf
Writeup: https://t.co/cvlxmvJ0S3
The loop just makes the agent behave like a human dev: plan → implement + tests → an adversarial review agent that re-runs the test scenarios and checks no checkbox was silently skipped → runs QA on the final task via MCP → open the PR, keep CI green.
The human owns the plan and the review.
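For anyone who wants to picture that loop in code, here is a rough Python sketch. It is not how bottega is implemented: `Task`, the stage functions, and `run_loop` are all hypothetical names made up for illustration, and each stage is a stub where a real setup would call an agent, a test runner, or an MCP tool. The only real logic is the review gate, which fails if any plan checkbox was skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Task:
    """Hypothetical task record: the plan is a checklist agreed on with the human."""
    plan: List[str]
    done: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)


def implement(task: Task) -> bool:
    # Stub: a real agent would write code and tests for each checklist item.
    task.done.update(task.plan)
    task.log.append("implement")
    return True


def adversarial_review(task: Task) -> bool:
    # Gate: fail if any checkbox from the plan was silently skipped.
    skipped = [item for item in task.plan if item not in task.done]
    task.log.append("review")
    return not skipped


def qa(task: Task) -> bool:
    # Stub: stands in for the agent running manual QA on the final task.
    task.log.append("qa")
    return True


def open_pr(task: Task) -> bool:
    # Stub: stands in for opening the PR and keeping CI green.
    task.log.append("open_pr")
    return True


STAGES: List[Callable[[Task], bool]] = [implement, adversarial_review, qa, open_pr]


def run_loop(task: Task, stages: List[Callable[[Task], bool]] = STAGES) -> bool:
    """Run stages in order and stop at the first one that fails."""
    for stage in stages:
        if not stage(task):
            task.log.append(f"stopped at {stage.__name__}")
            return False
    return True
```

With the stub stages, `run_loop(Task(plan=["add endpoint", "add test"]))` goes through all four stages. Swap `implement` for a version that skips an item and the run stops at the review gate, so no PR is opened.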
Karpathy's autoresearch: give an AI agent a codebase, let it run experiments overnight, wake up to results.
Generalizes to anything: load testing, landing page A/B tests, infra tuning...
"Program the program" is the real insight, I love it :D
https://t.co/aSL8tivHnj
Dive into the era where language is just a tool! Discover why with AI, even unsexy languages empower solo experts to build big. Curious how? Find out more!
https://t.co/1R39i1SgFH
Unlock AI's true potential by breaking big tasks into bite-sized prompts! Discover why detailed instructions lead to better automation results. Dive in and elevate your AI game!
https://t.co/NY2KAwkiMv
Why avoid writing small bits of code when they can save you hours? Discover how the cost of coding is dropping and boost your productivity!
https://t.co/XjgMTeTOiJ