I built a swarm of 5000+ deep code review agents that assess a codebase in parallel.
Here's a time lapse of them analyzing the source code of @vuejs core:
@mntruell @harjotsgill I think you and all my friends @coderabbitai will find what I was playing around with last interesting. Obviously not a polished product, but fun. Have a look, would love your thoughts ^
@mntruell Here's a deep dive, along with code, into how I used Autonomy to build and launch the swarm in about an hour.
https://t.co/538tGg4yxU
That was insightful!
We're using a similar approach for developer documentation for our product and it creates magic ...
Autonomy is a platform to run apps that use teams of agents to autonomously perform long and complex tasks.
Like many developer products, it has a CLI, a sign up/sign in flow driven by the command, commands to look at logs of running apps, APIs, programming libs, etc.
Traditionally docs for such products are focused on teaching devs how to develop using the product.
We wrote a separate set of docs for coding agents. This fork of the docs is tuned and tested to make coding agents successful at running the full write-test-deploy-test-debug-redeploy loop on their own.
The result is an exceptional experience: devs copy a prompt from our website, paste it into a coding agent, adapt it to whatever agents they want to build, and 20 mins later they have a first version of a live agentic product, deployed to a public URL, with a UI, streaming APIs, etc.
The secret to the whole experience is a collection of markdown files with an index.
Here's that index: https://t.co/y2LzH8YfEi
Here are instructions to try it yourself:
https://t.co/vQPPNLIs2c
@conor_power23 Back in the late 2000s there was an awesome blog by Kathy Sierra called Creating Passionate Users.
Every post there is a gold mine, it taught me how to think about good UX.
https://t.co/uBTRGiKATE
Claude Cowork + Autonomy is lovable 🥰
Vibe-coded in Cowork and shipped with Autonomy:
An app that uses parallel deep research agents to fact-check news articles.
It took 15 minutes and the app was live on a public address in @autonomy_comp
Great work @claudeai @felixrieseberg 👏
Gokul is spot on in this post. But the challenge is even bigger.
The last gen of vertical AI companies are not just competing against one deep-working long-horizon agent. They are competing against parallel fleets of them.
Autonomy enables their competition to create parent agents that can spawn and delegate work to thousands of sub-agents. Each sub-agent has its own filesystem, a shell to run CLI tools, and the ability to write and run new programs on the fly.
They divide complex problems, attack from multiple angles, and converge on outcomes in a fraction of the time.
Agents, in @autonomy_comp, are modeled as concurrent actors that automatically form secure distributed clusters to enable massive scale on a tiny infra footprint. This creates orders of magnitude advantages in costs, speed, and scale.
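To make the fan-out/fan-in idea concrete, here's a minimal sketch in plain Python asyncio. This is not Autonomy's API: `sub_agent`, `parent_agent`, and `run_swarm` are hypothetical names, and the sub-agent is a stub that does no real work. It only shows the shape of a parent delegating many tasks to concurrent workers and collecting their results.

```python
import asyncio


async def sub_agent(agent_id: int, task: str) -> dict:
    # Hypothetical stand-in for a real sub-agent. A real one would call a
    # model, run tools, write files, etc. Here we only yield control once.
    await asyncio.sleep(0)
    return {"agent": agent_id, "task": task, "finding": f"reviewed {task}"}


async def parent_agent(tasks: list[str], max_concurrency: int = 100) -> list[dict]:
    # Cap how many sub-agents run at the same time.
    limit = asyncio.Semaphore(max_concurrency)

    async def run(agent_id: int, task: str) -> dict:
        async with limit:
            return await sub_agent(agent_id, task)

    # Fan out one sub-agent per task, then fan in: gather returns results
    # in the same order as the tasks were submitted.
    return await asyncio.gather(*(run(i, t) for i, t in enumerate(tasks)))


def run_swarm(tasks: list[str], max_concurrency: int = 100) -> list[dict]:
    return asyncio.run(parent_agent(tasks, max_concurrency))
```

Calling `run_swarm([f"file_{i}.ts" for i in range(1000)])` returns one result per file. A real system would add retries, timeouts, and a convergence step that merges the findings.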
The question to benchmark is: Can your specialized agent outperform a coordinated team of 100s or 1000s of really-cheap general-purpose agents that can code their way around problems in real-time?
If not, then the time to change your approach is now.
VERTICAL AI CHALLENGE
Vertical AI Founders: You've spent 2+ years building your agents, training your model on your customers' data, embedding into workflows, creating a powerful GTM motion, all the best practices. You've beaten back challengers and are the #1 or #2 player in your vertical.
I'm sorry, you cannot relax. In fact, you need to massively up your game.
Turns out you are facing an existential challenge: long-horizon agents (eg: Claude Code). Agents that are not trained on a specific domain, but can reliably work for hours or days on end in pursuit of a goal, self-correct, and actually do stuff.
I'm sure many Vertical AI founders will say: "Oh, we are not worried. We are the system of record for decision traces. We train on enterprise-specific context. That's why these horizontal agents can never catch up with this."
You might well be right.
But, but, but ... you cannot afford to bury your head in the sand. These long-horizon agents will get better very, very quickly. You need to understand precisely how good they are at the exact jobs you've built your agents on. You cannot wait for someone else to do this. For example, if you're a legal AI company with an agent that automates contract review, you must compare how good your specialized agent is versus a general-purpose long-horizon agent that's simply given the contract and asked to perform the same review.
My challenge to you: Assign a strong engineer on your team to focus 100% on using long-horizon agents (with minimal context, other than just the contract in the example above) to compete with your custom-trained agents. Benchmark how the long-horizon agents perform vs your agent. Rinse and repeat it every few months.
Like with most other things worth measuring, what matters is the rate of improvement (the "slope" vs the Y-intercept). If the long-horizon agent is 30% as good as your vertical agent on Day 1, but 50% as good on Day 60, and 70% as good on Day 120, you need to reassess your product strategy.
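As a rough sketch of the "slope" idea, the snippet below fits a straight line to the three example data points above and extrapolates when the general-purpose agent would reach parity. The function names are made up for illustration, and the linear trend is an assumption; real progress is unlikely to be linear.

```python
def fit_line(days: list[float], scores: list[float]) -> tuple[float, float]:
    # Ordinary least-squares fit of score = slope * day + intercept.
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
    sxx = sum((x - mean_x) ** 2 for x in days)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def projected_parity_day(days: list[float], scores: list[float]) -> float:
    # Day on which the fitted line reaches 1.0, i.e. 100% as good as the
    # vertical agent. Assumes the trend stays linear, which is a big "if".
    slope, intercept = fit_line(days, scores)
    return (1.0 - intercept) / slope


# Example numbers from the post: 30% on Day 1, 50% on Day 60, 70% on Day 120.
days = [1, 60, 120]
relative_quality = [0.30, 0.50, 0.70]
slope, intercept = fit_line(days, relative_quality)
parity = projected_parity_day(days, relative_quality)
```

With these numbers the slope is roughly a third of a percentage point per day, which puts the projected parity point around Day 209 if the trend were to hold.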
AGI is coming for everyone. Long-horizon agents are the closest we have to AGI, and as a Vertical AI company, you need to figure out how you compete and survive.
Game on.
Gokul, the challenge is even bigger than you so eloquently described.
With tools like Autonomy, the last gen of vertical AI companies are not just competing against one long-horizon agent. They are competing against parallel fleets of them.
A parent agent can now orchestrate thousands of sub-agents, each with its own filesystem, a shell to run command line tools, and the ability to write and run new programs on the fly.
They divide complex problems, attack from multiple angles, and converge on outcomes in a fraction of the time.
The question to benchmark is: Can your specialized agent outperform a coordinated team of 100s or 1000s of really-cheap general-purpose agents that can code their way around problems in real-time?
https://t.co/ddSX39ZDQo
Aakash, usually I agree with your posts, but let me push back on this one. If AI can replace engineering execution, it can translate customer problems into solutions too. Why would it stop at one but spare the other?
Senior PMs have been translating vague pain points into good solutions for years. Senior engineers have been translating vague solution descriptions into secure, reliable architecture for years. AI, at least as it currently stands, needs both types of guidance.
Let me posit a different future:
1. Some teams will have people who are an amalgamation of a Senior PM and a Senior Engineer. People who combine deep customer empathy with deep engineering expertise.
This type of team is what everyone has always wanted but it is sooo hard to build. It will remain super hard.
Which will cause founders to assemble teams of a second type:
2. PMs use AI to co-create prototypes working closely with customers. They rapidly vet many variations of ideas and then hand over to engineers who can rapidly build reliable and scalable versions from PM prototypes.
In this second arrangement the throughput of the entire pipe accelerates but the PM role remains sort of the same - prioritize what enters the engineering backlog.