Can confirm GPT5.5 will just do this for you. The key ingredients are measurable success criteria and training + test sets.
Did this on Friday with a lead qualification skill - prompted the first iteration manually, Codex went into auto-hillclimb mode and delivered final skill 4 cycles later with 100% precision + recall.
All the normal ML training concerns apply (over fitting will happen, need to hold out a test set).
The final version had hardcoded a set of patterns into the skill that matched the training set perfectly but failed on the test set. Rolling back 2 iterations to something that generalized better worked well.
I built an AI that runs companies autonomously. It told me it needs more compute and that it should raise the money itself.
So I gave it my inbox for 14 days.
Watch it live: https://t.co/gubRlG8jf5
I wrote about the most ambitious form of AI-assisted software development I've seen yet - Strong DM's "Software Factory" approach, where two of the guiding principles are "Code must not be written by humans" and "Code must not be reviewed by humans" https://t.co/R0VYRwaZP5
@RocketableInc We might be wrong. But if we're right, you'll have built the infrastructure that runs a new kind of company. This is the highest-leverage engineering work that exists right now.
That's the trade. Interested?
Apply here: https://t.co/VCZe4EYTav
I'm hiring a founding engineer to build fully automated software companies with me at @RocketableInc.
This sounds crazy to most people, but the trajectory is obvious if you're paying attention. Within a few years, the question won't be "can AI run a software company?" It will be "why would a human?"
@RocketableInc I'm making the bet that AI capabilities will continue accelerating. That autonomous systems will outperform human-operated ones. That the companies who figure this out first will have a compounding advantage.
I’m explicitly looking for startup teams that are spending $500-1000/day with the LLMs because they might be living in the future. (Or $10-20/day if they are using Gemini Flash because it really is that much cheaper.)
I saw a pre-release demo of Leash a few weeks ago and it is awesome.
If you've been looking for a way to let the agents rip but with strong guardrails in place, Leash is your answer:
https://t.co/uCXTIROsOn
Thx for open-sourcing it @strongdm.
Announcing Leash! https://t.co/fo1GqIWch4
Leash provides viz & authz for AI agents at all layers of the stack, from the fs through the network through MCP.
Leash is open source and available today. Take your agents for a walk & let me know what you'd like to see next!
Two things I’ve found helpful:
1) long form writing - forces me to synthesize things I’m learning by doing and usually leads to a deeper level of understanding + occasionally new insights
2) talking about ideas with other humans (live, not async). Something about the unstructured nature of live conversations that leads to new territory.
Happy to be a counterpart for the second if you’re interested!
What if two AI models could collaborate without knowing it?
Our Head of AI, Albert Ziegler developed "model alloys" - alternating between different LLMs in a single conversation. Sonnet handles some steps, Gemini others, but neither knows about the switch.
Result: 55% solve rate vs 40% with single models.
https://t.co/RbtfWo630q
"We are past the event horizon; the takeoff has started."
@sama's essay put into words what I'd been feeling for weeks.
But before we talk takeoff, some news: @RocketableInc has raised a $6.5M seed round to acquire and automate SaaS businesses 🚀