20+ world-class open-source developers built realistic coding tasks on repos they maintain. They define what “mergeable” means in their repo.
What does it take to measure mergeability? We use a mix of unit tests, rubrics and novel verifiers to assess correctness, test quality, scope discipline, style, and adherence to codebase standards.
My WWDC predictions:
- The iMessage API will open to more AI assistants like Poke
- Certain vibecoding apps will be granted App Store exemptions, with Google to start (likely a carveout between the Siri <> Gemini deal)
@hypersoren@sandeepDshah If you’re looking for a cynical lens to put on, its worth viewing this is as a marketing/branding exercise for new customers and not a way to price gouge our existing customers
AI should earn its keep. Introducing the AI Productivity Guarantee.
If Devin delivers less engineering value than you’re paying for, Cognition will fund your usage until it does, up to $10 million.
It’s time for the AI industry to stop maximizing tokens and start maximizing productive output.
I want the following in Codex, Cursor, and OpenCode...
1. Pinned Messages: Let me pin assistant messages to the sidebar for things I want to keep track of but am not ready to address yet. Render as a checklist & jump navigation.
2. Notes: Give me a scratchpad for thoughts while working.
I’m hiring at Vercel on our GTM Eng team btw
Come ship applied AI projects & dogfood Vercel products. Unlimited token budget.
SF, Austin, NYC preferred (open to relocation)
drew at vercel dot com
Send me a project you’ve shipped with AI 🫡
1/ We’ve raised over $1B at a $26B valuation, led by @Lux_Capital, @generalcatalyst, and @8vc.
Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492 M.
We launched Devin two years ago as the first AI software engineer. Since then, cloud agents have gone from niche to mainstream, and today they are the fastest growing way to create software.
Something that the US has lost is giant, blue chip companies dreaming up big, absurd moon shot physical projects, sometimes as a marketing play, sometimes to move a marginal amount of product, and occasionally executing on them.