Finally! the first eval ship from cog!!!!!!!!!! 👼🏼
To contextualize: @METR_Evals cap out at ~16 hours.
Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯
METR dataset: ML eng, GPU kernels, cybersecurity
> "METR (2026) used a combination of GPT-4o and GPT-5 to estimate the human-equivalent times from compressed Claude Code transcripts. These transcripts were collected from 7 METR technical staff on 34 sessions labeled on human ground truth". rlog of 0.83
Cog dataset: real life java/typescript/python/c# feature dev, bugfixes, migrations
> "We collected a ground-truth dataset by asking Devin users to review recent representative sessions, and estimate how long each completed session would have taken without Devin. Our dataset consists of 258 sessions from 126 users across a diverse set of enterprise customers." rlog of 0.74 on held out set
this is pioneering real world evals work and part 1 of a broader frontier code evals drop that I'm really looking forward to writing up. huge kudos to @annarmitchell and @ryanbai1412 for leading the unglamorous last mile data collection!!
Cognition JapanのXアカウントを開始しました。国内展開を加速する私たちの最新アップデートを共有しますので、フォローをお願いします!
We have just started Cognition Japan X account. As we accelerate Japan expansion, we will share our latest updates through this channel!
1/ We’ve raised over $1B at a $26B valuation, led by @Lux_Capital, @generalcatalyst, and @8vc.
Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492 M.
We launched Devin two years ago as the first AI software engineer. Since then, cloud agents have gone from niche to mainstream, and today they are the fastest growing way to create software.
Devin can now build and run Android apps.
We added Android Virtual Device (AVD) support for Devin’s machine, which means Devin can now autonomously build, launch, and test Android apps.
Scott Wu is the co-founder of Cognition AI, one of the fastest-growing companies in history. He’s also the greatest competitive programmer the US has ever produced. You may have seen him doing impossible card tricks and mental math.
You’ve never seen him asked about weed, Michael Jordan, cancer, and human consciousness over a punnet of strawberries. That is what Colossus editor-in-chief Jeremy Stern did on a recent visit to San Francisco.
For those less familiar with @ScottWu46: In 2nd grade, he entered a math competition for 7th graders, lost, and was so furious he still fumes about it 20 years later. The next year he entered the 9th-grade division as a 3rd-grader and got a perfect score.
Then he won first place at the US national middle-school math competition and three straight gold medals at the International Olympiad in Informatics, where he became the greatest American gold-medalist and coach in history.
Most of the people running the biggest AI companies met as teenagers, competing for their countries on international math and science teams. OpenAI’s Greg Brockman, Anthropic’s Dario Amodei, Meta’s Alexandr Wang, to name just a few.
Most agree that the von Neumann among them was Scott Wu.
In November 2023, a few weeks after his mother died of lung cancer, on the day Sam Altman was fired from OpenAI, Wu founded his own AI company: Cognition.
He was 26 and saw earlier than almost anyone that AI would converge on agents that work in the background, 24/7, like coworkers. He shipped Cognition’s AI software engineer Devin in March 2024. It worked poorly, and he took intense public criticism for it.
Now, in its first 18 months of service, Devin has generated $445 million of revenue run rate and usage has doubled every eight weeks. The US Army, Goldman Sachs, and Mercedes-Benz are all customers. Cognition is raising at a valuation around $25 billion.
@JeremySternLA sat down with Wu, the emperor of the nerds, to ask the questions we’d all ask one of the smartest people in America—building the most consequential technology of our generation—if we ever got the chance.
As well as MJ and weed, they talk about the cluster of competitive math prodigies behind so much of AI, what makes us human when AGI arrives, and why Wu believes he was put on this earth to teach AI how to code.
Read the piece below.