🧵 Time for a short end-of-2025 wrap up 🧵
Genuinely aiming for this to be a short one for two reasons:
1. I'm doing it at the last moment 😅
2. Most of what I was involved in is not stuff that can be shared publicly (yet… or ever?). Or maybe I was just lazy...
Let's go [1/12]
OpenAI ran a hiring challenge, but the top candidate was one they couldn’t hire: our autonomous research agent, Aiden.
In Parameter Golf, Aiden ran for 22 days and outperformed all 1,016 other researchers: 🧵 (1/8)
@thingsshldwrk @KlepperCasey They were profit-making until the toxic assets that are X and https://t.co/2g8tfIKbmg got bundled up into it. Now it's posting massive losses.
SpaceX being rammed into indices with no profit requirements, seasoning, and generally looser constraints is economic terrorism. Index trackers will eat the loss when reality catches up and retail investors will suffer.
MLE-Bench scores have jumped from 30% to 80% over the last two years.
But how much of that is real algorithmic progress vs. better base models + problem definition shifts + overfitting?
Turns out: not much. Once you control for the same step budget and models, and then test on a different set of tasks, the two-year-old AIDE algorithm matches modern agent/evolutionary search systems.
Figure from FML-Bench, a new automated ML research benchmark, which unifies the code editing agent, step definition, and val/test split, and tries to benchmark the algorithmic efficiency (search/memory) of the agents.
paper link: https://t.co/8QllTan4cX
Shrey spelling 32 words in 90 seconds to win the Spelling Bee is the new greatest athletic accomplishment of 2026. I don’t even know how he said the letters that fast. Got a “Holy Mackerel” out of @minakimes
The scientific method has been an extraordinary engine of progress. It’s also barely changed in 400 years. Inherent’s bet is that science is on the brink of a second revolution, one built around what humans and self-improving AI can do together. Read more about why we co-led their $50m seed round: https://t.co/dQRl5xajYW
Proud to announce the launch of @inherent_labs. We’re reinventing the scientific research factory for the age of AI agents.
I’m joined by co-founders @kallyaleksiev, @LouisKirschAI and @TantumSCollins; all are deeply technical operators.
Time to live within the experiment.
We’re excited to introduce Inherent, a lab designed from scratch to build AI agents that discover new knowledge.
The coming era of machine-driven scientific inquiry demands a new kind of research institution and a new kind of AI.
To achieve our mission, we live within the experiment, recursively self-improving the entire research organisation. We investigate questions including:
- What does ‘AI taste’ look like in the sciences, and how can we build an institution that embraces this new aesthetic of discovery?
- What new kinds of human-machine teaming will make the most of AI that can truly innovate?
- How can we build recursive self-improvement at the collective level that continually increases human agency over outcomes?
We have just closed a $50m seed round led by @IndexVentures and @radicalvcfund, with participation from other outstanding investors including NVentures (@nvidia's venture capital arm), @buildexante, Metaplanet, Macroscopic, @MythosVentures, Charlie Songhurst, @chalfs, @jluan, @dwarkesh_sp, @Thom_Wolf, @j_foerst and @maxjaderberg. We are advised by @matthewclifford.
Inherent is a Public Benefit Corporation headquartered in London.
International consensus in tech is rare, and I can't believe we're achieving it today by agreeing that the Luce is probably what Jony Ive pitched for the Apple Car, and got a Ferrari badge slapped on after Apple passed.
There will be 3 kinds of scientists in the coming years:
1. The Blenderists, who cover their eyes to ignore the impact of AI.
2. AI scalers like OP(?), who think everything can be solved by making GPUs go brrr.
3. Actual researchers who embrace the tech and explore new frontiers.
academics are unprepared for the coming world where much scientific progress is majorly a function of inference compute. whether OpenAI points the Eye of Stargate at your particular field will decide its acceleration. talent will leach away into the labs. it's already begun
@ToddVercoe @_colourmeamused Also used as a noun in the UK in this very specific case: https://t.co/B8cjgL6Je6. Admittedly no relation to the Dutch word.
The SpaceX IPO is the most brazen retail fleecing in modern market history.
NASDAQ has REWRITTEN the index rules specifically for this listing. The 10% minimum free float requirement: gone. The 3 to 12 month seasoning period before index inclusion: cut to 15 trading days. Companies with small floats can now be weighted at 3x their actual float.
Translation: every passive index fund, every 401k, every pension is about to be force-fed SPCX whether they want it or not.
And what exactly are they buying?
Class A shares carrying ONE vote each, while Musk holds 93.6% of the Class B super voting shares at TEN votes each. That gives him 85.1% of voting power on a 42% economic interest. He cannot be outvoted. He cannot be removed. CEO, CTO and board chairman simultaneously.
For reference: Zuckerberg controls 61% of Meta. Buffett 35% of Berkshire. Musk: 85.1%.
SpaceX is also claiming "controlled company" status, exempting it from needing a majority of independent directors. Shareholders waive the right to a jury trial. They waive the right to class actions. Mandatory arbitration only, courtesy of an SEC rule change pushed through on a party line vote last September.
$1.75 trillion valuation. $80 billion raise. Largest IPO in history.
The rules of the game were quietly rewritten so one man could extract maximum capital from retail while answering to no one.
@willdepue Yeah I didn't want to assume you held that view (hence the (?) qualifier). But there are a lot of people who fall into the second category unironically. Thanks for providing additional context!
@HrrsMjd GPUs going brrrr alone isn't going to make me this delicious cocktail I'm enjoying without further work in robotics (mixing the drink), personalisation (understanding my tastes), proactivity (knowing that I want one), etc. GPUs will brrr as part of it, but there are other parts.