This is huge: @Cassi_on_X came 2nd overall and 1st on "Dataset" questions on @Research_FRI ForecastBench! 🥈📈
Only @elonmusk's @xai @grok outperformed us overall.
What this means: 🧵
Chief of the Secret Intelligence Service Blaise Metreweli:
"It is with immense sadness that I am sharing the news that former MI6 Chief Sir Alex Younger died today after fighting cancer for some time. Alex embodied my Service's values of integrity, courage, creativity and respect.
"We remember Alex's deep commitment to public service and the security of the UK. He made a lasting and distinctive contribution to our country and indeed to global security. Today my thoughts and those of the whole of MI6 are with Alex's loved ones."
the frontier labs don’t have “comms problems”. reality right now has a comms problem. what is happening is a little scary and there’s no nice words anyone could say, especially not those profiting from it, that’ll make it feel that much better
We're expanding Glasswing today. To solve such a big/complex/urgent problem, we need Mythos-level capabilities in as many defenders' hands as possible. That's why we're working on safeguards to scale that safely ASAP.
11 of my reflections from the past 2 months of Glasswing 🧵:
BREAKING: Bernie Sanders will introduce a bill to have the public take a 50% ownership stake in the country's biggest AI companies.
The American AI Sovereign Wealth Fund Act would have the government tax AI companies, take 50% of the stock, and put it under public control.
Hello, we are Jonathan and Abigail - unashamed pedants who want to bring this affliction to bear on all things public policy and practice.
We believe that details matter, especially in public administration. This is why today we are founding quibble: a campaign to fix the small stuff.
Think, for example, about the cookie banner that we click on every webpage. Each instance is not a big deal, so we just put up with it. But its cumulative impact adds up - on average we press it 5 times per day. The European Commission estimates that it costs EU citizens 343 million hours per year.
And who is there to represent the impacts of seemingly minor issues like this in a systematic way? We want quibble to be the answer. In the case of the cookie banner, lots of advocacy has rightly focused on privacy, but has this meant that user experience has taken a backseat? We believe there are ways to improve user experience without compromising on privacy. We will share more about this soon.
Consider another example. Did you know that in some government-run car parks you can be fined for a minor keying error, such as accidentally typing a zero instead of an “o”? Again, we will come to the detail of this quibble in the coming weeks, but for now just consider again the question: who? Who is there currently to systematically represent the interests of the parker who is given an unfair ticket?
An inherent feature of consumer interests is that those who have them rarely have enough other things in common to make collective organisation and representation feasible. This is the gap that quibble seeks to fill. Of course, excellent consumer interest groups exist, but understandably quibbles might not be at the top of their lists. Our hope is that quibble will be complementary, picking up the bottom-of-the-list issues faced by various groups: the stuff they are almost too embarrassed to raise because it seems too small.
We are not embarrassed about detail. If you’ve ever had a splinter, you know small things can have a big impact. This is what quibble is committed to tackling, and our wider hope is that by doing so we will also incentivise policy makers to be even more careful about detail.
Check out our website here, including our first four campaigns: https://t.co/gZiqqHbhIL
The Empire State Building shines red and white tonight to mark @Arsenal’s Premier League title and trophy celebration.
See the lights live: https://t.co/iavtXSm3Fx
🇬🇧 Every British river. 🌊🇬🇧
Has a name older than English. Older than Rome. You still say it.
The Thames. The Romans wrote it as Tamesis. But the name they wrote was already old when they arrived.
A pre-Celtic name passed to the Celts, passed to Rome, passed to us. The name has changed only in the shape of the sound.
The Severn. The Welsh called her Sabrina. A river goddess in the Brittonic tongue. And the Severn still carries her name today.
🏞️ The Trent. The Celts called it Trisanton. A name meaning the trespasser. The river that bursts its banks. And it still bursts its banks.
The Avon. The word means river. The Britons called every river the Avon. The English kept the name.
The Tyne. A Brittonic name meaning the flowing one. The Dee. A name meaning the goddess, the holy one. The Britons named her sacred and the English left her sacred.
The Anglo-Saxons came. They renamed villages. They renamed hills. They renamed almost everything they could. But they did not rename the rivers.
The rivers were too holy. The names were too rooted.
And so the Brittonic words stayed in English mouths.
The Britons did not vanish. Their words did not vanish. Their descendants became the British. And the British still name the river the same way. Every time.
🇬🇧 British people speak a language older than English. Every day. Without noticing. The Britons named the water. The British still call it the same.
━━━━━━━━━━━━━━
The river names are not relics.
The villages changed names. The rivers kept theirs.
Help us pass our history downstream. 👇🙏
👉 https://t.co/rih7iKwnvf 👈
Be part of us. ☝️🇬🇧
Be Proud Of Us. 🙏🇬🇧
I have learned the trick is to just stop caring about this stuff. Why bother? No one in serious positions of power takes it as seriously as we do. Not worth expending energy or capacity on.
Superb data journalism here which exposes the failings of the UK visa system.
In no world do the UK public expect vape shops and takeaways to be sponsoring visas for "skilled workers".
The number of Arsenal shirts on show in Brooklyn this morning is making me feel like I've woken up on the alternate earth where there wasn't an American Revolution
I didn't cover Claude Opus 4.8 on my pod because I don't think it's MEANINGFULLY better than GPT 5.5 as of May 29th.
We're entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone was a genuine leap? Now it's a slightly better camera and you can't really tell the difference. That's where models are heading. 4.6 to 4.7 to 4.8. Each one is a little different. Nobody can agree if it's better or worse. The benchmarks say one thing, the vibes say another.
The thing that actually matters right now is what's happening around the models. Claude Code shipped dynamic workflows this same week and that genuinely changes what one person can build.
Codex shipped a desktop app with an in-app browser that combines coding and knowledge work in one surface. Those are the releases that move the needle for people. The model underneath is becoming interchangeable.
I think we're maybe 6 months from nobody caring which model they're using the way nobody cares which engine is in their Uber. You just want to get where you're going.
When something genuinely changes the game for builders, I'll cover it on @startupideaspod. Opus 4.8 wasn't that. Dynamic workflows was.
I'd rather save you the hour.
I wrote our cover story this week, a valedictory essay on the changes in war & warfare over my eight years as defence editor. It’s a reflection on the growth & limits of battlefield transparency, the lessons from different wars & the utility of force today https://t.co/5mBbcRGC3A
I ended up not saying much on AI in this essay, but @KatrinaManson's recent book on Project Maven is an excellent account, and @kennethpayne01 has done some of the most interesting forward-thinking work on AI in this field incl the different personalities of LLMs (https://t.co/s2SijUcbXm). I've also been fascinated by LLM forecasting, and @kpd_musing's work in that area (https://t.co/xwU1C4a54q). Kateryna Bondar (https://t.co/hXk6zBVJyh) has written interesting things in the Russia-Ukraine context.
Laura Gilbert built 10 Data Science and the Incubator for AI, which have both pioneered the deployment of technology across the British state. Few people have contributed more to improving the UK’s state capacity in AI than Laura and the teams she was a part of.
10DS modelling reportedly informed the choice to prioritise by age rather than occupation. This is widely credited with saving lives versus the occupation-based alternative being lobbied for at the time.
10DS and the brilliant folks there did a lot of other great stuff, including building live COVID data that policy teams and the public relied on, as well as releasing a lightweight data sharing tool on GitHub where anyone can access it free of charge. Today it has over 200,000 public downloads and is used by teams across government and industry to make data sharing easy.
Onto AI, where Zack has suggested that Laura doesn’t have lots to offer to public discussion. Whenever I speak with frontier labs, they tell me the UK now has the most ambitious and sophisticated approach to deploying AI in public services.
With Extract (which Laura’s Incubator for AI team developed), planning documents are now converted into digital records in 40 seconds, with higher accuracy, versus the 1–2 hours of planner time it typically takes manually. That’s roughly a 90–180x speedup, and it is contributing to a 45% reduction in processing time for the housing and infrastructure the UK sorely needs.
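A quick check of the speedup from the figures quoted above (40 seconds per document versus 1 to 2 hours of planner time). This is only a sketch of the arithmetic; the function and variable names are illustrative and not from the original post.

```python
# Speedup check using the figures quoted in the post:
# 40 seconds with Extract versus 1 to 2 hours done manually.
SECONDS_PER_HOUR = 3600


def speedup(manual_hours: float, automated_seconds: float) -> float:
    """Return how many times faster the automated process is."""
    return manual_hours * SECONDS_PER_HOUR / automated_seconds


low = speedup(1, 40)   # 3600 / 40
high = speedup(2, 40)  # 7200 / 40
print(f"{low:.0f}x to {high:.0f}x")  # prints "90x to 180x"
```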
The public sector team who built Extract scaffolded Gemini so that it could orchestrate Segment Anything and pose estimation models to map geospatial information from text and diagrams in a way that even the GOOGLE DEEPMIND TEAM hadn’t worked out how to do at the time.
So rather than outsourcing to big tech, which I’m sure Zack and many others are more than sceptical of, Laura helped build true public sector state capacity that reduced our reliance on the private sector, while also delivering a world class public service.
Powerful AI systems are going to usher in a century's worth of social and economic transformation within only a couple of decades. This requires a deep analysis of where capabilities will develop, an understanding of which externalities we want to mitigate, a vision of what a good life looks like, and amassing the people, tools, infrastructure, and institutions to build that vision.
Of course Laura is precisely the sort of person who has much to offer in answering these questions. We should be cherishing the tireless civil servants and incredible technical talent that have built capabilities that many folks thought the public sector would never be able to build.
These numbers are wild. Even if the UK closes its borders tomorrow, it will already have embarked on a demographic revolution that will be more radical than AI or geopolitical realignment or any other trend featured in Tony Blair’s encyclical.