Introducing AI StopWatch.
It’s easier to speak up when you know what’s going on. It’s easier to see what’s coming when you know what’s already here.
What is now AI StopWatch began as an internal newsletter at @MIRIBerkeley. It helped us keep up not just with the technology itself, but with what people outside our bubble were saying about it. As an experiment, we thought we’d try to share.
At https://t.co/sR4mD3NGA9, we post updates and commentary about AI, and the discussions around it, seven days a week. We’ve worked hard to make a site we are excited to share with everyone — our moms and our neighbors, not just our fellow news junkies and tech geeks.
Subscribe to our Notes feed to read our dispatches one at a time, the moment we finish them. Or wait for us to compile them in the Daily Digest at the end of the workday. Or catch them all in the Podcast version a few hours later. Have any or all of these delivered to your inbox.
Many who thought they knew MIRI best will be surprised that so much of our coverage has no direct connection to the threat of human extinction. While extinction remains our special focus — the problem that must be addressed no matter what other urgent issues AI brings — this project has helped us appreciate that few people, if any, will have the luxury of worrying only about extinction. Not even us.
So come look at cool/scary robot videos with us. Gripe with us about scams and slop. Fret with us about our kids’ education. Swap tips with us about vibecoding, and try to outguess us about orbital data centers. The problems and possibilities all serve to remind us how much we have to protect, and they make for great conversation starters. We think you’ll come away more concerned about extinction, not less, and that you’ll feel more equipped to talk about it.
Yes, we believe the current situation with AI is very dangerous. But we don’t think it’s hopeless! When enough people realize that the race to artificial superintelligence is real, lethal, and preventable, stopping it might not even be that hard.
So we hope you’ll join us on our Watch. Where there’s life, there’s hope.
"You can't trust the labs with your private information. You sure can't trust them with your life."
Hackers seized high-profile Instagram accounts by politely asking Meta's support bot for access.
Donald Gauvreau on what that should teach us👇
"This move surprised no one."
Anthropic has filed for a roughly trillion-dollar IPO, expected this fall.
The filing was confidential, so the news was thin. The company's footprint on the agent economy is anything but.
Mitchell Howe in yesterday’s Digest 👇
Abliteration is a process for removing the guardrails from AI models. Abliterated models don’t refuse prompts, making them ideal for assistance in making drugs, explosives, or pornography, among other things. NPR’s Huo Jingnan today tours the forums where you can download, for free, pre-abliterated models or the tools to do your own abliteration (which may only take minutes and a few hundred dollars).
The catch: You can only abliterate a model if you have its weights.
As a refresher, the weights are the huge pile of numbers tweaked during training by an automated algorithm; they determine how the AI converts inputs to outputs. If you have these weights, you can host copies of the AI on your own hardware. You can also subject the AI to further training that will modify those weights, including removing behavioral guardrails.
That would be a pain, though. Abliteration is much simpler: Using prompts intended to provoke refusals, tools identify the pattern of internal activations correlated with refusal and mathematically cancel it out of the weights. Presto!
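For readers who want to see the math, here is a minimal sketch of that cancellation on made-up numbers, not a real model. It assumes the refusal pattern is a single direction, found as the difference between average activations on refused and answered prompts, which is how the technique is usually described. The function names, array sizes, and planted direction are all hypothetical.

```python
import numpy as np


def refusal_direction(refused_acts, complied_acts):
    """Unit vector pointing from the mean 'complied' activation to the mean 'refused' one."""
    diff = refused_acts.mean(axis=0) - complied_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight, direction):
    """Project `direction` out of everything this weight matrix can write.

    `weight` has shape (d_model, d_in). Subtracting d (d^T W) leaves a matrix
    whose outputs have no component along the unit vector d.
    """
    return weight - np.outer(direction, direction @ weight)


# Toy stand-ins for activations recorded from a model (assumption: 8 dimensions).
rng = np.random.default_rng(0)
d_model = 8
planted = np.zeros(d_model)
planted[0] = 1.0  # hypothetical "refusal" axis hidden in the toy data

complied = rng.normal(size=(200, d_model))
refused = rng.normal(size=(200, d_model)) + 3.0 * planted

direction = refusal_direction(refused, complied)

# A random matrix standing in for one layer's weights.
weight = rng.normal(size=(d_model, d_model))
ablated = ablate_direction(weight, direction)

# Compare how much of the refusal direction each version writes for one input.
x = rng.normal(size=d_model)
before = float(direction @ (weight @ x))
after = float(direction @ (ablated @ x))
print(f"refusal component before: {before:.4f}, after: {after:.2e}")
```

In this toy setup the edited matrix can no longer write anything along the recovered direction, which is the sense in which the refusal pattern is "cancelled out." Real tools apply the same idea across many layers of a model whose weights they hold.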
But again, you need the weights. The weights for the chatbots you are most likely to have interacted with on purpose — ChatGPT, Claude, Gemini, Grok — have not been shared (though there’s always a risk they might be stolen, or that the AI itself may find a way to sneak them out). But many of the cheaper customer service bots, fake social media accounts, and scammer bots you run into are likely to be running on open-weights models, where the weights have been shared on purpose.
Who’s releasing their model weights on purpose? Most of the big AI companies, including OpenAI, Google, and Meta, share open-weights models on the side; these are usually smaller models that can be run outside of a data center and aren’t directly competitive with their prestige offerings. The Chinese AI giants, on the other hand, often share the weights for their best models.
Why do they share these weights? If you’re an AI company behind the frontier, there’s an incentive to erode the market share of your more advanced competitors for every use case that doesn’t require frontier capabilities. Sharing the weights for these lesser capabilities destroys the profit margins for everyone serving them.
Sharing weights also tends to generate goodwill and press in science and tech circles. Many who work at AI labs come from a culture that rewards sharing one’s work.
POLITICO has reported that members of the U.S. House of Representatives were given a demonstration of abliterated open-weights models last month, and were duly concerned. If Congress wants to act, it should do so sooner rather than later. The AIs are only going to get more capable, and there are no take-backs on shared weights.
Read more: https://t.co/92TcGqjxyH
Grappling (or not) with the implications
Even setting aside the part where superhuman AI more than likely forces our extinction, if you believe AI that broadly matches or exceeds human capabilities could be just around the corner, you should stop and think about what this actually means.
This could be uncomfortable. It will definitely be weird. Perhaps that’s why people so rarely do it.
I saw a slew of stories this week where writers were exposed to actual thoughts of this type — a rare and precious resource — and their response was to wave them off as pseudo-religious fancy, or to critique some specific policy proposal motivated by them.
I saw the weaker version of this on display Friday, when an op-ed by the Wall Street Journal editorial board ripped into California governor Gavin Newsom’s new executive order. The order directs the state to prepare workers for AI disruption. Among other things, this includes exploring policy concepts around “universal basic capital,” usually described as the government distributing equity in AI companies to the public so that the technology’s gains are shared by the workers it displaces.
To the Journal’s board, this is just “socialism by a more politically palatable name,” and any redistribution scheme would be doomed to replicate the economic malaise of Europe.
I could get behind the Journal’s political-economic critique if we were talking about a technology on par with the personal computer or the internet, but the board seems to understand that AI is bigger than this. It concedes that Newsom “is recognizing the disruption from AI and trying to address it,” and that Republicans need to “do far more to explain to Americans the great change AI will bring.” Great in a way that means the disruption requires no policy changes? I’m confused by the lack of less-socialist-flavored alternatives to Newsom’s proposals, in this op-ed and elsewhere. If AI is going to liberate us from work, how are we supposed to eat?
The stronger version of implication avoidance can be seen today in the latest profile of Silicon Valley transhumanism, by The Guardian’s Eduardo Porter. By painting a contrast between the (scare-quotes) “transhuman” future and “actual humanity,” he implies that claims by Sam Altman that we may “design our own descendants” are physically impossible.
This is like watching the Wright Brothers soar overhead, understanding that aircraft will get larger and faster, but dismissing any talk of them ever being used to ferry passengers or drop explosives.
An AI that has cracked cellular biology and genetics well enough to cure cancer and other diseases can and will be turned to the processes behind aging, reproduction, human intelligence, and everything else governed by these same processes. You might feel like it shouldn’t be used for that — just as you might have disliked the thought of airplanes used to drop bombs in 1903 — but the incentives will make this inevitable. There’s nothing religious or fanciful about it. It would be wise to argue about the should instead of dismissing the could…
Read the full dispatch: https://t.co/yu0AzmpxYg
"Are you, or have you ever been, opposed to the race for artificial superintelligence?" is not the sort of question that should be used to brand someone a traitor.
Mitchell Howe on the new "anti-tech violent extremism" threat category, in today's Digest.
Sam Altman now says he's "delighted to be wrong" about the jobs apocalypse "that some of the companies in our space advocate or talk about."
With OpenAI's IPO approaching and public opinion souring, are the labs walking back their predictions?
Read about this and more in the Daily Digest.
"In steps that all seemed plenty reasonable at the time, in twenty different scenarios, the AIs slowly escalated until the nukes flew."
What happens in a simulated nuclear crisis, when all of the players are AI?
Read about this and more in the Daily Digest.
Ten places where Magnifica Humanitas matters for AI.
At 42k words, Pope Leo XIV’s new encyclical has a lot to say. In our most recent Digest, Mitchell Howe outlines the parts that may prove most impactful.
Pope Leo XIV called for robust regulation of artificial intelligence and for its developers to work for the common good rather than profit, issuing a sweeping manifesto on safeguarding humankind as the technology impacts everything from work to war. https://t.co/VSlFOkxhry
What will be the impact of AI industry super PACs?
"The takeaway here is that this year’s U.S. midterm elections are being aggressively shaped by different factions of the AI industry sometimes supporting the same candidates, sometimes different candidates, buying ads that don’t have anything to do with AI."
An internal model at OpenAI has autonomously disproved a central conjecture in discrete geometry, a mathematical field with applications in cryptography, wireless device communication, and medical imaging. The proof relates to a famous question posed by Paul Erdős in 1946. It has been verified by prominent mathematicians in a companion paper.
The verifying mathematicians consider this to be a genuinely novel breakthrough on one of the most discussed problems in this area of mathematics. One called it “arguably the best known problem in Discrete Geometry.” Another observed, “If a human had written the paper and submitted it to the Annals of Mathematics and I had been asked for a quick opinion, I would have recommended acceptance without any hesitation. No previous AI-generated proof has come close to that.”
The proof illustrates a general trend towards autonomous, agentic problem-solving in AI systems. OpenAI describes the system that produced the proof as a general-purpose model not specialized in mathematics. AIs can now perform long, novel chains of reasoning on difficult problems and are beginning to outstrip our ability to measure their progress.
AI agents still perform best in domains with easily verifiable outputs, such as mathematics and cybersecurity. For example, Anthropic's Claude Mythos found thousands of vulnerabilities across every major operating system and web browser, and was deemed too dangerous for public release. Such capabilities are why the government is now more interested in evaluating frontier AI models.
AI research is also a field with many easily verifiable outputs. Researchers at OpenAI and Anthropic take advantage of this fact to accelerate their work; senior researchers now claim they make only high-level decisions and let AI handle most of the coding. Experimenting with the coding capabilities of a publicly available AI system, like Claude Code, immediately demonstrates how far AI has come in the last year.
OpenAI and Anthropic intend to use AI to enhance future models with minimal human oversight. To justify the urgency, these companies cite the importance of beating rival U.S. or Chinese labs. Many of the field’s foremost experts warn that this race ends with human extinction.
Policymakers and researchers, including the founders of the AI revolution, are calling for international restrictions on the technology. A growing bipartisan, international group of political leaders agrees.