As increasingly capable AI systems are deployed, humans, institutions, and other AI systems adapt in response — i.e. the world pushes back.
So is capability still the central safety challenge for AI?
We think not. We believe the harder challenge is coexistence.
The current AI research paradigm treats the world as a stationary source of feedback, what we refer to as the solipsistic approach to AI design. This raises serious risks for coexistence.
In our new #ICML2026 paper, we argue that superintelligence — an extremely capable task solver, built through such a solipsistic approach — is unlikely to be cooperative. 🧵
There are really interesting academic questions emerging around AI and epistemic risks. I only fear that, by the time we reach consensus, we will be too dumb to understand it.
People do not coordinate only through broad legal rules and prices. Hayek emphasized the abstract rules that allow people to coordinate, like property, contract, trade, and competition. But Lachmann also emphasized the practical secondary institutions people orient their plans around, like banks, standardized contracts, product categories, and so on.
In software, an abstraction boundary is an interface that hides the complexity beneath it. In this 2005 paper (https://t.co/FPxYCFUSPT), Miller and Tulloh explain that you can apply this concept to markets too. Consider a post office: there's an abstraction boundary that separates why the customer wants to mail something (which the postman doesn't need to know); what the shared transaction is (all the recognizable steps and commitments involved in sending mail); and how the postal system actually delivers it (a complex logistics network hidden from the customer).
The middle part is what lets the user benefit from the postal system’s expertise without having to learn postal logistics. The boundary defines the shared 'what' but also separates the customer’s 'why' from the provider’s 'how'. Not only that, but reusability means the same institution can be used to satisfy many purposes (birthday invites, subpoenas etc) and polymorphism means different providers can satisfy the same need and compete (UPS, FedEx etc).
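For the programmers reading, here is a rough sketch of the post office example in Python. This is my own illustration, not code from the Miller and Tulloh paper, and every name in it (Parcel, Carrier, PostOffice, Courier, mail) is hypothetical. The Carrier protocol plays the role of the shared 'what', each provider class hides its own 'how', and the caller's 'why' never crosses the boundary.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Parcel:
    contents: str      # the customer's "why"; providers never inspect it
    destination: str


class Carrier(Protocol):
    """The shared 'what': the one operation both sides agree on."""

    def send(self, parcel: Parcel) -> str: ...


class PostOffice:
    def send(self, parcel: Parcel) -> str:
        # The provider's "how" (sorting, routing) would live behind this call.
        return f"post:{parcel.destination}"


class Courier:
    def send(self, parcel: Parcel) -> str:
        # A competing provider with different internals, same interface.
        return f"courier:{parcel.destination}"


def mail(carrier: Carrier, parcel: Parcel) -> str:
    # The customer depends only on the interface, not on a specific provider.
    return carrier.send(parcel)
```

Calling mail(PostOffice(), Parcel("birthday invite", "Lisbon")) and mail(Courier(), Parcel("subpoena", "Lisbon")) exercises both properties: one interface serves different purposes (reusability), and different providers can stand behind it (polymorphism).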
An important question in institutional theory is how societies achieve both stability and adaptation; the paper's authors say the solution is that stable interfaces allow changing internals. I find this very intuitive: when companies don't evolve/change from the inside much, you get ossification and insufficient adaptation. When laws change too much and institutions are unstable, uncertainty affects market confidence.
The people who are good at redrawing abstraction boundaries are entrepreneurs, who notice when existing categories are wrong and will invent new ones to remedy faults or address demand. What has always saddened me is how poorly rewarded and incentivized political entrepreneurship is. Part of the reason is that this is hard: market abstraction boundaries are often disciplined by exit, entry, profit/loss, customer choice, and provider competition, but these feedback loops are much weaker in the public sector.
I hope we'll see a lot more of this in the coming decade. In fact this is something that AI will hugely facilitate, since it can lower the cost of articulating and prototyping new abstraction boundaries. We've already seen minor examples through e.g. citizens creating websites/services that compete with government ones. Though usually this is to make state services more legible rather than changing the boundaries in the first place.
I think if people want the future to go well, bolstering state capacity and enabling more innovation on the governance/democracy side of things will be critical. People don't really like this because it's a slow process, but I think they're wrong (and cheems), and playing the 'urgency of AGI' card to bypass this through a de facto state of emergency will cause lasting harms, partly by weakening institutional learning, public trust, and future coordination capacity.
How does democratic accountability work if institutions are run by agents? Join @bakkermichiel (@MIT) for his seminar on Tuesday 16 June exploring 'Closing the Democratic Loop: Automated Oversight for the AGI Era'. Link below.
📄 Paper: https://t.co/0OsNwFqggP
Work done in collaboration with my wonderful coauthors @natashajaques, @locross, Sasha Vezhnevets, and @jzl86.
Very excited to present this at #ICML 2026. If you are visiting, come say hi at our poster session. We would love to discuss!
The paper concludes by tackling several counterarguments such as:
- multi-actor designs may have worse failure modes
- competitive pressure may produce cooperation naturally
- the empirical track record may not justify alarm
- scale may solve interaction dynamics
- RLHF may already train cooperative behavior
These are serious objections. Our response is that each misses how deployment changes the game.
12/n
🚨New paper led by @aribak02
Lots of prior research has assumed that LLMs have stable preferences, align with coherent principles, or can be steered to represent specific worldviews. No ❌, no ❌, and definitely no ❌. We need to be careful not to anthropomorphize LLMs too much.
Sim agents are key for developing autonomous systems for safety-critical domains, like self-driving cars.
We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built through scaling self-play.
The development and widespread deployment of advanced AI agents will give rise to multi-agent systems of unprecedented complexity. A new report from staff at CAIF and a host of leading researchers explores the novel and under-appreciated risks these systems pose. Details below.
In this review paper, we advocate for the normalization of AI safety as an inherent component of AI development and deployment. AI safety should be a standard practice integrated into every stage of AI creation and deployment. Developing and deploying safe AI should be a universal priority for everyone. Read our preprint here: https://t.co/57lWzoFuWm