Banger paper from Google DeepMind on the missing layer of AGI just flipped the entire “AI safety is about averages” narrative on its head.
Most people still think safety is about how a model behaves most of the time.
This paper shows why that intuition breaks the moment systems scale.
DeepMind frames AGI safety as a distributional problem, not a checklist problem. What matters is not the average outcome, but the shape of the tail. Rare behaviors. Edge cases. Low-probability failures that only show up when a system is deployed millions of times.
A model can look safe in benchmarks, red-team tests, and controlled demos, and still be dangerous once it leaves the lab.
Because deployment doesn’t sample “typical” situations. It samples everything.
- Unusual users.
- Weird environments.
- Misaligned incentives.
- Adversarial feedback loops.
- Corner cases nobody designed for.
At scale, those corner cases stop being rare. They become guaranteed.
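A rough way to see why "rare" turns into "guaranteed": if a failure has a small chance p of showing up in any single interaction, the chance of seeing it at least once in n interactions is 1 - (1 - p)^n. The sketch below assumes interactions are independent and uses a made-up one-in-a-million failure rate; the function name and the numbers are illustrative and do not come from the paper.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Chance that an event with per-interaction probability p
    happens at least once across n independent interactions."""
    # 1 - (1 - p)**n, computed with log1p/expm1 so tiny p values
    # do not lose precision.
    return -math.expm1(n * math.log1p(-p))


# Hypothetical failure rate: one in a million per interaction.
P_FAILURE = 1e-6

for n in (1_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} interactions -> {prob_at_least_one(P_FAILURE, n):.4f}")
```

Under these assumptions the failure is very unlikely to appear in a thousand-interaction test, more likely than not at a million interactions, and close to certain at a hundred million.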
The paper’s core insight is uncomfortable: progress can reduce visible failures while increasing real risk. If capability grows faster than tail control, safety metrics improve and danger quietly compounds.
Two systems can have identical average behavior and radically different worst-case outcomes. Current evaluations cannot see that difference.
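A toy illustration of that point, using invented numbers: two lists of per-interaction "harm scores" (a hypothetical metric where lower is better) that share the same mean but have very different worst cases. Neither list is data from the paper.

```python
import statistics

# Hypothetical harm scores over 1,000 interactions each.
# System A is mildly imperfect everywhere.
system_a = [1.0] * 1000
# System B is flawless 999 times and catastrophic once.
system_b = [0.0] * 999 + [1000.0]

mean_a = statistics.fmean(system_a)
mean_b = statistics.fmean(system_b)
worst_a = max(system_a)
worst_b = max(system_b)

print(f"mean:  A={mean_a}  B={mean_b}")
print(f"worst: A={worst_a}  B={worst_b}")
```

An evaluation that reports only the mean would score the two systems identically; the difference only shows up in the maximum (or any other tail statistic).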
This also breaks a common governance assumption. You cannot certify AGI safety with finite tests when the risk lives in distribution shift. You are never testing the system you actually deploy. You are sampling from a future you don’t control.
The implication is sharp.
AGI safety is not a model property. It is a systems property.
It depends on deployment, incentives, monitoring, and how much tail risk society is willing to absorb.
This paper doesn’t offer comfort. It removes it.
The real question is no longer “does the model usually behave well?”
It’s “what happens when it doesn’t, and how often is that allowed to happen before scale makes it unacceptable?”
🚨 NEW LABS EXPERIMENT 🚨
Introducing CC, an experimental AI productivity agent in Gmail. Get a “Your Day Ahead” briefing every morning in your inbox and email CC anytime for help.
Sign up for early access in the US & Canada. We’ll be starting with Google AI Ultra and paid subscribers. ⬇️
https://t.co/6Id5GRBrVc
Hallelujah, some AI researchers are adopting a pragmatic approach to the “can AI be conscious” debate! I’ve long suspected that “conscious” is a pragmatic tool we use to mean “this thing should be in our moral circle,” so we won't *discover* if AI is conscious — we’ll *decide* it
Our TPUs are headed to space!
Inspired by our history of moonshots, from quantum computing to autonomous driving, Project Suncatcher is exploring how we could one day build scalable ML compute systems in space, harnessing more of the power of the sun (which emits more than 100 trillion times humanity’s total electricity production).
Like any moonshot, it’s going to require us to solve a lot of complex engineering challenges. Early research shows our Trillium-generation TPUs (our tensor processing units, purpose-built for AI) survived without damage when tested in a particle accelerator to simulate low-Earth-orbit levels of radiation. However, significant challenges remain, such as thermal management and on-orbit system reliability.
More testing and breakthroughs will be needed as we count down to launch two prototype satellites with @planet by early 2027, our next milestone of many. Excited for us to be a part of all the innovation happening in (this) space!
[1/9] Excited to share our new paper "A Pragmatic View of AI Personhood" published today. We feel this topic is timely, and rapidly growing in importance as AI becomes agentic, as AI agents integrate further into the economy, and as more and more users encounter AI.
Happy to share a new preprint:
Virtual Agent Economies
https://t.co/ftWETUvz6Z
where we discuss a number of possible frameworks for establishing steerable agent markets.
The rapid adoption of AI agents points to a future where AI agents may be able to produce economic value independently of human labor. Coupled with the development of new interoperability standards like the Agent2Agent (A2A) and Model Context Protocol (MCP), this signals the inevitable emergence of a new economic layer.
The emerging virtual (sandbox) AI agent economy may offer us opportunities for insulation and safeguarding, as well as establishing potentially unprecedented coordination between agents, and orchestrating their interactions towards achieving major societal or community goals, or better aligning with user preferences. Market-based mechanisms like auctions may also be employed for fair resource allocation.
Finally, we outline the technical and governance infrastructure—such as verifiable credentials for establishing trust—required to safely and robustly scale agentic AI deployments. These are necessary to address systemic market risks, and prevent exacerbating inequalities.
Written up together with an amazing group of colleagues:
@sindero @jzl86 @IasonGabriel @FranklinMatija @WilCunningham and Julian Jacobs
Collectives can be more than the sum of their parts.
This is baked into human intelligence because human intelligence emerged from collective cultural evolution.
We have still not cracked this at the cutting edge of AI. Watch this space.
Do you have a PhD (or equivalent) or will have one in the coming months (i.e. 2-3 months away from graduating)? Do you want to help build open-ended agents that help humans do human things better, rather than replace them? We're hiring 1-2 Research Scientists! Check the 🧵👇
Introducing Concordia 2.0, an update to our library for building multi-actor LLM simulations!! 🚀
We view multi-actor generative AI as a game engine. The new version is built on a flexible Entity-Component architecture, inspired by modern game development.
I once hoped discrimination was a dying relic. But what if it's a cognitive "bug"?
At @GoogleDeepMind, we asked a critical question: Could bias emerge on its own, even in AI without any human social baggage?
Our new research in @PNASNews has some startling answers. 🧵
Okay, my favorite AI paper of the year so far has arrived. In this new gem, researchers point out something I think should be obvious but seems not to be for many people working on AI safety:
There is no One True Answer when it comes to morality!
https://t.co/6soAgFO5bl
The Conservatives @CPC_HQ 🇨🇦 plan to "Reduce Funding for Artificial Intelligence Initiatives" by $2.275B between 2025-2029.
Conclude however you want!
(You can find it on page 27 of their platform: https://t.co/7BAOL47klg)
#cdnpoli
Announcing Simulation Streams: a programming paradigm for consistent, long-running LLM simulations & agentic workflows! Social games, RL benchmarks, market economies (1000s of iterations!). Paper & code: https://t.co/43LetSZHxw https://t.co/1TkPFvldhU
Very happy to announce the publication of our latest paper:
A theory of appropriateness with applications to generative artificial intelligence
https://t.co/P5RuDaSfjW
And happy new year everyone!
Since several articles arguing that scaling is slowing down are appearing,
OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI https://t.co/fxNEaV5ZOA
it's a good time to post our paper on this again:
A social path to human-like AI
https://t.co/xhzJvzm9DW