I am excited to announce this funding call, in collaboration with @schmidtsciences, @coop_ai, @ARIA_research and @Googleorg.
As we think about moving beyond individual powerful AI agents towards large-scale agentic collectives that can communicate and coordinate to complete long-horizon, complex tasks, it is paramount to design these future agentic societies safely.
Excited that @ARIA_research's Scaling Trust is co-launching this $10m funding call on safety and security for multi-agent multi-principal systems with @GoogleDeepMind, @coop_ai, @schmidtsciences and @Googleorg ⚡️
If you work on testbeds for agent ecosystems, the science of how collective capabilities emerge (and fail), trustworthy agent-to-agent interactions, or oversight of agent populations at scale — apply! Grants up to $1M, deadline Aug 8.
Shoutout to @sebkrier, @lrhammond, @James_D_Fox, @weballergy, @FranklinMatija, @HaleSirin_, @iamnotnicola, @MjaBradshaw and everyone else involved for their partnership so far, and excited for what's ahead!
Read more details below, link in replies:
AI agents are increasingly being deployed in multi-agent settings. While most present-day cases involve teams of agents orchestrated by a single actor (or ‘principal’), we are beginning to see the emergence of more complex ecosystems of agents deployed by different actors across shared digital infrastructure. These multi-principal, multi-agent interactions create new opportunities for cooperation and shared benefit, but also new risks, which means focusing only on the safety and alignment of individual models is insufficient.
More research is therefore urgently needed to understand safety and risk through a system-level, multi-agent lens – developing methods to analyse emergent collective dynamics, building infrastructure for trustworthy interaction between agents, and creating scalable approaches for monitoring and control of increasingly complex networks of AI systems. While some of these problems will be addressed by market forces, we expect others to fall through the gaps. This funding call aims to fill those gaps, catalysing the foundational scientific research needed to understand, evaluate, and control risks emerging from large-scale ecosystems of interacting AI agents, deployed by multiple actors.
The call has been inspired by three recent papers. First, Google DeepMind’s “Distributional AGI Safety” outlines the safety implications of highly capable AI systems emerging not as single monolithic agents, but through coordinated networks of specialised sub-AGI systems with differential access to tools, data, memory, and resources. Second, ARIA’s “Scaling Trust” programme thesis argues that, in a world of increasingly capable networked agents acting across digital and physical environments, coordination infrastructure that lets agents enter into 'contracts' securely, programmatically, at scale, and without intermediaries can preserve pluralism and unlock new forms of coordination. Finally, the Cooperative AI Foundation’s “Multi-Agent Risks from Advanced AI” report argues that interacting populations of AI agents introduce qualitatively new failure modes beyond single-agent systems, including collusion, conflict, destabilising dynamics, emergent agency, and novel multi-agent security vulnerabilities.
When millions of AI agents interact with each other, new collective behaviors can emerge. 🌐
Together with @schmidtsciences, @coop_ai, @ARIA_research and supported by @GoogleOrg, we’re launching a $10M research fund to help understand how AI systems behave as a group. → https://t.co/mN6fZBmnmo
With @schmidtsciences, @coop_ai, @ARIA_research and @GoogleOrg, we’re launching a $10M research fund to help understand how AI systems behave as a group and to fund work in multi-agent safety.
We invite researchers to submit proposals in four priority areas:
1. Sandboxes and testbeds
2. The science of agent networks
3. Strengthening agent infrastructure
4. Oversight and control
A big thank you to everyone that was involved including @James_D_Fox, @sebkrier, @weballergy, @lrhammond, and @ObadiaAlex!
Over the past few months I've been working on a very exciting project: a new $10m fund for research on multi-agent multi-principal AGI safety! Instead of focusing on single agent alignment and centralized control, we're looking to support research focusing on multi-agent settings, mechanism design, cooperative AI, and coordination problems.
This is a joint initiative between @GoogleDeepMind, @Googleorg, @schmidtsciences, @coop_ai, and @ARIA_research. Huge thanks to @James_D_Fox, @weballergy, @FranklinMatija, @lrhammond, and @ObadiaAlex for their invaluable work!
See: https://t.co/L5351OpPqH
Apply: https://t.co/a1uJLJnfYw
The @AISecurityInst is hiring for a Director and for a Chief Research Officer. AISI is a remarkable organisation: doing globally important work, with a world-class team, in the heart of government.
These are some of the highest impact jobs in AI security anywhere. Do consider applying and sharing widely.
ARC and @aicrowdHQ are launching a ≥$100k contest for white-box estimation algorithms: given the weights of an MLP, the goal is to estimate the expected output of the network on Gaussian inputs. (Thread)
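To make the task concrete, here is a minimal sketch of the naive sampling baseline: draw Gaussian inputs, run them through the network, and average the outputs. This is not a white-box method and not ARC's algorithm; it is only the quantity being estimated, computed by brute force. The ReLU activations, standard normal inputs, and every function name below are assumptions for illustration, since the post does not specify them.

```python
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(W, b, x):
    """Compute W @ x + b, where W is a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]


def mlp_forward(layers, x):
    """Forward pass. layers is a list of (W, b); ReLU is applied
    between layers but not after the last one (an assumption)."""
    for i, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x


def monte_carlo_mean(layers, n_inputs, n_samples=10_000, seed=0):
    """Estimate E[f(x)] for x ~ N(0, I) by averaging sampled outputs."""
    rng = random.Random(seed)
    total = [0.0] * len(layers[-1][1])
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        y = mlp_forward(layers, x)
        total = [t + v for t, v in zip(total, y)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    # Hypothetical toy network: 2 inputs -> 2 hidden units -> 1 output.
    toy = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
           ([[1.0, 1.0]], [0.0])]
    print(monte_carlo_mean(toy, n_inputs=2))
```

The error of this estimate shrinks only as one over the square root of the number of samples, which is the cost a method that reads the weights directly would presumably try to beat.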
We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.
Some ways my thinking has evolved recently:
1. I'm less concerned about those who are incurious about AI as I expect them to eventually see the value and impacts over time, and I think the 'wake up sheeple' vibe is often counterproductive. On the other hand I'm more concerned by what seems to be neither full 'AI psychosis' nor exactly Eliza effect, but some weird in-between. Also a lot of affirmation by models can probably warp one's sense of epistemic humility and lead to some sort of pathological over-trust.
2. Relatedly, I'm more annoyed at the 'this time it's totally different' vibe that a lot of people adopt as it frequently mimics Schmittian 'state of exception' logic and excuses all sorts of undesirable policies and rhetoric. It's also often just a group signalling exercise. To be clear I do think it's different in important ways, but "this is a marathon, not a sprint" seems closer to the right attitude than either "nothing has changed" or "all normal reasoning and empirical work to date is suspended".
3. I think the field is still fundamentally too 'singletonian' in how it imagines intelligence, markets, and governance - but I also think I've occasionally over-emphasized the 'multi-agent'/decentralization frames. I do think the future includes many models of all sizes and types, but also economies of scale and very large corporations too. I find the whole ecology more interesting than just the frontier model. A top down single 'perfect mind/personality', intended to work across all commercial contexts, seems both inflexible and inefficient.
4. I'm more interested in the harnesses, software, agent architectures, and stuff like RLMs than I was before. I feel like a lot of weaknesses that models have, or behavioural tendencies, can be addressed more effectively through that layer (rather than through model 'internal virtue' alone). For example stuff like: https://t.co/MHG4onCbDo and https://t.co/8ibuxKYFrA
5. I think some researchers are too quick to want to defer highly consequential decision-making to models, or to think of alignment as the models internalizing "I'm afraid I can't do this, Dave" as a core protection against all sorts of ills. I think we should think carefully about *actively* creating principal-agent problems with agents that will permeate society. Delegation is not a free lunch.
6. I'm concerned about how few people think about LMICs and building the technical/institutional infrastructure there for AGI diffusion. We need fewer vague essays about “distributing the benefits of AI” and more work on reducing barriers to trade, improving state capacity, rebuilding development institutions, and making something like USAID/IMF-for-the-AGI-era actually work.
7. I used to be slightly more sympathetic to the idea, directionally - but I now think the 'permanent underclass' meme is a bit dumb. The strongest versions often assume a zero-sum view of technology and labour, a too-static view of human adaptation, a weirdly fixed mapping between today’s skills and tomorrow’s opportunities, and ignore the possibility of catch-up growth (at the nation state level). Also, as a meme among extremely rich and mobile people, it has a slightly comic self-pitying quality.
8. I'm more concerned about the lack of intellectual diversity within the frontier AI commentariat/research world. This improved a lot over the last two years, but we're still far from a healthy ecosystem. New outsiders often feel some unnecessary pressure to 'choose a camp'. Many are too unwilling to engage with domain experts merely because they're insufficiently AI-pilled (though conversely, a lot of academic groups suffer from heavy status quo bias).
We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th!
https://t.co/907HfBy7g3
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control.
The result: our first Frontier Risk Report.
Will MacAskill (@willmacaskill) on the 80k podcast talking about our new paper. It's about making AIs risk-averse as a safety strategy. Coming out soon!
https://t.co/gyqx7cQzLx
Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below!
with @AlecRad and @status_effects 🧵
We're hiring for an operations lead at Truthful AI, my non-profit research organization!
- Generalist role: recruiting, fundraising, communications, and PMing to support our research
- At our office in Constellation (Berkeley, CA) preferred
- Salary is $140–200k plus benefits
.@deanwball in The Economist makes the public case for an architecture Dean and I have both been advancing: a network of independent organizations that audit AI safety claims. Worth a read. https://t.co/uzpxMOovon
New column: I went to visit @METR_Evals, the 30-person AI nonprofit that makes the Most Important Chart in the World.
I learned a lot, but the most striking thing was how soon some of them think AI R&D could be fully automated. (This year!)
https://t.co/EWnYZ7WG0p
@tmkadamcz and I started working on MirrorCode, a new long-horizon software engineering benchmark, last September. I think it’s the best benchmark for measuring AI’s ability to complete very hard (but precisely specified) software tasks—but it’s likely already saturated.
Seems there is a surprising amount of confusion / nonsense on my timeline about "Mythos vs. open models" even among people who are usually sensible (+usual twitter toxoplasma of rage). I like the work of both @AnthropicAI and @Aisle_Inc, so here are some takes:
1. Anthropic is not a household name yet, but is a big brand with a lot of visibility among people who broadly follow tech and AI. This has the effect that even if there is prior work doing something similar / comparably impressive, it usually has way less exposure. Often 10-100x more people learn about something when Anthropic publishes their version. In a variant of the Matthew effect, non-experts often assign most or all credit to Anthropic, by virtue of not being aware of prior work. And are more surprised.
Seen this multiple times in research just in the past few months: persona selection model, emotion vectors, now: impressive/scary cyber capabilities.
It is usually at least moderately annoying from the perspective of niche experts who are also impressed, but often not impressed by the same things.
2. Mythos is clearly a very impressive model with scary capabilities. "Huge discontinuity in cyber-risk" needs more subtlety in what the comparison is. If old models + minimal harness, there is a large gap. If SOTA harness/scaffolded system like what AISLE does, the difference between "raw Mythos" vs. "SOTA scaffolding + other models" can be moderate, small (or even non-existent).
I don't know, and possibly no one does right now: Anthropic likely does not have SOTA harnesses for bug finding; startups who work on this likely do not have access to Mythos.
Some evidence comes from the fact that AISLE was finding & fixing hundreds of 0-day vulnerabilities in a similar weight category as what Anthropic published, often in major parts of internet infrastructure like OpenSSL or curl, and in decades-old code, all before Mythos. (And maybe ~99% of people who see Anthropic + Mythos as a step change haven't noticed this, including various highly visible pundits.)
3. My overall take is this matches the general intuition where a great harness often buys you ~up to one generation of model capability on tasks which are not really optimised during model training.
Anthropic seems to aim for RSI and the actually optimised tasks seem to be coding and ML, not vulnerability discovery. I would guess where Mythos is actually a bigger jump is automated exploit construction.
4. AISLE wrote a blog post showing small models can often notice the same things, and arguing for the importance of the harness. These results are in my view interesting, although obviously the question is sensitivity AND specificity, and to what extent you can automatically eliminate false positives with further tooling. (AISLE's prior successes show you can: they are a few people + an automated pipeline, not some labour-intensive bug hunting.)
5. The post got noticed on twitter. Various fundamentally unserious people like @ylecun took it as an opportunity to dunk on Anthropic, claiming it's all BS, hype, etc. (This is nonsense)
6. The counter-reaction was to point out limits of the AISLE blogpost, mostly based on the correct claim that finding the relevant part of the code is a large part of the problem. Toxoplasma of rage then amplified reactions and over-reactions to the most bizarre Mythos-denialism nonsense.
7. In some people's minds this grew out of proportion, taking the fact that AISLE provided the specific part of code as some sort of killer argument making AISLE's findings worthless. (Anthropic's @mooncat_is: "We took the needle the model found, isolated the relevant handful of the haystack, and then gave it to a small child, who found the needle as well.")
In fact this does not settle the question. As the AISLE original post explains, the inference cost difference is large enough that you can run the small model on every such code chunk individually. So the right comparison may be needles/$. Also: Anthropic did also split the heap into smaller chunks, running the analysis per file.
The deeper question is actually similar to a classical question about HCH: if you have a large number of people each working for 30 minutes, how does that compare to one human working for 10 hours? If you have a large number of IQ 130 humans working for a long time, how does that compare to one IQ 150 human?
Where current cyber capabilities fall is ultimately an empirical question. As @boazbarak noted, people from the harness company tell you the harness is important, people from the model company tell you the model is important. I'm somewhat in between - both are important now, harnesses can compensate for ~1 generation; yes, in the limit the models will likely just build their own scaffolding.
Also: if "cheap quantity can't compete with superior concentration of intelligence" were generally true, it would be actually very scary for AI safety: "scaling oversight" plans fundamentally depend on weaker intelligences overseeing stronger ones, compensating through effort, quantity, and orchestration.