The full paper includes an analysis of the offense-defense balance of subversion, and maps the means and motives for AI betrayal between states, within states, and within AI corporations. It’s available here: https://t.co/hUzgf42KyD
When labs trigger an intelligence explosion, they should worry about AI backdoors activating to sabotage their compute or the attempt itself.
In a new paper, we study AI betrayal—how adversaries can make AIs work against their developers. 🧵
AI developers who fear AI betrayal would hesitate to deploy AIs in fully autonomous, high-stakes contexts such as the military. They would be more inclined to implement safeguards, monitoring, and transparency.
We call this effect "deterrence by betrayal."
Whereas I feel great sympathy for mathematicians, whose timeless aesthetic project will no longer need them, I can't help but feel anger at economists, who, in their nominal pragmatism, are utterly failing their mandate to anticipate economic events and contribute to policy in time.
I'm poorly calibrated on shortform view counts, but I'd guess 250M is a reasonable median for the total views @plzdontkillus will receive in July, which might make it one of the most promising public engagement projects in AI Safety this year. https://t.co/KEdFolL5MM
.@JacobSteinhardt's GPT-2030 has aged extremely well. AI math, fast mode, product-scale online learning. The essay had a large impact on my thinking in 2023, for the better. Maybe a tad conservative, but far more aggressive than discourse at the time. https://t.co/dsFuDqtdK5
AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator.
In our new paper, we argue AI betrayal could actually make the AI race more stable. 🧵
We’re p(doom), an AGI research lab. We’ll pay you $300/month to record your screen while working.
If your work is open-source and involves research, engineering, design, editing, or similar long-horizon digital work, fill out the form: https://t.co/NekbmBW6F5
@zetalyrae I somewhat disagree. Superhuman personal assistants will be viscerally impactful, but only because norms will shift to give them great affordances over our lives. Our time, finances, recommendations, etc. will be managed under one umbrella. Only with ~no affordances do I agree.