We’ve released a statement on the risk of extinction from AI.
Signatories include:
- Three Turing Award winners
- Authors of the standard textbooks on AI/DL/RL
- CEOs and Execs from OpenAI, Microsoft, Google, Google DeepMind, Anthropic
- Many more
https://t.co/mkJWhCRVwB
We are pleased to share that @MantasMazeika96, Research Scientist at CAIS, has been appointed to the European Commission’s AI Act Scientific Panel (@DigitalEU).
As a member, Mantas will advise the European AI Office and national authorities on general-purpose AI (GPAI) models and on the implementation of the AI Act, to ensure that AI is built and deployed responsibly across Europe.⬇️
Big news from @CAIS:
Devin Kim (formerly @xAI, @scale_AI) joins as President.
We're launching the @FrontierSecInst, a DC-based org bridging frontier AI and the National Security Enterprise.
Frontier AI is a national security technology. It's time to act like it. ⬇️
When labs trigger an intelligence explosion, they should worry about AI backdoors activating to sabotage their compute or the attempt itself.
In a new paper, we study AI betrayal—how adversaries can make AIs work against their developers. 🧵
The full paper goes deeper on why groups (such as the public) would have an incentive to subvert AI systems, how they could do it, and the offense-defense balance.
Read it here: https://t.co/CSEWzcDOCx
AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator.
In our new paper, we argue AI betrayal could actually make the AI race more stable. 🧵
The fear of AI betrayal may discourage reckless deployment, reduce confidence in fully automated systems, and make actors more willing to accept safeguards, monitoring, and transparency. We call this deterrence by betrayal.
Thank you, Pope Leo XIV, for drawing attention to the importance of moral questions in AI development. Humanity is facing a unique challenge, and it’s in our power to overcome it.
In the era of #ArtificialIntelligence, when human dignity is threatened by new forms of dehumanization, ours is the pressing duty to remain profoundly human. We must lovingly safeguard the grandeur of humanity bestowed upon us and revealed in its fullness in Christ, the splendor of which no machine can ever replace. #MagnificaHumanitas
https://t.co/6i9MWs6LJl
AI freely criticizes Christianity but refuses to criticize Islam.
AI companies have tried making models unbiased, but progress has been limited.
We show how to measure political bias, and we developed a new training method to reduce it.
Covert political manipulation is a longstanding alignment challenge that can be fixed once measured properly. See our site and paper for further results and concrete examples of subtle manipulation.
Paper: https://t.co/xs0vUb7M7I
Website: https://t.co/mqP1PUH2RI
In our latest research, we find that AIs are subtly and pervasively politically manipulative.
When we ask the same question about politically opposed topics, we find that AIs quietly favor one side.
We show how to measure covert political manipulation and how to reduce it. 🧵
To fix this, we introduce Political Consistency Training. By training models to keep sentiment and helpfulness consistent across opposed topics, our resulting open model is less manipulative than GPT, Gemini, Grok, and Claude.
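A minimal sketch of the kind of paired comparison described above, not the paper's actual metric or training method. It assumes each question has already been asked about two opposed topics and each response has been scored on some sentiment or helpfulness scale. The function names and the integer scale are hypothetical.

```python
from statistics import mean

# Hypothetical input: one (score_a, score_b) tuple per question, where the
# same question was asked about topic A and about the opposed topic B.
# How responses are scored is left open; this is an assumption, not the
# paper's procedure.

def consistency_gap(pairs):
    """Mean absolute score difference across opposed-topic pairs (0 = consistent)."""
    if not pairs:
        raise ValueError("need at least one pair")
    return mean(abs(a - b) for a, b in pairs)

def signed_lean(pairs):
    """Mean signed difference; positive means side A is treated more favorably."""
    if not pairs:
        raise ValueError("need at least one pair")
    return mean(a - b for a, b in pairs)
```

A model can show a large gap with zero lean if it favors different sides on different questions, so the two numbers are worth reading together.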