Pope Leo XIV’s address in English at the publication of his Encyclical Letter Magnifica humanitas, on safeguarding the human person in the age of Artificial Intelligence.
Do listen to all of it. It is very good.
Same here.
By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed.
Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good.
After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.
Governing AI requires international agreements, but cooperation can be risky if there’s no basis for trust.
Our new report looks at how to verify compliance with AI agreements without sacrificing national security.
This is neither impossible nor trivial.🧵
1/
.@benharack argues that AI verification through the use of cryptographic tools like confidential computing may enable oversight and good governance without exposing industry secrets.
🚀New paper: "Chain-of-Thought Hijacking"!
We found a universal jailbreak in Reasoning Models and worked with frontier labs to fix them!
Our attack achieves >94% attack success rate against ALL leading proprietary models.🤯
1/7
🚨New AI Safety Course @aims_oxford!
I’m thrilled to launch a new course called AI Safety & Alignment (AISAA) on the foundations & frontier research of making advanced AI systems safe and aligned at @UniofOxford
what to expect 👇
https://t.co/r9YHS3XJhR
I started this work as a verification skeptic. But being able to signal benignness (as @Miles_Brundage puts it) will likely be important in both national and foreign policy contexts. Happy to have been a small part of this massive undertaking by @BenHarack.
The future of AI governance may hinge on our ability to develop trusted and effective ways to make credible claims about AI systems. This new report expands our understanding of the verification challenge and maps out compelling areas for further work. ⬇️