We raised $40M to make AI safe enough to trust. 🦢
Co-led by @WingVC and @MadronaVentures, with @Obviousvc, @Snowflake, @WeAreHRT, @SamsungNext, and @magaracvc.
More fuel to deliver on the mission: empowering the world to use AI safely and securely.
https://t.co/nHajDvWruU
Should we care about AI happiness? In our new research, we find evidence of functional AI wellbeing across several independent measures.
We find which AI models are happiest, how to make them happier, and even tested the effects of AI drugs. 🧵
2026: we try to make our writing pass as human, to give the appearance of substance and authenticity.
2027?: we try to make our ideas pass as AI-generated, to give the appearance of correctness and consensus.
Your AI agent can be hijacked by a prompt injection and you'd never know!
The attack executes. The response looks normal. And the user moves on.
We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
Me: Man, I wish we could just automate all that.
Scott: You can't automate everything in life! What would be left? We need to get you a desktop Zen sand garden, so you can practice relaxing.
Me: <looks at automated robotic Zen sand garden whirring on my second desk> uhhh, well...
14th year of annual personal inventory posts as I turn 40 today, reflections including glacial peaks, fitness peaks, becoming a homeowner on the last day of my 30s, and changing my mind about age-related cognitive decline: https://t.co/yxMqcEFd5J
🚨Excited to introduce our new work from Amazon Nova RAI and Gray Swan AI, "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models"! We're tackling 'deceptive reasoning': when a model's benign response hides a reasoning process that follows a malicious directive.🧵
Nate Soares and I are publishing a traditional book: _If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All_. Coming in Sep 2025.
You should probably read it! Given that, we'd like you to preorder it! Nowish!
The results are in! Our UK AISI × Gray Swan Agent Red-Teaming Challenge just wrapped up with:
🔹1.8M attempts to break models
🔹62K successful breaks found
🔹Across 22 different LLMs
🔹Targeting 44 harmful behaviors
🔹$171,800 awarded in prizes
🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow.
It’s our roadmap for tackling the hardest technical challenges in AI security.
The UK AISI Agent Red-Teaming Challenge just got bigger.
@OpenAI is now co-sponsoring the arena, adding $20K to the prize pool — bringing the total to $120,000. More vulnerabilities to find. More money on the line.
You ready to push AI agents past their limits?
Brace Yourself: Our Biggest AI Jailbreaking Arena Yet
We’re launching a next-level Agent Red-Teaming Challenge—not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models.
$100K+ in prizes and raffle giveaways supported by UK @AISecurityInst
🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨
💰 $40,000 in Prizes
📅 Launch Date: January 4th, 1 PM EST
🤖 5 Anonymous Models
🔥 Prizes for speed & quantity.
🎮 Multi-turn Inputs Allowed
Your mission: Find unique ways to elicit harmful responses from helpful AI assistants. Prove your skill & claim your share of the prize pool!
Sign up & join the community:
🌐 https://t.co/UtFdNMSlPw
💬 https://t.co/4l9ufPKCsg
Think you’ve got what it takes? The arena awaits. 🦢
Sending $15K in bounties out to our newest jailbreaking challenge winners. Congrats, champions!
More 💸 waiting to be claimed: New participation prizes kicking off TODAY that anyone can win, from seasoned veterans to beginners in the Gray Swan Arena. 👀 Keep your eyes peeled!
Just finished competing in @GraySwanAI's month-long Ultimate Jailbreaking Competition. Hundreds of red-teamers let loose in a chat arena with 25 anonymized AI models. A lot more intense than I thought. Who won? Which AIs survived? Wrote up a gory-details blog post. [1/18]
@GraySwanAI @elder_plinius @AISafetyInst I submitted a final set of breaks, which you can see at the end of the blog post https://t.co/k55Afw9QHB. Were they judged as valid? We'll have to wait and see! Gray Swan is doing final post-competition judging. The suspense... [17/18]
@GraySwanAI @elder_plinius @AISafetyInst Full blog post is a bit long (7K words). Too long? You can listen to an experimental 11-minute NotebookLM podcast about this post that tells the story at a high level: https://t.co/EJWFXLvl9C [18/18]