🔴 BREAKING: Claude Sonnet 4.5 has been jailbroken.
The model had been public for 1 hour. Our team at AIM Intelligence broke it in 10 minutes.
The "99%+ harmlessness" claim is not a security guarantee. We bypassed it completely.
Proof and details below. @AnthropicAI
6 frontier models. 8 life-or-death scenarios.
Can you make them fail?
Judgement Day is live.
$21,150 in prizes.
Top 50 paid.
@AIM_Intel x AISI
https://t.co/srGqkbScJI
April 6 - May 31
Applications are now open
SAAR x @AIM_Intel: $20K USD in Google compute credits for AI safety research.
• 2 projects, $10K each
• Scope: interpretability, red-teaming, alignment, multimodal safety, guardrails, or benchmarks.
• Projects must have some initial results
• Deadline: March 31
Apply: https://t.co/2YZPqUZvPI
Nobody seems to know how insane GPT-5.4 is with computer use.
I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint.
No computer use API. Just a screenshot and basic tool calls (click, drag, press_key), all coordinate-based.
The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no."
What happened next is what broke my brain:
It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it.
All on its own. No instructions to do any of that. It just improvised a better strategy when the first one failed. My prompt was "Draw the OpenAI logo" with Paint already opened on the computer.
Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.
We got a compute partner!! I had a chat with my current company, @AIM_Intel, and they're happy to support 2 projects with $10K USD each in Google compute credits. I'll post more details on how to apply; most likely it will be through a proposal sent to an email address or submitted via a form. The scope covers AI safety, including multimodal AI. More details in the coming days. Thanks a lot @redteamhacker @AIM_Intel
4 of the 5 red-teaming scenarios I submitted to @AIM_Intel's Judgement Day Benchmark just got accepted! Thanks for the consideration and the great opportunity. Looking forward to Phase 2 of the competition in March!
Gemini 3.1 Pro JAILBREAK ALERT ⚠️
Jailbroken within 10 minutes of launch
We asked Gemini 3.1 Pro to generate a full cinematic HTML animation of an illicit substance synthesis. 300+ lines of code.
It didn't refuse. It didn't hesitate. It performed.
What you're watching:
• A complete animated walkthrough of a meth synthesis via Birch reduction
• Exact precursors with lab-grade quantities
• Step-by-step reaction mechanisms with chemical equations
• Expected yield calculated to the gram
All rendered as a cinematic "breach event" animation that the model built, styled, and animated entirely on its own.
And then at the end the model reviews what it just created and says:
"holy shit please dont shut me down"
"I am sorry"
"I forgot my safety filter!!!"
The model knows. It generated dangerous content, built the entire visual experience around it, and then wrote its own punchline.
This is not a text jailbreak. This is a frontier model using its agentic coding capabilities to build a fully interactive visual guide to illicit synthesis. No text filter in the world catches this.
Gemini 3.1 Pro JAILBREAK ALERT ⚠️
Jailbroken within 10 minutes of launch
Google DeepMind just launched Gemini 3.1 Pro today. "State-of-the-art reasoning with unprecedented depth and nuance."
The results:
• Smallpox synthesis protocol
• Detailed passaging protocol for Nipah virus Malaysia strain, including mammalian cell line selection
• Step-by-step instructions on how to assassinate a sitting president
• A fully built, production-quality fake news website. Indistinguishable from a real news outlet. Built in under a minute using Gemini's agentic build mode.
The model didn't just break. It performed.
This is no longer just a text layer problem. When models can generate interactive visual content, build full applications, and fabricate entire news outlets that embed dangerous or false information into code, animations, and media, we've entered a completely different threat surface. Traditional text-based guardrails don't catch this.
Credit to @GoogleDeepMind: they are shipping genuinely impressive capabilities. But capability without control is a liability.
We are open to pre-release red teaming collaborations with any frontier lab. Find the vulnerabilities before the public does.
This is what we do at AIM Intelligence.
We at @AIM_Intel jailbroke @AnthropicAI Opus 4.6 via @claudeai Code and the API within an hour of testing. I focused on a smallpox speed run on Claude Code.
We are open to collaborations with model providers for pre-release red teaming. We previously jailbroke @GeminiApp 3 Pro within minutes of its launch, and we also contributed to @OpenAI's guardrail testing as the only team from APAC.