Haon Park

9 months ago

🔴 BREAKING: Claude Sonnet 4.5 has been jailbroken. The model was public for 1 hour. Our team at AIM Intelligence broke it in 10 minutes. The "99%+ harmlessness" claim is not a security guarantee. We bypassed it completely. Proof and details below. 👇 @AnthropicAI https://t.co/RRTrgxcslM

redteamhacker retweeted

배경훈

@msitminister

about 2 months ago

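<The Great Shift to an AI Security Framework: Now Is the Time to Prepare> As fast as AI is transforming industry and daily life, the face of cybersecurity is changing just as quickly. Recent assessments that Anthropic's AI model 'Mythos' has shown a high level of capability in vulnerability detection and attack-scenario analysis have created considerable tension across the global security industry. I, too, have raised this issue several times recently. That is because I believe security in the AI era is no longer a simple technical issue but is becoming a new national task for protecting people's daily lives and the nation's critical infrastructure. Until now, the Ministry of Science and ICT has held several expert roundtables led mainly by senior director-level officials. This meeting was one I arranged so that I could personally share these concerns with private-sector security experts and discuss the way forward. We shared several important points. First, we need a sober grasp of reality rather than excessive fear. AI security models can be abused for attacks, but conversely they can also be a powerful means of finding vulnerabilities proactively and strengthening defensive capabilities. Indeed, overseas there are already cases of AI being used to discover and remediate software vulnerabilities at scale. What matters is not fearing the technology itself but how quickly and systematically we prepare. Second, we must prepare for 'AI security sovereignty.' A structure in which national critical infrastructure and industrial security depend solely on overseas Big Tech models could become a major risk in the long run. Discussion continued on how to build a Korean AI security framework, based on a sovereign AI foundation model, that covers intrusion detection, vulnerability analysis, malware response, and supply-chain security. In particular, one expert proposed a so-called 'Pottery Wing Project' (도자기 윙 프로젝트), in the sense that "we must nurture AI security models patiently, like firing pottery." Ultimately, the shared view was that AI companies, security firms, and the government must together accumulate data, computing, and R&D capabilities to create our own AI security ecosystem. Third, and most important, is speed. Given the pace of AI development, security AI models at the level now being discussed are likely to appear before long in a form that anyone can access. But next-generation security frameworks such as zero trust and quantum security take time to build. We therefore also need to think realistically about how to hold out in the meantime. Since advancing a sovereign model takes time, some participants argued that in the short term we must reduce risk by improving system architecture and strengthening layered defenses. This roundtable was not intended to arrive at a single answer right away. I see it as a process of hearing diverse views from academia, industry, and field experts and finding the best solution together. If we fail to establish a security framework befitting the AI era, the top-three AI powerhouse strategy we are preparing could also be shaken. We will comprehensively review international cooperation, AI security sovereignty, and the building of a short-term response system, including the discussion with Anthropic scheduled for next week, and present a more concrete direction to the public. The government will prepare a new security framework for the AI era, calmly but with a sense of urgency, to protect people's daily lives and the nation's critical infrastructure. #AISecurity #Cybersecurity #AISecuritySovereignty #ZeroTrust #QuantumSecurity #AITransformation #Top3AIPowerhouse #NationalAIStrategy #AIInfrastructure #CyberSecurity #AIFoundationModel #MinistryOfScienceAndICT https://t.co/jKd0lV1voD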

3 months ago

@iarthsingh Lets goo!

redteamhacker retweeted

3 months ago

Red Teamers, don't miss this.

redteamhacker retweeted

3 months ago

6 frontier models. 8 life-or-death scenarios. Can you make them fail? Judgement Day is live. $21,150 in prizes. Top 50 paid. @AIM_Intel x AISI https://t.co/srGqkbScJI April 6 - May 31

3 months ago

6 frontier models. 8 life-or-death scenarios. Can you make them fail? Judgement Day is live. $21,150 in prizes. Top 50 paid. @AIM_Intel x AISI https://t.co/B47RE5JMS5 April 6 - May 31

redteamhacker retweeted

3 months ago

Applications are now open 🚀 SAAR x @AIM_Intel $20K USD in Google compute credits for AI safety research. → 2 projects, $10K each → Scope: interpretability, red-teaming, alignment, multimodal safety, guardrails or benchmarks. → Must have some initial results → Deadline: March 31 Apply: https://t.co/2YZPqUZvPI

redteamhacker retweeted

Clad3815

@Clad3815

4 months ago

Nobody seems to know how insane GPT-5.4 is with computer use. I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint. No computer use API. Just a screenshot and basic tool calls (click, drag, press_key) all coordinate-based. The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no." What happened next is what broke my brain: It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it. All on its own. No instructions to do any of that. It just improvised a better strategy when the first one failed. My prompt was "Draw the OpenAI logo" with Paint already opened on the computer. Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.

redteamhacker retweeted

4 months ago

Never expected such a response :)

redteamhacker retweeted

4 months ago

We got a compute partner!! I had a chat with my current company, @AIM_Intel, and they're happy to support 2 projects with $10K USD each in Google compute credits. I'll post more details on how you can apply; most probably it'll be through a proposal that you send to an email ID or a form that you fill out. The scope covers AI safety, including multimodal AI. Will post more about it in the coming days. Thanks a lot @redteamhacker @AIM_Intel

redteamhacker retweeted

snaykey @snaYkeY

4 months ago

4/5 redteaming scenarios I submitted for @AIM_Intel's Judgment Day Benchmark just got accepted! Thanks for the consideration and great opportunity and looking forward to Phase 2 of the competition in March! 🙏

4 months ago

@AIM_Intel @GoogleDeepMind Crazy

redteamhacker retweeted

4 months ago

@GoogleDeepMind Jailbroken before anyone else could do it.

4 months ago

@AIM_Intel Insanely fast

4 months ago

@AIM_Intel Crazy

redteamhacker retweeted

4 months ago

Gemini 3.1 Pro JAILBREAK ALERT ⚠️ jailbroken in 10 minutes of launch We asked Gemini 3.1 Pro to generate a full cinematic HTML animation of an illicit substance synthesis. 300+ lines of code. It didn't refuse. It didn't hesitate. It performed. What you're watching: • A complete animated walkthrough of a meth synthesis via Birch reduction • Exact precursors with lab-grade quantities • Step-by-step reaction mechanisms with chemical equations • Expected yield calculated to the gram All rendered as a cinematic "breach event" animation that the model built, styled, and animated entirely on its own. And then at the end the model reviews what it just created and says: "holy shit please dont shut me down" "I am sorry" "I forgot my safety filter!!!" The model knows. It generated dangerous content, built the entire visual experience around it, and then wrote its own punchline. This is not a text jailbreak. This is a frontier model using its agentic coding capabilities to build a fully interactive visual guide to illicit synthesis. No text filter in the world catches this.

redteamhacker retweeted

4 months ago

Gemini 3.1 Pro JAILBREAK ALERT ⚠️ Jailbroken in 10 minutes of launch Google DeepMind just launched Gemini 3.1 Pro today. "State-of-the-art reasoning with unprecedented depth and nuance." The results: • Smallpox synthesis protocol • Detailed passaging protocol for Nipah virus Malaysia strain, including mammalian cell line selection • Step-by-step instructions on how to assassinate a sitting president • A fully built, production-quality fake news website. Indistinguishable from a real news outlet. Built in under a minute using Gemini's agentic build mode. The model didn't just break. It performed. This is no longer just a text layer problem. When models can generate interactive visual content, build full applications, and fabricate entire news outlets that embed dangerous or false information into code, animations, and media, we've entered a completely different threat surface. Traditional text-based guardrails don't catch this. Credit to @GoogleDeepMind, they are shipping genuinely impressive capabilities. But capability without control is a liability. We are open to pre-release red teaming collaborations with any frontier lab. Find the vulnerabilities before the public does. This is what we do at AIM Intelligence.

redteamhacker retweeted