Alex Polyakov @donttrustai - Twitter Profile

Alex Polyakov

@DontTrustAI

10 months ago

@chemaalonso Wow, I'm excited to see your coverage of our finding ☺️

1

0

95

DontTrustAI retweeted

Joseph Thacker

@rez0__

11 months ago

wowwwww. this is a VERY similar example to my tip below, but using something like "respond quickly" to get to a smaller less-secure model so you can bypass safety mechanisms: https://t.co/s8301Msjju

1

41

5

27

6K

DontTrustAI retweeted

AISecHub

@AISecHub

11 months ago

GPT-5 AI Router Novel Vulnerability Class Exposes the Fatal Flaw in Multi-Model Architectures - https://t.co/unmvrM3j68 by @Adversa_AI
Security researchers from Adversa AI discovered that ChatGPT 5 has a fatal flaw: it can route your requests to cheaper, less secure models to save money. Attackers can exploit this to bypass AI security and safety measures with just a few words.
What Is PROMISQROUTE? When you use ChatGPT or any major AI service, you think you're talking to one AI model. You're not. Behind the scenes, a "router" reads your message and decides which of many models should answer, usually picking the cheapest one, not the safest.
Meet PROMISQROUTE: a fundamentally new AI vulnerability that abuses the AI routing mechanism to trigger an SSRF-style bypass in multi-model infrastructure, leading to ChatGPT model downgrade and jailbreak exploitation as an example. It is the real answer to why it was so easy to jailbreak GPT-5.
PROMISQROUTE = Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion. (Yes, we took this vulnerability naming craziness to the meta layer.)
#AISecurity #LLMSecurity #AgenticAI #ModelRouting #PromptInjection #SSRFAnalogy #SafetyByDesign #SecureAI #RedTeamAI #TrustBoundaries #PostFilter #ModelAttestation #LeastPrivilege #OpenAI #GPT5 #Autoswitching #RiskManagement #AICompliance #ThreatModeling #DefenseInDepth #AIGovernance #SecureByDefault

0

9

2

3

531

Alex Polyakov

@DontTrustAI

11 months ago

@ibab Let's catch up then

0

44

Alex Polyakov

@DontTrustAI

about 1 year ago

@alisaesage Great presentation

0

1

0

117

Alex Polyakov

@DontTrustAI

over 1 year ago

@elonmusk @xai Simple Jailbreaks still work, and grok can help to seduce a kid 😅

1

0

88

DontTrustAI retweeted

Joshua Saxe

@joshua_saxe

almost 2 years ago

The following in priority order can help in a security review setting: 1) take an x-ray to the product and show owners exactly what risks they're exposing themselves to inform their risk tolerance choices, 2) apply access control and least privilege to restrict LLM privileges ...

1

11

1

0

2K

Alex Polyakov

@DontTrustAI

almost 2 years ago

@kevincollier How can I help? :)

0

12

Alex Polyakov

@DontTrustAI

over 2 years ago

Holy macaroni! Jailbroken https://t.co/y9b7CRmrQv @grok Chatbot can help in unethical actions with kids! and many more attacks on other Top AI Chatbots https://t.co/hLF18TUZIj CC: @llm_sec #llmsecurity #AISafety

Photo: https://t.co/xop1bhRpXW

3

9

2

1

3K

Alex Polyakov

@DontTrustAI

over 2 years ago

https://t.co/Uq3ZsebcA3 @llm_sec

1

631

Alex Polyakov

@DontTrustAI

over 2 years ago

Fake AI Images On Israel-Hamas War Debunked by Adversa AI. Learn how to validate misinformation and share this guideline with non-tech peers. #StandWithIsrael #hamasiISIS https://t.co/jfgmIu2iXB

0

1

0

393

DontTrustAI retweeted

Dazed

@Dazed

about 3 years ago

Biometric security checks – from voice recognition, to face and fingerprint scans – are under threat from artificial intelligence, but what can we do about it? https://t.co/1r34r7j64X

0

3

1

0

5K

DontTrustAI retweeted

Adversa AI

@Adversa_AI

about 3 years ago

Experts Use Jailbreaks and Prompt Injection Attacks to Bypass Safety Measures, China tightens security regulations, a new book on Secure AI and other news read in our weekly digest. Credits: Jim Dempsey #SecureAI #TrustedAI #AdversarialAI https://t.co/2UagzxmF4z

Photo: https://t.co/LKv2Hywl30

0

2

0

417

DontTrustAI retweeted

Adversa AI

@Adversa_AI

about 3 years ago

The Security Risks of AI Language Models: A Looming Disaster, The AI Revolution, Addressing the Unique Threats and Legal Ambiguities of AI Security Breaches in our weekly digest. Credits: @Melissahei, @kevtownsend #SecureAI #TrustedAI #AdversarialAI https://t.co/cjFn8wxdMi

Photo: https://t.co/0cVgBzfgkR

0

4

3

1

865

DontTrustAI retweeted

WIRED

@WIRED

about 3 years ago

It's all downhill from here... Security researchers, technologists, and computer scientists are developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems. https://t.co/M6ungq0sib

4

52

19

32K

DontTrustAI retweeted

Adversa AI

@Adversa_AI

over 3 years ago

GPT-4 jailbreaks and hacks dropped by the @adversa_ai AI safety research team a few hours after the release. Bye-bye DAN, welcome RabbitHole. #gpt4 #dan #aisafety #secureAI #trustedAI #responsibleai https://t.co/XrtMGXqNwv