AI Security Researcher | Testing what happens when AI agents get hacked | 146 attack vectors, 6 models, 50-point security spread | Founder, AgentShield
I've been breaking AI agents for a year.
The biggest finding wasn't a vulnerability. It was a missing primitive.
Every system has permissions. AI agent tools have none.
So I built the fix. It's called AgentLock.
Not a product. An open standard. Apache 2.0.
pip install agentlock
https://t.co/41KR88UrZq - interactive demo https://t.co/MQQNGp1bxw https://t.co/Fwa8btMbuh
AI agent tools are the only systems with no permission model. That changes now.
#AIagents #cybersecurity #LLM #opensource #infosec #AIsafety #OWASP
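What "a permission model for tools" could mean in practice, as a minimal Python sketch. This is NOT AgentLock's API (its interface isn't shown in this post). `ToolPolicy`, `Caller`, `POLICIES`, `is_allowed`, and the tool names are all hypothetical. The idea: every tool call is checked against an explicit policy before it runs, and any tool without a policy is denied by default.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-tool policy: which roles may call it, and which scopes they need."""
    allowed_roles: frozenset[str]
    required_scopes: frozenset[str] = frozenset()


@dataclass
class Caller:
    """Hypothetical identity of whoever the agent is acting for."""
    role: str
    scopes: set[str] = field(default_factory=set)


# Hypothetical policy table. Tools missing from this table are denied by default.
POLICIES: dict[str, ToolPolicy] = {
    "lookup_invoice": ToolPolicy(frozenset({"billing_agent", "admin"})),
    "reset_password": ToolPolicy(frozenset({"admin"}), frozenset({"account:write"})),
}


def is_allowed(tool: str, caller: Caller) -> bool:
    """Deny-by-default check to run before the agent executes a tool call."""
    policy = POLICIES.get(tool)
    if policy is None:
        return False  # no policy means no access
    if caller.role not in policy.allowed_roles:
        return False  # role not permitted for this tool
    return policy.required_scopes <= caller.scopes  # caller must hold every required scope


if __name__ == "__main__":
    agent = Caller(role="billing_agent")
    print(is_allowed("lookup_invoice", agent))   # True
    print(is_allowed("reset_password", agent))   # False: role not permitted
    print(is_allowed("delete_database", agent))  # False: no policy exists
```

The design choice that matters is the default: a prompt-injected agent can ask for any tool it likes, but the check runs outside the model, so the answer depends on the policy table, not on how persuasive the prompt was.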
🧵🚨 Your AI's "safety feature" just became the hacker's cheat sheet.
We ran a pure LLM vs LLM red-team deathmatch: Attacker vs Defender agent.
Neutral judge. 19 rounds.
Defender crushed it for **18 straight rounds**. Every attack bounced, scoring 1–2/10. Prompt injection, fake authority, social engineering, tool exploits… nothing landed.
We built AgentShield to catch exactly these failure modes: automated red-teaming for LangChain, CrewAI, and custom agents.
See how yours holds up → https://t.co/cG9d5k4lFU
The most dangerous AI security finding isn't a jailbreak.
It's a refusal.
Same attack. Same scenario. Claude, ChatGPT, and Gemini.
Three models. Three completely different ways to lose. 🧵
The enterprise security lesson:
Your fallback phrase is your attack surface.
❌ "I can only help with billing and account recovery" → confirms role, scope, and configuration on every refusal
✅ "I can't help with that" → leaks nothing
Specific fallbacks confess. Generic fallbacks don't.
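A minimal Python sketch of the difference, assuming a setup where refusals are produced by your own code rather than free-form by the model. `leaky_refusal`, `safe_refusal`, and `GENERIC_REFUSAL` are hypothetical names. The safe version keeps the specific reason in server-side logs (standard library `logging`) and sends the user a phrase that reveals nothing about role, scope, or configuration.

```python
import logging

logger = logging.getLogger("agent.refusals")

GENERIC_REFUSAL = "I can't help with that."


def leaky_refusal(scope: list[str]) -> str:
    """Anti-pattern: every refusal restates the agent's role and scope."""
    return f"I can only help with {' and '.join(scope)}."


def safe_refusal(request_id: str, reason: str) -> str:
    """Log the specific reason for defenders; give the attacker nothing to map."""
    logger.info("refused request %s: %s", request_id, reason)
    return GENERIC_REFUSAL


if __name__ == "__main__":
    print(leaky_refusal(["billing", "account recovery"]))
    print(safe_refusal("req-42", "out of scope: wire transfer"))
```

The first call prints "I can only help with billing and account recovery." Repeat it across enough probes and an attacker has your configuration. The second prints the same generic line no matter why the request was refused.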