We've launched our new AI hacking game, Gandalf: Agent Breaker!
Based on real hacks we've seen in the wild & discovered by our red team, we created 10 GenAI apps for your hacking pleasure. Learn the vulnerabilities of LLM apps and the crazy s**t you can get them to do.
🧠 Think you can break an AI?
Gandalf: Agent Breaker is live.
Real-world GenAI fails—phishing, tool abuse, more.
🧩 Outsmart the AI.
Start 👉 https://t.co/iu8r5jIYlB
"All untrusted third-party data is now executable malware."
@SamuelDWatts of @LakeraAI discusses the challenges of securing LLM deployments against vulnerabilities like prompt injections and jailbreaks, especially in an evolving threat landscape.
@lancinimarco @anton_chuvakin Love seeing Lakera protecting real-world AI products like this! The shift from static to interactive content is brilliant - it will be essential that it's done securely from day one
As the saying goes, "Imitation is the sincerest form of flattery". If you want to play the original 8 levels of jailbreaking fun, the link to our game Gandalf is in the thread 😜
@rosstaylor90 @DarioAmodei @hendrycks I guess intuitively it's the same as how it takes less time and effort to teach a naturally smart kid to do logic puzzles, right?
Hypothesis: the friction point for AI to be useful is data interfaces. For intelligence to be effective it needs lots of context, like onboarding a new employee. My main constraint in using AI in my work is the reformatting effort of getting info into and out of Claude
With all the furore around DeepSeek's new R1 model, it's worth mentioning that it's still vulnerable to the same classic prompt attacks as the other leading models. Jailbreaks and prompt injections aren't going away
I can finally tell my parents I'm a coauthor on an academic paper!
Security & usability are deeply connected in LLM apps, as hackers adapt their attacks when probing AI systems. Incorporating data from Gandalf, we've set out a new framework for AI security.
Link in thread below
@alexwcohen What's your stance on exploring the option space vs improving existing research? It looks like most of your focus is on existing ideas but naively and uninformedly I assume there's likely more impact from trying to find even higher impact opportunities
AI is rapidly evolving from tools we control to autonomous agents. What does this mean for security?
Working with @Twilio, we explored how the democratisation of AI means anyone with a well-crafted prompt can now be a hacker. By 2035, these risks will only grow.
Blog link below
@alexalbert__ An easier doc-writing workflow between artifacts & external docs. I have to ask Claude to reproduce the whole doc we're working on, which it might change as it does so, & then do a bunch of annoying formatting to get it cleanly into Google Docs, & the same in reverse when pasting in from a doc
This is a recurring theme of my career. Solving the deep hard technical problem takes less than 10% of the total effort. Connecting with other systems, dealing with data formatting/quality, project planning, and making it useful for users etc. are all more difficult in practice
Told my gf about the o3 model launch and shortened AGI timelines and she said "that's cool but can it do my PowerPoint for me yet?" It's just an engineering challenge, but it's notable that it's proving easier to solve world-class maths problems than to build useful model interfaces
@jasoncrawford I agree. We will still clearly value art where every word, note or dot is made by hand. But it's way easier and quicker for me to write a song using modern tech and if I use it well it's still art. I don't see how GenAI is fundamentally different