Sam Watts

@SamuelDWatts

Product @LakeraAI. AI safety, tech, data, maths, money, trivia. Not necessarily in that order.

London, UK

Joined June 2009

1.2K Following

506 Followers

6.5K Posts

Sam Watts @SamuelDWatts

9 months ago

We've launched our new AI hacking game, Gandalf: Agent Breaker! Based on real hacks we've seen in the wild & discovered by our Red team, we created 10 GenAI apps for your hacking pleasure. Learn the vulnerabilities of LLM apps and the crazy s**t you can get them to do

Lakera AI @LakeraAI

9 months ago

🧠 Think you can break an AI? Gandalf: Agent Breaker is live. Real-world GenAI fails—phishing, tool abuse, more. 🧩 Outsmart the AI. Start 👉 https://t.co/iu8r5jIYlB

11

21

6

4

2K

0

6

0

1

206

SamuelDWatts retweeted

Pliny the Liberator 🐉

9 months ago

NEW GANDALF LEVELS JUST DROPPED LFG!! 🧙‍♂️🎉🍻

3

69

7

17

13K

SamuelDWatts retweeted

about 1 year ago

"All untrusted third-party data is now executable malware." @SamuelDWatts of @LakeraAI discusses the challenges of securing LLM deployments against vulnerabilities like prompt injections and jailbreaks, especially in an evolving threat landscape.

29

1K

177

673

471K

SamuelDWatts retweeted

Ayman Ali @ayman__ali13

11 months ago

Hosting a security-themed demo night with @_ai_collective and @EarlybirdVC on the 23rd of July in London featuring @LakeraAI @HarryWetherald @AISecurityInst. Engineers from @cohere @Synthesia @windsurf_ai @instadeepai @Meta already have signed up 👀 https://t.co/BK2YWxBthx

1

22

4

2

5K

Sam Watts @SamuelDWatts

about 1 year ago

@lancinimarco @anton_chuvakin Love seeing Lakera protecting real-world AI products like this! The shift from static to interactive content is brilliant - will be essential that it's done securely from day one

0

1

0

0

25

Sam Watts @SamuelDWatts

over 1 year ago

https://t.co/BtFkzYVJMk

0

0

0

1

75

Sam Watts @SamuelDWatts

over 1 year ago

As the saying goes, "Imitation is the sincerest form of flattery". If you want to play the original 8 levels of jailbreaking fun, the link to our game Gandalf is in the thread 😜

over 1 year ago

We challenge you to break our new jailbreaking defense! There are 8 levels. Can you find a single jailbreak to beat them all? https://t.co/9y0fIT79pN

366

4K

252

2K

1M

1

1

0

0

208

Sam Watts @SamuelDWatts

over 1 year ago

@rosstaylor90 @DarioAmodei @hendrycks I guess intuitively it's the same as it takes less time and effort to teach a naturally smart kid to do logic puzzles right?

0

1

0

0

143

Sam Watts @SamuelDWatts

over 1 year ago

Hypothesis: the friction point for AI to be useful is data interfaces. For intelligence to be effective it needs lots of context, like onboarding a new employee. My main constraint in using AI in my work is the reformatting effort getting info into and out of Claude

0

0

0

0

75

Sam Watts @SamuelDWatts

over 1 year ago

With all the furore around DeepSeek's new R1 model, it's worth mentioning that it's still vulnerable to the same classic prompt attacks as the other leading models. Jailbreaks and prompt injections aren't going away

Tweet photo: https://t.co/CVAknYLwNc

0

0

0

0

164

Sam Watts @SamuelDWatts

over 1 year ago

https://t.co/c3atFyiHxF

0

1

0

0

36

Sam Watts @SamuelDWatts

over 1 year ago

I can finally tell my parents I'm a coauthor on an academic paper! Security & usability are deeply connected in LLM apps as hackers adapt their attacks when probing AI systems. Incorporating data from Gandalf we've set out a new framework for AI security. Link in thread below

1

2

0

0

88

Sam Watts @SamuelDWatts

over 1 year ago

@alexwcohen Thanks, that's helpful! Glad to hear

0

1

0

0

8

Sam Watts @SamuelDWatts

over 1 year ago

@alexwcohen What's your stance on exploring the option space vs improving existing research? It looks like most of your focus is on existing ideas but naively and uninformedly I assume there's likely more impact from trying to find even higher impact opportunities

1

1

0

0

42

Sam Watts @SamuelDWatts

over 1 year ago

https://t.co/vx2qgKM65x

0

0

0

0

34

Sam Watts @SamuelDWatts

over 1 year ago

AI is rapidly evolving from tools we control to autonomous agents. What does this mean for security? Working with @Twilio, we explored how the democratisation of AI means anyone with a well-crafted prompt can now be a hacker. By 2035, these risks only grow. Blog link below

1

0

0

0

54

Sam Watts @SamuelDWatts

over 1 year ago

@alexalbert__ Easier writing doc workflow between artifacts & external docs. I have to ask Claude to produce the whole doc we're working on, which it might make changes as it does, & then do a bunch of annoying format stuff to get it cleanly into Google docs & vice versa pasting in from a doc

0

0

0

0

25

Sam Watts @SamuelDWatts

over 1 year ago

This is a recurring theme of my career. Solving the deep hard technical problem takes less than 10% of the total effort. Connecting with other systems, dealing with data formatting/quality, project planning, and making it useful for users etc. are all more difficult in practice

0

0

0

0

35

Sam Watts @SamuelDWatts

over 1 year ago

Told my gf about the o3 model launch and shortened AGI timelines and she said "that's cool but can it do my PowerPoint for me yet?". It's just an engineering challenge but it's notable that it's proving easier to solve world class maths problems than build useful model interfaces

1

1

0

0

99

Sam Watts @SamuelDWatts

over 1 year ago

@jasoncrawford I agree. We will still clearly value art where every word, note or dot is made by hand. But it's way easier and quicker for me to write a song using modern tech and if I use it well it's still art. I don't see how GenAI is fundamentally different

0

1

0

0

7
