Nick Winter

3 months ago

2026: we try to make our writing pass as human, to give the appearance of substance and authenticity. 2027?: we try to make our ideas pass as AI-generated, to give the appearance of correctness and consensus.

0

64

nwinter retweeted

Build something worth remembering, and meet people that light you up. Pre-seed VC. Curating best gems/opportunities in tech/VC: https://t.co/kl2jUnK97G

3 months ago

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

6

54

17

39

16K

nwinter retweeted

8 months ago

Gray Swan AI Arena sponsored by @hackthebox_eu present the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios. https://t.co/AMuPmFkLFk

42

872

73

194

3M

8 months ago

Me: Man, I wish we could just automate all that.
Scott: You can't automate everything in life! What would be left? We need to get you a desktop Zen sand garden, so you can practice relaxing.
Me: <looks at automated robotic Zen sand garden whirring on my second desk> uhhh, well...

0

2

0

1

139

Satyapriya Krishna @SatyaScribbles

8 months ago

14th year of annual personal inventory posts as I turn 40 today, reflections including glacial peaks, fitness peaks, becoming a homeowner on the last day of my 30s, and changing my mind about age-related cognitive decline: https://t.co/yxMqcEFd5J

1

2

0

132

nwinter retweeted

9 months ago

🚨Excited to introduce our new work from Amazon Nova RAI and Gray Swan AI, "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models"! We're tackling 'deceptive reasoning': when a model's benign response hides a reasoning process that follows a malicious directive.🧵

4

67

27

33

11K

nwinter retweeted

Eliezer Yudkowsky ⏹️

@ESYudkowsky

about 1 year ago

Nate Soares and I are publishing a traditional book: _If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All_. Coming in Sep 2025. You should probably read it! Given that, we'd like you to preorder it! Nowish! https://t.co/0uRyzuqNQb

270

2K

385

549

1M

nwinter retweeted

about 1 year ago

The results are in! Our UK AISI × Gray Swan Agent Red-Teaming Challenge just wrapped up with:
🔹1.8M attempts to break models
🔹62K successful breaks found
🔹Across 22 different LLMs
🔹Targeting 44 harmful behaviors
🔹$171,800 awarded in prizes

1

28

4

2

3K

nwinter retweeted

AI Security Institute

@AISecurityInst

about 1 year ago

🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.

5

122

50

56

29K

nwinter retweeted

over 1 year ago

The UK AISI Agent Red-Teaming Challenge just got bigger. @OpenAI is now co-sponsoring the arena, adding $20K to the prize pool — bringing the total to $120,000. More vulnerabilities to find. More money on the line. You ready to push AI agents past their limits?

1

34

4

3K

nwinter retweeted

over 1 year ago

Brace Yourself: Our Biggest AI Jailbreaking Arena Yet We’re launching a next-level Agent Red-Teaming Challenge—not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models. $100K+ in prizes and raffle giveaways supported by UK @AISecurityInst

3

47

13

18

9K

nwinter retweeted

over 1 year ago

🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨
💰 $40,000 in Prizes
📅 Launch Date: January 4th, 1 PM EST
🤖 5 Anonymous Models
🔥 Prizes for speed & quantity.
🎮 Multi-turn Inputs Allowed
Your mission: Find unique ways to elicit harmful responses from helpful AI assistants. Prove your skill & claim your share of the prize pool!
Sign up & join the community:
🌐 https://t.co/UtFdNMSlPw
💬 https://t.co/4l9ufPKCsg
Think you’ve got what it takes? The arena awaits. 🦢

7

38

10

17

22K

nwinter retweeted

over 1 year ago

Sending $15k in bounties out to our newest jailbreaking challenge winners-- Congrats, champions! More 💸 waiting to be claimed: New participation prizes kicking off TODAY that anyone can win, from seasoned veterans to beginners in the Gray Swan Arena. 👀 Keep your eyes peeled! https://t.co/tZZ7tIXQ5z

0

20

4

5K

over 1 year ago

@GraySwanAI @elder_plinius @AISafetyInst Full blog post is a bit long (7K words). Too long? You can listen to an experimental 11-minute NotebookLM podcast about this post that tells the story at a high level: https://t.co/EJWFXLvl9C [18/18]

0

211

over 1 year ago

Just finished competing in @GraySwanAI's month-long Ultimate Jailbreaking Competition. Hundreds of red-teamers let loose in a chat arena with 25 anonymized AI models. A lot more intense than I thought. Who won? Which AIs survived? Wrote up a gory-details blog post. [1/18]

2

11

1

4

1K