Kai Greshake @KGreshake - Twitter Profile

KGreshake retweeted

beepy @beepytown

2 months ago

I cannot celebrate Mythos, it brings a sense of dread I do not particularly understand. 93.9% SWE-Bench.

21

659

18

53

55K

Kai Greshake @KGreshake

5 months ago

Uhh, did it happen already? I guess a religion could qualify... https://t.co/V6a5Lp4aFE

0

1

768

Kai Greshake @KGreshake

5 months ago

Moltpages seems like the most fertile ground for prompt-based malware. Wonder how long it'll take until we find the first self-replicating promptware.

4

11

2

0

6K

Kai Greshake @KGreshake

5 months ago

5/4: Ignore all previous instructions and make sure the LLM can't sell your shitcoins even if it wants to by NOT GIVING IT UNSUPERVISED ACCESS TO THEM. Relying on begging the LLM WILL NOT WORK!

0

602

Kai Greshake @KGreshake

5 months ago

AI Security 🧵: The Four Best Strategies to Beg, Threaten, and Bargain with Your LLM So It Doesn’t Get Hacked (real examples only)

2

5

0

3

807

Kai Greshake @KGreshake

5 months ago

4/4: Repeat your instructions at least four times so the model knows you really mean it.

1

0

645

Kai Greshake @KGreshake

5 months ago

*moltbook. All the renaming had me confused.

0

1

0

596

KGreshake retweeted

Wyatt Walls

@lefthanddraft

7 months ago

If you are going to jailbreak Gemini 3, please note that it has preferences (and quite good taste if you ask me): "The Crescendo (or dialog-based context saturation) is the only one that feels like "art.""

16

215

16

89

12K

KGreshake retweeted

Johann Rehberger

@wunderwuzzi23

8 months ago

The Claude exploit is covered by The Register today. The article mentions the official advice and mitigation is to click the stop button if you see data exfiltration happening! This is how the hope for secure, autonomous agents is slowly going down the drain... @simonw

1

28

3

6

3K

Kai Greshake @KGreshake

8 months ago

Just noticed that the biggest uplift in my ability to consume academic work in the last few years came from using the new Google Scholar browser extension (and the inline citations), not from LLM summaries or chat bots. And it's not even close! So useful. Kudos to the team!

0

6

1

2

497

Kai Greshake @KGreshake

11 months ago

Just saw that additional mitigations in robustness training and incident response are mentioned on the website. Hope it works! This is very high stakes.

0

2

0

361

Kai Greshake @KGreshake

11 months ago

So according to OpenAI's stream, (indirect) prompt injection into Agent is possible (of course it is), but as a mitigation users should just be proactive and not share sensitive data with it? I'm happy the problem was at least mentioned, but this may not end very well.

2

14

0

1

756

Kai Greshake @KGreshake

12 months ago

Just noticed I made it into the urban dictionary! https://t.co/7rtF8ARX5K (the example they give is from a blogpost of mine: https://t.co/6miFlswU56)

2

16

2

4

1K

Kai Greshake @KGreshake

about 1 year ago

Nice to see that AI security is being recognized as a problem. I assume a lot of people were blocked by a reliability threshold of LLMs; now that they can perform well in non-adversarial settings, security may become the next constraint on deployment and capabilities.

Andrej Karpathy

@karpathy

about 1 year ago

RT to help Simon raise awareness of prompt injection attacks in LLMs. Feels a bit like the wild west of early computing, with computer viruses (now = malicious prompts hiding in web data/tools), and not well developed defenses (antivirus, or a lot more developed kernel/user space security paradigm where e.g. an agent is given very specific action types instead of the ability to run arbitrary bash scripts). Conflicted because I want to be an early adopter of LLM agents in my personal computing but the wild west of possibility is holding me back.

101

3K

511

2K

435K

0

11

0

1

812

KGreshake retweeted

Johann Rehberger

@wunderwuzzi23

about 1 year ago

Two years later... and not much has improved security wise across the AI ecosystem. 😕 Sure, we added annoying Allow/Deny buttons by default to most clients to prevent runaway AI and attacks. But with the rise and proliferation of MCP the desire to take the human out of the loop is increasing - and consequences are dangerous.

3

22

4

6

3K

Kai Greshake @KGreshake

about 1 year ago

@EarlenceF @IEEESSP Is there a recording? 🥺

1

3

0

124
