Chawin Sitawarin @csitawarin - Twitter Profile

Pinned Tweet

about 2 months ago

🇧🇷 Going to ICLR! Will present a poster on a data poisoning attack at the Agents in the Wild workshop. Great work by @davidhuang33176, Jaewon Chang, @Avidan_Shah56, @prateekmittal_ 📄 https://t.co/ZEi11IvDxt (full paper, website, code soon). But please come visit and chat!



Chawin Sitawarin @csitawarin

3 months ago

@kotekjedi_ml @javirandor @alxndrdavies @usmananwar391 @_zifan_wang @edoardo_debe @GraySwanAI Thanks a lot for sharing. Cool stuff!! Was gonna try something similar, but I guess now we don't have to :)


csitawarin retweeted

Andrew Gordon Wilson @andrewgwils

3 months ago

To be honest, I was initially confused and reserved about AI alignment. It's not that I was against the research direction, quite the opposite. For 15 years, I'd been developing the foundations of what had been rebranded as alignment. But, I've changed my mind. 1/6


csitawarin retweeted

Nils Walter @nilspwalter

8 months ago

It is notoriously hard to defend LLMs against prompt injections. Most defenses show good performance on static benchmarks but fall apart against stronger adaptive attackers. In our latest work, we present an almost embarrassingly simple defense that delivers ~3× better robustness against the strongest adaptive prompt injection attacks to date - while keeping utility degradation acceptable. Joint work with @csitawarin, Jamie Hayes, @davidstutz92, @iliaishacked.


csitawarin retweeted

Federico Barbero @fedzbar

8 months ago

Feel free to check out the paper here :) https://t.co/N91jQN5Zoz Special thanks to the amazing co-authors! w/ @gu_xiangming @Chris_Choquette @csitawarin Matthew Jagielski @itay__yona @PetarV_93 @iliaishacked Jamie Hayes


Chawin Sitawarin @csitawarin

8 months ago

@_alyxya @SallyHZhu Sorry typo: "the explanation here does not seem convincing"


Chawin Sitawarin @csitawarin

8 months ago

@_alyxya @SallyHZhu I didn't read the paper but sort of have the same question as @_alyxya. This seems like an obvious test, so I assume I'm missing something that's covered in the paper? But the explanation here does seem convincing to me... There might be FPs sure, but it seems like


Chawin Sitawarin @csitawarin

8 months ago

@_alyxya @SallyHZhu a lot stronger statistical test than checking against the training data (regardless of whether Bob fine-tunes the ?


csitawarin retweeted

Florian Tramèr @florian_tramer

8 months ago

5 years ago, I wrote a paper with @wielandbr @aleks_madry and Nicholas Carlini that showed that most published defenses in adversarial ML (for adversarial examples at the time) failed against properly designed attacks. Has anything changed? Nope...



csitawarin retweeted

Konrad Rieck 🌈 @mlsec

11 months ago

🚨 Got a great idea for an AI + Security competition? @satml_conf is now accepting proposals for its Competition Track! Showcase your challenge and engage the community. 👉 https://t.co/3g3nvv3yqa 🗓️ Deadline: Aug 6



Chawin Sitawarin @csitawarin

11 months ago

Very cool thought-provoking piece! In practice, computation units are much more nuanced than what theories capture. But just trying to identify classes of problems that benefit from sequential computation (or are unsolvable without it) seems very useful!

Konpat Ta Preechakul @konpatp

11 months ago

Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the real bottleneck isn’t size—but depth? What if the model just didn’t have enough serial steps to get it right? Some problems need depth, not width. This is the Serial Scaling Hypothesis. This is not the same as recent studies in scaling test-time compute, which focus on train vs. test and are agnostic to parallel vs. serial. For example: test-time majority voting increases compute by running models in parallel — but doesn’t help when the task itself is serial. We argue: what really matters is how the compute is structured. And for many real-world problems, it must be serial. Read more at: https://t.co/msytYszWK0 or 🧵. (In collaboration with: @layer07_yuxi , Kananart Kuwaranancharoen and @YutongBAI1002 )

Chawin Sitawarin @csitawarin

11 months ago

@edoardo_debe Awesome! Congrats 🎉 You gonna be in Menlo Park?


Chawin Sitawarin @csitawarin

11 months ago

I will be at ICML this year after a full long year of not attending any conference :) Happy to chat, and please don’t hesitate to reach out here, email, on Whova, or in person 🥳


csitawarin retweeted

Andreas Terzis @aterzis

about 1 year ago

We are starting our journey on making Gemini robust to prompt injections, and in this paper we present the steps we have taken so far. A collective effort by the GDM Security & Privacy Research team spanning more than a year.


csitawarin retweeted

Jack Morris @jxmnop

about 1 year ago

new paper from our work at Meta!

**GPT-style language models memorize 3.6 bits per param**

we compute capacity by measuring total bits memorized, using some theory from Shannon (1953)

shockingly, the memorization-datasize curves look like this:
  ___________
 /
/

(🧵)



csitawarin retweeted

Tong Wu @TongWu_Pton

about 1 year ago

🛠️ Still doing prompt engineering for R1 reasoning models? 🧩 Why not do some "engineering" in reasoning as well? Introducing our new paper, Effectively Controlling Reasoning Models through Thinking Intervention. 🧵[1/n]



csitawarin retweeted

Edoardo Debenedetti @edoardo_debe

about 1 year ago

1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!



csitawarin retweeted

Max Nadeau @MaxNadeau_

over 1 year ago

🧵 Announcing @open_phil's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.



csitawarin retweeted

Sicheng Zhu @sichengzhuml

over 1 year ago

Using GCG to jailbreak Llama 3 yields only a 14% attack success rate. Is GCG hitting a wall, or is Llama 3 just safer? We found that simply replacing the generic "Sure, here is***" target prefix with our tailored prefix boosts success rates to 80%. (1/8)

