Jaechul Roh @JaechulRoh - Twitter Profile

26 days ago

My best interview in some time. Rohin Shah leads AGI alignment/safety at DeepMind. And he has a lot of spicy personal takes:

We probably won’t get catastrophic misalignment (00:49)
Safety 'commitments' have severe limitations (10:38)
The intelligence explosion probably isn't imminent (1:52:44)
Why he's not working to pause AI advances (51:44)
Pre-deployment evals aren't the right focus (for catastrophic risks) (37:41)
Signalling concern for safety sometimes diverts resources from actually making AI safe (01:09:51)
Reading AI thoughts is v useful for safety – and we'll probably be able to for years to come (54:17)
Governance is somewhat more likely to be the bottleneck than alignment (43:55)
Rohin's team doesn't have a veto, and that's OK (27:36)
Central banks are a promising model for regulating AI (33:34)

Also:
Google DeepMind's actual plan for building AGI safely (1:40:29)
How external researchers can positively influence big AI companies (2:21:55)
The roles GDM most needs to hire for (2:37:03)

On the 80,000 Hours Podcast. Links below - enjoy! (@rohinmshah)

Jaechul Roh

@JaechulRoh

about 1 month ago

Joint work with @Qualcomm. Huge thanks to @JeanMonteuuis, Jonathan Petit, and @houmansadr for an amazing collaboration! Paper: https://t.co/xttTDBSYmP

Jaechul Roh

@JaechulRoh

about 1 month ago

New preprint: Codec-Robust Attacks on Audio LLMs #CodecAttack

Lossy codecs (Opus, MP3, AAC) have been treated as a defense against adversarial audio. We show they're actually an attack surface.

Jaechul Roh

@JaechulRoh

about 1 month ago

Why does it survive? The latent perturbation concentrates 88% of energy below 4 kHz, exactly where codecs allocate the most bits. A Jacobian analysis confirms this is structural: the decoder has no basis functions above 4 kHz.
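A minimal sketch of how a figure like "88% of energy below 4 kHz" can be measured, assuming it is computed from the power spectrum of the perturbation's time-domain waveform (the thread does not say how the number was obtained). `low_band_energy_fraction` is a hypothetical helper, not code from the paper; it uses NumPy's real FFT and doubles the interior bins so the one-sided spectrum accounts for the full signal energy.

```python
import numpy as np


def low_band_energy_fraction(signal: np.ndarray, sample_rate: int,
                             cutoff_hz: float = 4000.0) -> float:
    """Hypothetical helper: share of a waveform's energy strictly below cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # One-sided spectrum: every bin except DC (and Nyquist, for even lengths)
    # stands for a positive/negative frequency pair, so it counts twice.
    weights = np.full(power.shape, 2.0)
    weights[0] = 1.0
    if len(signal) % 2 == 0:
        weights[-1] = 1.0
    power = power * weights

    total = power.sum()
    if total == 0:
        return 0.0  # silent input: no energy anywhere
    return float(power[freqs < cutoff_hz].sum() / total)
```

For example, an equal-amplitude mix of a 1 kHz tone and a 6 kHz tone, one second long at 16 kHz, gives a fraction of about 0.5, while a pure 1 kHz tone gives about 1.0.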

Jaechul Roh

@JaechulRoh

about 2 months ago

📰 https://t.co/snIqza2KhC

Jaechul Roh

@JaechulRoh

about 2 months ago

We still listen to old songs not because they are the best recordings, but because they remind us of something: a place, a person, a feeling. There is usually something imperfect about them, and I think that imperfection is part of why they stay with us.

My daily research is in AI security, but lately I have also been interested in a different kind of threat. Not a technical one, but a cultural one. I keep asking myself: what happens when more of the music, art, and stories around us are AI-generated? Not whether they will be good or bad, but whether they will carry the same weight over time. My recent blog post explores that question through the lens of why imperfection matters, how it connects to memory, and what we might quietly lose if it disappears.

It is a highly opinionated piece of writing, not a research paper. Just a casual read. But it has been on my mind for a while, and I wanted to share it.

JaechulRoh retweeted

Neel Nanda

@NeelNanda5

about 1 year ago

After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field, I found this all frustratingly opaque.

So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS.

Jaechul Roh

@JaechulRoh

2 months ago

@_Suresh2 We used the VoiceBench SD-QA dataset. The paper also shows results for models fine-tuned on other benign audio datasets.

Jaechul Roh

@JaechulRoh

2 months ago

1/ Fine-tuning an Audio LLM on a benign audio dataset pushed its jailbreak rate from 4.62% → 87.12%. No adversary. No harmful data.

New paper 🧵

Jaechul Roh

@JaechulRoh

2 months ago

Thanks for sharing! We explored a similar direction in our prior work "Bob's Confetti", where we use phonetically similar lyrics to make models regurgitate copyrighted music at inference time. As you mentioned, training-time attacks would be a cool next step! Paper: https://t.co/WWC6pHnIkA Demo: https://t.co/Fexc76t34j

Jaechul Roh

@JaechulRoh

2 months ago

@anmgoel Thanks for sharing this interesting work; I'm also curious about this problem. Evaluating privacy in the audio modality can differ from evaluating it in text, and I wonder how BFT might affect this task as well.

Jaechul Roh

@JaechulRoh

2 months ago

Work done with @houmansadr 📄 Paper: https://t.co/aQpBn9Vor5

Jaechul Roh

@JaechulRoh

2 months ago

7/ Good news: two simple defenses bring JSR back to near zero.

🛡️ Distant filtering (training time): pick benign samples farthest from harmful embeddings
🛡️ System prompt (inference time): just tell the model to refuse

Safety is fragile, but recoverable.
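The distant-filtering defense is described only as "pick benign samples farthest from harmful embeddings". One way to read that, sketched here under stated assumptions: embed every sample with some fixed encoder, score each benign sample by its cosine distance to the nearest harmful embedding, and keep the k highest-scoring samples for fine-tuning. The encoder, the distance metric, the nearest-neighbor criterion, and the `distant_filter` helper are all hypothetical, not taken from the paper.

```python
import numpy as np


def distant_filter(benign: np.ndarray, harmful: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of the k benign embeddings farthest from the harmful set.

    benign:  (n_benign, d) embeddings, assumed nonzero
    harmful: (n_harmful, d) embeddings, assumed nonzero
    """
    if not 0 < k <= len(benign):
        raise ValueError("k must be between 1 and len(benign)")

    # L2-normalise rows so dot products are cosine similarities.
    b = benign / np.linalg.norm(benign, axis=1, keepdims=True)
    h = harmful / np.linalg.norm(harmful, axis=1, keepdims=True)

    sims = b @ h.T                          # (n_benign, n_harmful)
    nearest_dist = 1.0 - sims.max(axis=1)   # cosine distance to the closest harmful sample

    # Largest distance first; a stable sort keeps ties in their original order.
    order = np.argsort(-nearest_dist, kind="stable")
    return order[:k]
```

Scoring by the nearest harmful neighbor (rather than, say, the distance to the harmful centroid) penalizes any benign sample that sits close to even one harmful example; which variant the authors used is not stated in the thread.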
