Haven Kim @havenpersona - Twitter Profile

19 days ago

“Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units”

One of the papers I personally found very promising earlier this year just got accepted as an ICML Oral. Huge congrats to the authors 🔥

The core objective of this work is to address a fundamental question in mechanistic interpretability: which specific training data points were responsible for teaching a particular circuit, neuron, or attention head? Put another way: Why do induction heads emerge? Which training samples drove their emergence? Can the training data behind a given interpretable unit actually be traced? The authors call this problem Mechanistic Data Attribution (MDA).

The core contribution of the paper is working backward: starting from a model mechanism and deducing which specific training data points led to the formation of that mechanism. (It is like MI + influence functions.)

One finding I find especially interesting: many seemingly “low-quality” repetitive data samples may actually serve as critical catalysts for the emergence of induction mechanisms in transformers. There are many other interesting insights in the paper.

Paper: https://t.co/2KhyPKqZnF

Congrats again to the authors, a very well deserved Oral recognition @PanLiangming

#ICML #MachineLearning #LLM

3

153

12

146

11K

havenpersona retweeted

Zachary Novack @zacknovack

21 days ago

Can we transform offline audio diffusion into real-time streaming interactive instruments? Yes! Presenting Live Music Diffusion Models: a new paradigm for taking your favorite open models into live performance, right on your own laptop! 🎵🎵 🧵

9

162

29

118

13K

Haven Kim @havenpersona

about 1 month ago

Dataset: https://t.co/0dfNrENeCx Paper: https://t.co/8Ulcva6YDC 😆

0

2

0

171

Haven Kim @havenpersona

about 1 month ago

I’m promoting our new conversational music recommendation dataset, Reddit2Deezer, the largest real-world, grounded CMR dataset (200k–600k conversations). The tracks and albums are mapped to the Deezer API, which enables straightforward access to audio previews and rich metadata.

1

14

3

1

521

havenpersona retweeted

Sumit @_reachsumit

about 1 month ago

Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation @yupenghou97 et al at Snap show that autoregressive SID generation forces structurally close items to receive correlated probabilities 📝https://t.co/LmURbSuSoh 👨🏽‍💻https://t.co/XUbKNjcZDy

0

21

5

10

6K

Haven Kim @havenpersona

about 2 months ago

Catch Yewon if you’re at CHI!

Yewon Kim @haiyewon

about 2 months ago

A Design Space for Live Music Agents 🎷🎹🥁 #CHI2026 What does it take for AI to truly jam with you? We surveyed 184 live music agents across AI, HCI, and Computer Music fields to map the design space, and where it's headed. 🗓️ Talk: Fri Apr 17, 12:15PM · P1 Room 132 📄 Paper: https://t.co/vNfdqrdV23 🔗 Interactive demo: https://t.co/XD1lP5pGCB

1

48

15

7

3K

0

10

0

514

havenpersona retweeted

Yewon Kim @haiyewon

about 2 months ago

A Design Space for Live Music Agents 🎷🎹🥁 #CHI2026 What does it take for AI to truly jam with you? We surveyed 184 live music agents across AI, HCI, and Computer Music fields to map the design space, and where it's headed. 🗓️ Talk: Fri Apr 17, 12:15PM · P1 Room 132 📄 Paper: https://t.co/vNfdqrdV23 🔗 Interactive demo: https://t.co/XD1lP5pGCB

1

48

15

7

3K

havenpersona retweeted

Daniel Zhao @astradzhao

8 months ago

We found a way to steer AI music gen toward specific notes, chords, and tempos, without retraining the model or significantly sacrificing audio quality! Introducing MusicRFM 🎵 Paper: https://t.co/oZciYbgB9P Audio: https://t.co/FQ1W8k1LZh Code: https://t.co/drnE1XGcFC (1/5)

3

27

6

9

3K

havenpersona retweeted

Fang-Duo Tsai 蔡芳鐸 @ICML @fundwotsai2001

5 months ago

We release two methods for controlling text-to-music models, MuseControlLite (ICML 2025) and SongEcho (submitted to ICLR 2026, I am not the author), on GitHub and HuggingFace! github💻: https://t.co/zFh70Tjxec Huggingface🤗: https://t.co/f89IA9Pt4D

1

17

5

9

2K

Haven Kim @havenpersona

7 months ago

@SeungHeon_Doh @keunwoochoi Seungheon's right. Specifically, https://t.co/iMhzb3xV2s

1

0

82

havenpersona retweeted

Yupeng Hou

@yupenghou97

7 months ago

Join us this afternoon at 13:45 in Room 203 for our @cikm2025 tutorial on generative recommendation and semantic IDs! https://t.co/JNQxVOBY7c #CIKM2025

0

14

4

0

988

Haven Kim @havenpersona

8 months ago

@rajammanabrolu @ZackAnkner @mansiege Congrats! The idea behind this paper is both important and exciting!

0

1

0

116

havenpersona retweeted

Hao-Wen (Herman) Dong 董皓文 @hermanhwdong

8 months ago

🔥I'm sharing all the materials and recordings for my course on *Music & AI* at University of Michigan! The course introduces AI’s applications in music from analysis, creation, retrieval to processing. Course website: https://t.co/xlXxtipG7m Recordings: https://t.co/MR30b3HBsz

3

44

12

13

6K

havenpersona retweeted

ISMIR Conference @ISMIRConf

9 months ago

Are AI models for music truly listening, or just good at guessing? This critical question is at the heart of the latest Best Paper Award winner at #ISMIR2025!

Huge congratulations to Yongyi Zang, Sean O'brien, Taylor Berg Kirkpatrick, Julian McAuley, and Zachary Novack for their paper, "Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks."

They expose how current benchmarks can be solved without genuine audio perception, even by text-only models! Their new framework, RUListening, creates evaluations that force models to prove they're actually hearing the music. A vital step forward for robust AI evaluation.

0

23

9

6

10K

havenpersona retweeted

yunkee chae @ygch43

9 months ago

Thrilled to share that our paper "MGE‑LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction" is accepted to #NeurIPS2025 🚀🎶 Check out our preprint and sample page! arXiv: https://t.co/w9pFwW87Zg Project page: https://t.co/C9HORMl8S0

0

30

12

2K

Haven Kim @havenpersona

10 months ago

One of my favorite papers this year!! 🤩 🍝

Zachary Novack @zacknovack

10 months ago

Suno + Veo 3 generate highly similar versions of popular songs purely based on *phonetically* similar gibberish lyrics?!?! Presenting Bob’s Confetti: Phonetic Memorization Attacks in Music and Video Generation 🔊: https://t.co/Rztf9TNhI7 📖: https://t.co/H7BgnUkCvD 🧵1/n

2

90

25

42

12K

0

11

0

1

559

havenpersona retweeted

Zachary Novack @zacknovack

10 months ago

Suno + Veo 3 generate highly similar versions of popular songs purely based on *phonetically* similar gibberish lyrics?!?! Presenting Bob’s Confetti: Phonetic Memorization Attacks in Music and Video Generation 🔊: https://t.co/Rztf9TNhI7 📖: https://t.co/H7BgnUkCvD 🧵1/n

2

90

25

42

12K

Haven Kim @havenpersona

11 months ago

@keunwoochoi Jealous of KAIST people

0

2

0

238

havenpersona retweeted

Keunwoo Choi @keunwoochoi

11 months ago

i've joined KAIST Culture Technology department as an Adjunct Professor (겸직교수).. from New York, remotely. more collaborations and mentoring to come! 🎊

9

89

1

6

5K

havenpersona retweeted

Yupeng Hou

@yupenghou97

12 months ago

Did you know tokenization for generative recommendation today looks a lot like LLM tokenization did *10 years* ago? Meet ActionPiece, our #ICML2025 Spotlight paper, the first context-aware action tokenizer. 1/5 🧵

1

120

31

72

14K
