Xinyue Liu @irisiris_l - Twitter Profile

Pinned Tweet

2 months ago

🙌 Excited to share our new paper and my first project in my PhD journey! We show finetuning on a writing task unlocks verbatim recall of copyrighted books from authors not in the finetuning data. It’s been an incredible experience working with such an amazing group of people ✨

Tuhin Chakrabarty

@TuhinChakr

2 months ago

🚨New paper on AI & Copyright 👨‍⚖️Courts have credited LLM companies' claims that safety alignment prevents reproduction of copyrighted expression. But what if fine-tuning on a simple writing task ruins it all? Worse : Fine-tuning on a single author's books (e.g., Murakami) unlocks verbatim recall of copyrighted books from 30+ unrelated authors, sometimes as high as 90%. Joint work with @niloofar_mire (@LTIatCMU), Jane Ginsburg ( @ColumbiaLaw) and my amazing PhD student @irisiris_l (@sbucompsc ) (1/n)🧵

16

391

153

190

115K

3

47

11

7

9K

irisiris_l retweeted

Yapei Chang

@YapeiChang

1 day ago

post-trained models are more helpful, but collapse toward a narrow range of possible answers 🍎 with ReDiPO, we show how to recover the lost diversity with a simple DPO data pipeline, while largely preserving instruction-following and safety. great work led by @vsamuel2003!

0

34

5

13

4K

irisiris_l retweeted

Harveen Singh Chadha

@HarveenChadha

1 day ago

This is very interesting decision, microsoft decided not to use any LLM generated data or any open source training dataset for pretraining https://t.co/wc0ehMofSp

8

154

9

50

12K

irisiris_l retweeted

Sid

@sidbid

7 days ago

Super excited to finally share Dynamic Workflows in Claude Code!! We built this a couple months ago, and it has slowly become a daily driver for a bunch of people at Anthropic. A few tips for getting the most out of it 🧵 https://t.co/WtwkSd3JPp

97

2K

170

3K

480K

irisiris_l retweeted

Weiwei Sun @sunweiwei12

8 days ago

Excited to share our new work on Reinforcing Human Behavior Simulation via Verbal Feedback. Can human simulators learn from feedback, not just rewards? Most RL for LLMs turns feedback into a single score. But human behavior is rarely just right or wrong. It is social, contextual, subjective, and multi-dimensional. A score can tell the model what is better. Verbal feedback can tell it why. Meet DITTO + SOUL. Paper: https://t.co/G0cEHr53h0 Code: https://t.co/6osJizwUDi Model: https://t.co/yIAvpbKPSd

7

227

42

162

33K

irisiris_l retweeted

Tuhin Chakrabarty

@TuhinChakr

13 days ago

Ran some 🧪 with @irisiris_l to 🔬 why the Granta story was certainly 🤖 slop A lot of bad writing happens coz AI hasn’t learned aesthetics. It has memorized the whole internet and called it a day. So sure, maybe you don't trust AI detectors. But you can trust your own 👁️. https://t.co/25OfkN4nV8

2

40

9

13

6K

irisiris_l retweeted

Goodfire

@GoodfireAI

14 days ago

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

24

1K

150

760

169K

irisiris_l retweeted

Michael Li @bearseascape

15 days ago

Do the circuits we extract to explain a model's behavior actually tell us how it solves a specific task? In new work w/ @nsubramani23, we find that circuits fail a basic check: ablating one task's circuit hurts another task about as much as ablating that task's own circuit. 🧵 https://t.co/rk25go56CV

2

26

6

15

3K

irisiris_l retweeted

Tatsunori Hashimoto @tatsu_hashimoto

14 days ago

Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally 'low quality' data, and can sometimes even benefit. https://t.co/VhshLOWBIx

32

1K

149

905

214K

irisiris_l retweeted

Emmy Liu @_emliu

15 days ago

Copying → morphology/translation → basic arithmetic → complex reasoning & math. Across every model family we tested, LLMs acquire skills in roughly the same order during pretraining. Can we use this to predict what a model will learn next, just from its internals? 🧵 https://t.co/exJhF9NN8d

16

476

62

390

53K

irisiris_l retweeted

Thinking Machines

@thinkymachines

16 days ago

We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! https://t.co/907HfBy7g3

51

2K

197

1K

589K

Xinyue Liu @irisiris_l

16 days ago

@jennajrussell @pangram congrats!! lets hang out some time im also in the city this summer :))

1

2

0

153

irisiris_l retweeted

Boyd Kane is in London @beyarkay

19 days ago

I wrote some things about my MATS experience, give it a read! https://t.co/O8PqHJncb1

10

369

14

361

69K

irisiris_l retweeted

Vox

@voxdotcom

18 days ago

An author set up an experiment to find out. https://t.co/HunRvd4ypf

1

2

0

6K

irisiris_l retweeted

Mingyu_Jin19

@fnruji316625

21 days ago

Does mechanistic interpretability really find the circuit? Our new paper, "All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs," (Accepted by ICML 2026) suggests the answer may be: not always. A common implicit assumption in mechanistic interpretability is that a model's behavior is explained by the circuit — a sparse, canonical, almost-unique mechanism. Instead, for the same LLM task, we find multiple circuits/sheaves that are: ✅ faithful ✅ sparse ✅ structurally different ✅ low-overlap This means a discovered circuit may not be the unique mechanism behind a behavior, but one realization among many possible mechanisms. We call for rethinking how circuit/sheaf discovery results should be interpreted and evaluated. Huge thanks to my amazing collaborators: @frankniujc, @YutongYin774638, and @zhaoran_wang Paper: https://t.co/J5zO36Mr7m #MechanisticInterpretability #LLM #AI #MachineLearning

13

462

64

388

48K

irisiris_l retweeted

Mingqian Zheng @elisazmq_zheng

22 days ago

LLMs refuse ambiguous queries that look harmful but aren't. Can they recover once users clarify, while staying safe? Our new interactive multi-turn benchmark measures both. 🚨 Turns out: not both at once. https://t.co/XyE48IXQSf

7

95

24

43

9K

Xinyue Liu @irisiris_l

26 days ago

thank you!! really excited this work is reaching the community! more soon 👀

Tuhin Chakrabarty

@TuhinChakr

26 days ago

https://t.co/DtgLpWpmJc Very happy that Alignment Whack-a-Mole is one of the Top Papers on AI in Law Q1 2026 in @SSRN. Congrats to my student @irisiris_l who is busy preparing for her final exams :)

0

9

3

0

2K

1

7

1

2

912

irisiris_l retweeted

Brian Stelter

@brianstelter

30 days ago

"Five major publishers — Hachette, Macmillan, McGraw Hill, Elsevier and Cengage — and the best-selling novelist Scott Turow have filed a class-action copyright infringement lawsuit against Meta and its founder and chief executive, Mark Zuckerberg." https://t.co/kzcGcFziKB

42

4K

2K

193

109K

irisiris_l retweeted

Augmented Mind Podcast

@augmind_fm

about 1 month ago

New episode of the AM podcast dropping soon! In EP4, we sat down with @kenziyuliu, CS PhD student at @StanfordAILab and creator of The Open Anonymity Project, to talk about the privacy layer of personal intelligence. Here's a preview 😃

1

39

17

8

9K

irisiris_l retweeted

Kasra Khadem

@Kaz_Khadem

about 1 month ago

Some people really saw this and said “let me go build agentic workflows in Palo Alto”

29

2K

69

114

88K

irisiris_l retweeted

James Zou @james_y_zou

about 1 month ago

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free. https://t.co/ZoHWcx7MXg

43

2K

244

2K

169K
