Millicent Li @millicent_li - Twitter Profile

Pinned Tweet

9 months ago

Wouldn’t it be great to have questions about LM internals answered in plain English? That’s the promise of verbalization interpretability. Unfortunately, our new paper shows that evaluating these methods is nuanced—and verbalizers might not tell us what we hope they do. 🧵👇1/9

4

74

19

53

10K

millicent_li retweeted

Isabelle Lee @ ICML @wordscompute

1 day ago

Benchmarks can be superficial, but model explanations and evaluations are fundamentally intertwined. What if we used interpretability as principled, scientific evaluation? If it met scientific standards? https://t.co/vsI1jgKlQF coming to @evaluatingevals at ACL as oral 🧵 1/6

1

45

8

17

2K

Millicent Li @millicent_li

27 days ago

How do we figure out whether a LM has learned the right skills, and in what order? We look at model internals :) Great work led by @_emliu!!

Emmy Liu @_emliu

27 days ago

Copying → morphology/translation → basic arithmetic → complex reasoning & math. Across every model family we tested, LLMs acquire skills in roughly the same order during pretraining. Can we use this to predict what a model will learn next, just from its internals? 🧵

16

484

64

395

54K

0

5

1

499

millicent_li retweeted

Zihao (Gavin) Yang @ZihaoGavinYang

about 1 month ago

1/ (New paper!) If swapping the gender in an input prompt makes the AI model give a different answer it means that it has to have a gender bias, right? Wrong. 🧵on counterfactual prompting for LLM evals: Paper: https://t.co/i3Zc0UlyFF

3

293

24

305

307K

Millicent Li @millicent_li

about 1 month ago

@StephenLCasper They definitely missed proper evals to be able to ensure something like this would work... my collaborators and I have been working in this space to understand faithfulness issues (see: https://t.co/an0hvJtxYm which was accepted to ICML) but they seem to have glossed over it?

0

8

0

3

278

Millicent Li @millicent_li

about 1 month ago

@zhuokaiz There are a few other evals in the Appendix of the paper that basically show that this lack of verbalizing "privileged" knowledge persists across task types. But without enforcing this privileged constraint, you don't know which model's knowledge you're using

0

7

1

0

238

Millicent Li @millicent_li

about 1 month ago

@zhuokaiz Yeah exactly, a lot of these works in this domain (including many of the subsequent works in verbalization incl. Activation Oracles + this white paper) seem to ignore the fact that the verbalizer is an LLM itself, and it's obvious the eval they do should reflect this notion.

2

11

1

321

millicent_li retweeted

Hye Sun Yun @hyesunyun

2 months ago

Patients ask LLMs medical questions, but how they phrase it matters more than it should. Our new preprint explores how different phrasings of patient health questions can lead to inconsistent conclusions, even with the same evidence. [1/6] Full Paper: https://t.co/CPhz94eAfc

1

22

6

2

3K

millicent_li retweeted

Emmy Liu @_emliu

4 months ago

Midtraining is a new part of many training pipelines, but when does it help and can it backfire? 🤔 In our new preprint, we use controlled experiments to pin this down. TL;DR; midtraining helps the most when it “bridges” pretraining and posttraining, and mitigates forgetting after posttraining. Timing is also very important. 🧵

5

633

88

551

99K

Millicent Li @millicent_li

5 months ago

@scychan_brains Would like to chat :)! (Your DMs aren't open to non-followers)

0

1

0

49

millicent_li retweeted

Eric Todd @ericwtodd

5 months ago

Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️

8

321

49

231

56K

millicent_li retweeted

Koyena Pal @kpal_koyena

5 months ago

Can models understand each other's reasoning? 🤔 When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way? Our new preprint with @davidbau and @csinva explores CoT generalizability 🧵👇 (1/7)

7

207

24

142

25K

millicent_li retweeted

Chantal @ChantalShaib

7 months ago

I’ll be at #NeurIPS2025 this week to present some work on spurious correlations! Catch us at the poster session on 12/3 😊

1

28

4

6

6K

millicent_li retweeted

Sanjana Ramprasad @ NeurIPS2025 @sanjana_rampi

7 months ago

Come check out our #NeurIPS2025 paper “Do Automatic Factuality Metrics Measure Factuality?” on Friday! We systematically investigate this question and find some surprising results 👇🧵 💻 Paper/Code/Blog: https://t.co/zHnEN1gXRq Work w/ @byron_c_wallace

1

8

3

1

836

Millicent Li @millicent_li

7 months ago

@996roma Would love to chat with you :)

0

3

0

133

Millicent Li @millicent_li

7 months ago

@AndrewLampinen Would love to chat!

0

1

0

113

Millicent Li @millicent_li

8 months ago

@CFGeek We actually showed some of the same pitfalls that you mentioned in our current paper analyzing existing activation verbalization methods: https://t.co/an0hvJtxYm. There are possibly ways to avoid input structure, but we find it too easy to get the input information.

0

3

0

93

millicent_li retweeted

Aaron Mueller @amuuueller

9 months ago

What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms). We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!

2

100

26

55

6K

Millicent Li @millicent_li

9 months ago

@saprmarks @nsaphra @byron_c_wallace It's definitely tricky to design them well though, we can all admit here!

0

71

Millicent Li @millicent_li

9 months ago

@saprmarks @nsaphra @byron_c_wallace I agree there might be some statistical priors that influence the likelihood of seeing the fake names, but I think most of all we would like to show how the nature of evaluations drastically affects whether you're able to extract information that you intend to for verbalization.

1

0

73
