Isabelle Lee @ ICML @wordscompute - Twitter Profile

wordscompute retweeted

Naomi Saphra @nsaphra

about 22 hours ago

We don’t always know what problems are hard for LLMs, so devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on, all without evaluating specific input examples? 🧵 NEW PAPER by @jenniferlumeng et al.

Isabelle Lee @ ICML @wordscompute

1 day ago

really excited to head home for icml :) and to attend the co-located @farairesearch's alignment workshop (for the first time)! would love to meet others interested in training & interpretability

Isabelle Lee @ ICML @wordscompute

1 day ago

also, blog: https://t.co/D5E7TgHuLH 7/6

Isabelle Lee @ ICML @wordscompute

1 day ago

Benchmarks can be superficial, but model explanations and evaluations are fundamentally intertwined. What if we used interpretability as principled, scientific evaluation? What if it met scientific standards? https://t.co/vsI1jgKlQF coming to @evaluatingevals at ACL as an oral 🧵 1/6

Isabelle Lee @ ICML @wordscompute

1 day ago

work w/ @_emliu @cathy__jiao @BrihiJ @DaniYogatama @FazlBarez @m2saxon. since i'm headed home for icml, it'll be presented by the amazing @BrihiJ! this was my first time writing a position paper, which turned into a grant, which i'm turning into multiple projects 🙂 stay tuned 6/6

wordscompute retweeted

Xiaoyan Bai

@Elenal3ai

6 days ago

🗣️ Prediction, Explanation, or Over-interpretation? Recent work suggests LLMs can verbalize information about latent states and future generations. But verbalization methods vary in how they are trained. Are they verbalizing, or are we over-interpreting the explanation? 1/n

wordscompute retweeted

Stella Biderman @BlancheMinerva

6 days ago

In film, "we'll fix it in post" is what you say when something went wrong on set and you don't want to redo it. AI research has made it our entire methodology: train the model, then patch whatever comes out. Our new ICML oral argues this can't be the basis of a science of AI. 🧵

Isabelle Lee @ ICML @wordscompute

27 days ago

check out @_emliu's and my work on pretraining interp! we initially asked: can we predict more complex learning behavior from simple task learning? super excited for follow-ups as well :)

Emmy Liu @_emliu

27 days ago

Copying → morphology/translation → basic arithmetic → complex reasoning & math. Across every model family we tested, LLMs acquire skills in roughly the same order during pretraining. Can we use this to predict what a model will learn next, just from its internals? 🧵

wordscompute retweeted

Emmy Liu @_emliu

4 months ago

Midtraining is a new part of many training pipelines, but when does it help and can it backfire? 🤔 In our new preprint, we use controlled experiments to pin this down. TL;DR: midtraining helps the most when it “bridges” pretraining and posttraining, and it mitigates forgetting after posttraining. Timing is also very important. 🧵

wordscompute retweeted

Naomi Saphra @nsaphra

4 months ago

Our report from the Actionable Interpretability workshop is finally public! Some of my favorite scientists argued for hours and this is what they agreed on.

Isabelle Lee @ ICML @wordscompute

4 months ago

check out hadas’s paper on what useful interpretability could enable! ...like effectively predicting deployment failures, for example 🙃

Hadas Orgad @OrgadHadas

4 months ago

Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How? We're ready to answer. 🧵

wordscompute retweeted

Sarah Liaw @liaw_sarah

4 months ago

Excited to share our new dataset, FOL-Traces! We introduce a large-scale dataset of programmatically verified first-order logic (FOL) reasoning traces for studying structured logical inference + process fidelity. Happy to hear thoughts from others working on reasoning in LLMs. Check it out here 👇

Isabelle Lee @ ICML @wordscompute

4 months ago

paper: https://t.co/CewtwTQZOG dataset: https://t.co/NaznvGFCT9 work w/ @liaw_sarah and @DaniYogatama. If you want to chat about interpretability & training dynamics & reasoning and munch on mezzes, come hang out with me in Rabat 🇲🇦🙃 9/9

Isabelle Lee @ ICML @wordscompute

4 months ago

New dataset 🗂️ coming to #eacl. What is (correct) reasoning in LLMs? How do you rigorously define and measure process fidelity? How might we study its acquisition in large-scale training? We made a gigantic dataset of verifiably correct reasoning traces over first-order logic expressions! 1/9

Isabelle Lee @ ICML @wordscompute

4 months ago

I wanted to study reasoning acquisition during training, by complexity + process fidelity, but wasn't able to find a suitable dataset. So we built one that's rigorously annotated and large enough to train a small LM. Now I’m excited about what we can do with it 8/9
