Yuling Gu @gu_yuling - Twitter Profile

Pinned Tweet

3 months ago

🎉 SimpleToM has been accepted to #ICLR2026! LLMs can tell you what someone knows (explicit ToM). But when asked to apply it to predict behavior or judge actions (applied ToM), even frontier LLMs still fail. 🤯 The gap between knowing and applying is real… and huge. 👀 1/

2

121

17

54

15K

Yuling Gu @gu_yuling

about 1 month ago

I can’t be there in person due to visa issues, but please meet my amazing co-authors! My DM (and email) are open if you’d like to connect!

0

3

1

0

369

Yuling Gu @gu_yuling

about 1 month ago

Check out SimpleToM at #ICLR2026 where we reveal a critical fragility in LLMs’ social reasoning — the explicit vs. applied ToM gap. 🗓️Fri, Apr 24, 2026 3:15 PM – 5:45 PM BRT 📍Pavilion 3 P3-#1407

1

17

1

3

2K

Yuling Gu @gu_yuling

3 months ago

Work done during my time at @allen_ai with wonderful collaborators Oyvind Tafjord, @hyunw_kim, @jaredlcm, @Ronan_LeBras, Peter Clark, @YejinChoinka. 📜 Paper: https://t.co/Lv13te4Idy 💻 Code: https://t.co/FpzRETe1kD 6/

0

9

2

0

397

Yuling Gu @gu_yuling

3 months ago

SimpleToM exposes this gap 🔎 and provides a benchmark to diagnose, improve, and push LLMs toward robust social reasoning 🚀 Try SimpleToM on any model: https://t.co/FnGe9Oa9wk 5/

1

5

2

0

444

gu_yuling retweeted

Kyunghyun Cho

@kchonyc

6 months ago

i gave a keynote talk at NeurIPS'25 just last week. here's the slide deck (link below) i've used to share my thoughts on who we are and what we do.

3

245

28

131

21K

Yuling Gu @gu_yuling

7 months ago

Super proud of the amazing work that my Ai2 friends have been doing! 🤩 Check this out! ✨

Ai2 @allen_ai

7 months ago

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

54

2K

326

693

610K

0

11

0

1K

Yuling Gu @gu_yuling

8 months ago

@soldni @kylelostat @allen_ai Looks cool 🤩 Is this bag the latest swag? 👀

0

3

0

477

gu_yuling retweeted

Danica Dillion

@danicajdillion

9 months ago

🌍 Introducing WorldValuesBench! A benchmark to evaluate how well LLMs reflect cultural differences in human values. Built from 94k+ participants in the World Values Survey → 20M examples of (demographics, value question → answer). 🧵

1

5

2

1

651

gu_yuling retweeted

David Heineman @davidheinnman

10 months ago

Evaluating language models is tricky, how do we know if our results are real, or due to random chance? We find an answer with two simple metrics: signal, a benchmark’s ability to separate models, and noise, a benchmark’s random variability between training steps 🧵

4

240

54

193

47K

Yuling Gu @gu_yuling

11 months ago

@code_star Super excited to have more people like you joining in, looking into the details behind evals, and asking these interesting + important questions! 👍

1

0

206

Yuling Gu @gu_yuling

about 1 year ago

Come to our poster session on Friday, May 2, 9-10.30 am (Hall 3) to chat more!

0

2

0

366

Yuling Gu @gu_yuling

about 1 year ago

Excited to be at #NAACL2025 in Albuquerque this week! I'll be presenting "OLMES: A Standard for Language Model Evaluations" (https://t.co/SmjBV2Szsk)! Work done with my wonderful collaborators at @allen_ai ❤️

2

52

10

4K

Yuling Gu @gu_yuling

about 1 year ago

This effort toward an open language model evaluation standard doesn’t just end here. Since the submission of our NAACL paper, we have added more tasks to OLMES, including generative and reasoning tasks, all openly available in our repository (https://t.co/54sbLDWWBM).

1

2

0

650

gu_yuling retweeted

Ai2 @allen_ai

about 1 year ago

Imagine AI doing science: reading papers, generating ideas, designing and running experiments, analyzing results… How many more discoveries can we reveal? 🧐 Meet CodeScientist, a promising next step toward autonomous scientific discovery. 🧵

6

365

95

230

42K

gu_yuling retweeted

Kyle Lo

@kylelostat

over 1 year ago

kicking off 2025 with our OLMo 2 tech report while payin homage to the sequelest of sequels 🫡 🚗 2 OLMo 2 Furious 🔥 is everythin we learned since OLMo 1, with deep dives into: 🚖 stable pretrain 🚔 lr anneal 🤝 data curricula 🤝 soups 🚘 tulu post-train 🚜 compute infra 👇🧵

3

359

70

140

47K
