Matthew Yang @_matthewyang - Twitter Profile

Pinned Tweet

5 months ago

Almost nobody does proper credit assignment in RL-on-LLMs 💀

Learning only from the final outcome
→ punishes good steps 😭
→ rewards bad steps 😭😭

🚨New Paper🚨
A new paradigm for credit assignment:
LLMs identify their own mistakes ❌ and propose targeted fixes 🎯

🧵[1/n] https://t.co/RSDhOkNP3j

8

195

26

173

12K

_matthewyang retweeted

Jack Bai

@jackbot_cs

8 days ago

Tomorrow at ExHall A & F Poster Location: 471, 5pm-7pm, we'll present WebGym, together introducing two recently pre-released works (AsyncWebRL and OpenWebRL) as a surprise :)

WebGym: https://t.co/vUva6lTXah
OpenWebRL: https://t.co/79pJpL1NI3
AsyncWebRL: https://t.co/PdvVJTGoQV
https://t.co/qi6fevUodc

0

8

2

0

589

_matthewyang retweeted

Boaz Barak @boazbaraktcs

20 days ago

To be clear, while I believe human mathematicians will have a role to play, it does not mean there won't be a dramatic shift, nor that there is not a sense of loss. I will personally miss the days of being able to sit, with just pen and paper, and discover via pure thought a mathematical truth that no one knew before.

19

189

10

67

54K

_matthewyang retweeted

Ian Wu @ianwu97

about 2 months ago

I'll be presenting 3 papers at ICLR 2026! Please stop by for a chat. #ICLR2026 🇧🇷🇧🇷🇧🇷

0

49

7

11

7K

_matthewyang retweeted

idan shenfeld

@IdanShenfeld

about 2 months ago

I’m flying tomorrow to Brazil for ICLR! 🇧🇷

If you’re into continual learning, self-distillation, learning from textual feedback, or just want to chat, let’s meet!

I'll be presenting the following papers: https://t.co/XPNSSXg5aJ

7

303

28

193

18K

_matthewyang retweeted

Ian Wu @ianwu97

4 months ago

1/ How can we train LLMs to continually improve their reasoning over test horizons much longer than their training token budgets?

Introducing Reasoning Cache (RC), an algorithm that trains LLMs to *extrapolate*. https://t.co/syt7uTkjNw

5

201

30

166

13K

_matthewyang retweeted

Aviral Kumar

@aviral_kumar2

4 months ago

Can just a 4B model solve IMO-level proof problems at the level of much stronger LLMs like Gemini 3 Pro? Yes, if you can train the LLM to scale test-time compute well!

We're very excited to release our 4B model "QED-Nano", built via an awesome open collab! Details below🧵⬇️

8

167

26

98

22K

Matthew Yang

@_matthewyang

5 months ago

@rosmine We tried generating interventions with larger models, namely Qwen3-30B-A3B-Instruct (see Section 3) and Gemini 2.5 Pro (see Appendix).

We find that larger models tend to generate better interventions (row 6 vs. row 5). https://t.co/tM1b40XPaN

0

2

0

134

Matthew Yang

@_matthewyang

5 months ago

Almost nobody does proper credit assignment in RL-on-LLMs 💀 Learning only from the final outcome → punishes good steps 😭 → rewards bad steps 😭😭 🚨New Paper🚨 A new paradigm for credit assignment: LLMs identify their own mistakes ❌ and propose targeted fixes 🎯 🧵[1/n]

8

195

26

173

12K

Matthew Yang

@_matthewyang

5 months ago

@dhruvbhatia0 No, it does not, because we run standard online RL after we SFT on the interventions

1

2

0

65

_matthewyang retweeted

Aviral Kumar

@aviral_kumar2

5 months ago

🚨🚨New paper

Scaling RL to complex tasks shows credit assignment is a bottleneck

But standard way of fitting PRM + optimizing it is too inefficient to solve it❌

Our idea: use asymmetries in an LLM to let it do its own credit assignment, in natural language w/o PRMs! 🧵⬇️

7

209

29

173

12K

_matthewyang retweeted

Jack Bai

@jackbot_cs

5 months ago

🚨 New Paper Alert 🚨 💥 SFT on hard tasks given reference solution is usually too off-policy, which can cause the training to crash. 🐌 On-policy RL on these hard tasks introduces low sample efficiency, although more stable. 😈 Today, we introduce Intervention Training (InT), an algorithm that avoids shortcomings of both sides. A thread 🧵 1/n

5

185

30

175

13K

Matthew Yang

@_matthewyang

5 months ago

Thank you to my amazing set of collaborators @jackbot_cs @ianwu97 @geneyang4 @setlur_amrith @aviral_kumar2 for making this happen!!! 🙏🙏🙏 And grateful to end my master’s journey with this project ⛵️🌅😎 🧵[7/n]

0

9

0

498

Matthew Yang

@_matthewyang

5 months ago

website: https://t.co/cVK44TVQ7n paper: https://t.co/WTXJfdSsOi code: https://t.co/wnVaxgLaVB 🧵[6/n]

1

9

0

3

532

_matthewyang retweeted

Jack Bai

@jackbot_cs

5 months ago

😈 Today, we introduce WebGym, the largest-to-date open-source RL environment for web agent training that contains 300k tasks and a rollout framework optimized specifically for web environments' rollout speed. We reveal the effects of essential scaling directions we observe with WebGym. 1/n

13

376

37

308

45K

_matthewyang retweeted

Aviral Kumar

@aviral_kumar2

7 months ago

🚨🚨New blog post led by CMU students: Want to know why LLM RL training plateaus on hard problems & scaling compute may not help? And how to fix this issue? Turns out it stems from a coupling of poor exploration & optimization. Classical ways to explore don't work, but ours does! 🧵⬇️

6

254

44

206

31K

_matthewyang retweeted

Andrew Zhao

@_AndrewZhao

9 months ago

paper of the day

15

572

35

132

86K
