Minsik Oh @minsik_nlp - Twitter Profile

Pinned Tweet

11 days ago

Excited to share that our paper has been accepted to ACL Main (Oral)!

GitHub: https://t.co/jSJwv9Vt9N

Embedding models coming soon on HuggingFace. https://t.co/oRuSB6LRpE

Minsik Oh

@minsik_nlp

about 3 years ago

New Preprint! TaDSE: Template-aware Dialogue Sentence Embeddings.

How do you create sentence embeddings for dialogue systems that are semantically relevant? Are current sentence embeddings enough? We explore these questions in our paper. 🧵

Preprint: https://t.co/czSOnfwiqf https://t.co/3djg5pSmIT


minsik_nlp retweeted

Anthropic

@AnthropicAI

2 days ago

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: https://t.co/bwn0sximKZ


Minsik Oh

@minsik_nlp

11 days ago

#ACL2026 #NLProc



minsik_nlp retweeted

Diyi Yang

@Diyi_Yang

26 days ago

The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :)

With all students in my cs329x Human-Centered LLM class, we present 60+ pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵


Minsik Oh

@minsik_nlp

about 2 months ago

@sheriyuo I started using SGLang; it works out of the box.


minsik_nlp retweeted

Julie Kallini ✨

@JulieKallini

2 months ago

A bit belated, but last quarter I had the privilege of serving as Head TA for CS224N, Stanford’s NLP course.

It was such a joy to teach alongside @Diyi_Yang and @YejinChoinka, and to lead an incredible team of TAs. I wanted to share a few personal highlights! https://t.co/QLAVLbZAxF


Minsik Oh

@minsik_nlp

2 months ago

@VeerarajuE @_shreya_s @jeremyberman @niloofar_mire @humansand This, excited to join if I can!


minsik_nlp retweeted

Jeremy Berman

@jeremyberman

3 months ago

RL can teach models new knowledge. A strong enough reasoning model working on hard enough problems will produce new abstractions mid-rollout, things it arrives at through deduction that it's never represented before. When you GRPO on those traces, that new knowledge gets distilled into the weights. So RL is doing two things: making the model better at reasoning, and teaching it new things.


minsik_nlp retweeted

Niklas Muennighoff

@Muennighoff

3 months ago

One gem from Composer paper is that RL improved both pass@k & pass@1. Suggests RL does not just reweigh existing capabilities but also teaches new ones? 💎 https://t.co/7JNMI9kYka


minsik_nlp retweeted

Diyi Yang

@Diyi_Yang

3 months ago

And huge thanks to our TAs who did the real work ❤️


minsik_nlp retweeted

Diyi Yang

@Diyi_Yang

3 months ago

And a big shout out to our sponsors for their generous support of CS224N 🙏


minsik_nlp retweeted

Diyi Yang

@Diyi_Yang

3 months ago

Just had the CS224N final poster session. Lots of cool projects and great discussions 😊 Congrats to everyone for finishing strong 🥳 https://t.co/kBnchsqfGO


minsik_nlp retweeted

Ruth Hook

@ruth_hook_

4 months ago

meanwhile in science


minsik_nlp retweeted

Robert Youssef

@rryssf

4 months ago

Stanford and Caltech researchers just published the first comprehensive taxonomy of how llms fail at reasoning

not a list of cherry-picked gotchas. a 2-axis framework that finally lets you compare failure modes across tasks instead of treating each one as a random anecdote

the findings are uncomfortable


Minsik Oh

@minsik_nlp

4 months ago

@lateinteraction Very true, LLMs take memory too literally, and fail to prioritize the content.


minsik_nlp retweeted

Yuda Song

@yus167

4 months ago

RL on LLMs inefficiently uses one scalar per rollout. But users regularly give much richer feedback: "make it formal," "step 3 is wrong."

Can we train LLMs on this human-AI interaction?

We introduce RL from Text Feedback, with 1) Self-Distillation; 2) Feedback Modeling (1/n) 🧵 https://t.co/i8ncPFKq70


minsik_nlp retweeted

Infini-AI-Lab

@InfiniAILab

4 months ago

RL is notoriously unstable under actor–policy mismatch 😥 — a common reality caused by kernel differences, MoE randomness, FP8 rollouts, or asynchronous pipelines.

But here’s a crazy thought 🤔
👉 What if you could RL-train a large model using rollouts generated only by a weaker, faster, and completely different model?
Sounds doomed from the start? 💩

We are releasing Jackpot 🎰.💡 enabling training Qwen3-8B-Base using only Qwen3-1.7B-Base generated rollouts

✨ Jackpot is surprisingly powerful:
• Enables cheap, fast rollouts to train stronger models
• Dramatically changes the cost–performance tradeoff of RL training

We release Jackpot 🎰 in the following format:
🌔Paper: https://t.co/VV6088DDBS
🌕Code: https://t.co/OxLSjxeU3r
🌖Blog: https://t.co/0bR7C4XQqK
[1/n]


minsik_nlp retweeted

Yinjie Wang

@YinjieW2024

4 months ago

RL Anything! Your environment, reward model and policy can be improved in a closed-loop optimization. They provide feedback for each other to enhance the training signals and benefit the whole system. Check this out.


minsik_nlp retweeted

Russ Salakhutdinov

@rsalakhu

4 months ago

New work on Learning to Reason on Hard Problems via Privileged On-Policy Exploration: https://t.co/DC4u1gJNvz

Reinforcement learning (RL) has improved LLM reasoning, but state-of-the-art methods still fail on many hard tasks. On-policy RL rarely explores correct rollouts on difficult problems, yielding zero reward and no learning signal.

We introduce Privileged On-Policy Exploration (POPE), which uses human- or other oracle solutions as privileged guidance, not training targets, to steer exploration on hard problems. By prefixing oracle solution fragments, POPE enables non-zero rewards during guided rollouts for hard tasks, and the resulting behaviors transfer back to unguided problems. Empirically, POPE expands the set of solvable tasks and delivers substantial gains on challenging reasoning benchmarks.

Check out an excellent thread by @QuYuxiao and this blogpost: https://t.co/TDyeK7TtAW with @QuYuxiao, @setlur_amrith, @gingsmith, @aviral_kumar2


Minsik Oh

@minsik_nlp

5 months ago

@varunneal It's just prohibitively expensive to train more than once, and that was true even a few years ago. Better to gather more data.
