Sangwoong Yoon

20 days ago

1/ 🚀 New work: GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models. RL for diffusion LLMs based on approximate likelihood faces a key problem: training–inference mismatch (TIM). We propose GDSD to address it by reformulating RL as denoiser self-distillation. 🌐https://t.co/CTkjVtohAp



about 2 months ago

@shyam91019594 @ilijabogunovic @AurelienLucchi @Zhiyong16403503 @hbouammar Much respect to you!!!


WoongSSang retweeted


2 months ago

Hiring 2 summer ML research interns at the University of Basel 🇨🇭. Research topics: RL/diffusion LLM post-training, reasoning, or LLM orchestration. Possible fully funded PhD offers to follow. I'll be at ICLR this week and happy to chat. Apply: https://t.co/gOdLRv771L



WoongSSang retweeted

4 months ago

𝐰𝐝𝟏 is accepted to ICLR 2026 🎉 See you in Brazil!
Link: https://t.co/rlCj2tt4Ke
Updates:
🧩 Sudoku: 76.4 (+60 vs d1) @ 12K steps
🚀 wd1++ (full FT): 84.5 GSM8K / 44.2 MATH500 with only 20 RL steps!
🧠 Theory: connects weighted policy optimization ↔ energy-guided diffusion


WoongSSang retweeted

4 months ago

Watching the Moltbot hype and thinking about our paper: We showed a single deceptive LLM in a Mixture of Agents can nullify all the gains from collaboration. Venetians spent centuries building layered councils to elect their Doge and still got corrupted from within. Same vibes. Robustness of multi-agent AI is wide open! https://t.co/lA5ksthvYx



Shyam Sundhar Ramesh @shyam91019594

4 months ago

@shyam91019594 Great work! Your drive and persistence were amazing. Huge respect!


WoongSSang retweeted

4 months ago

🚨 New paper alert 🚨 Excited to share our latest work: “Multi-Task GRPO: Reliable LLM Reasoning Across Tasks” 📄 https://t.co/VbKlqOSam4 RL improves reasoning — but often breaks multi-task reliability. We identify two failure modes — and propose MT-GRPO to fix them 🔥 🧵[1/N]👇


WoongSSang retweeted

8 months ago

🚀 We are hiring! Fully funded PhD positions @ Rhine-AI Group (University of Basel). Focusing on RL for LLMs, diffusion-based reasoning, and agentic AI. Please RT! Deadline approaching: December 1, 2025. Don't forget to apply! Apply: https://t.co/JN9tia0Kkl


WoongSSang retweeted

A. Hamdi Guzel

@ahguzelUK

9 months ago

🎮 How can agents learn to generalize from limited offline data? We introduce iMac (Imagined Autocurricula) - training agents entirely in world models with emergent curricula!


WoongSSang retweeted

9 months ago

Our new Rhine-AI lab is officially open at the University of Basel! We're currently recruiting for multiple PhD positions. If you're interested, you can register your interest on our new website; application links will be available soon: https://t.co/Nq5vazMxll



William Bankes @bankes_william

10 months ago

@JCJesseLai Amazing! I also got the seminar announcement email, but am so sorry that I cannot make it as I am in London :( hope you have a great time in Korea!


WoongSSang retweeted

11 months ago

Super excited that the work I completed as part of a team at @LASRlabs won 1 of 2 Outstanding Paper Awards at the @ActInterp workshop at ICML 2025. Massive thanks to @Arrrlex for presenting our work! 📖Check out the paper here: https://t.co/9R6H4EgaMC


11 months ago

I'm excited to share that I will be joining the Graduate School of Artificial Intelligence at UNIST as an Assistant Professor starting in 2026! I will continue working on the intersection between generative modeling and reinforcement learning.


WoongSSang retweeted

11 months ago

wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models https://t.co/PECJ6XYYII


WoongSSang retweeted

11 months ago

🧶1/ Diffusion-based LLMs (dLLMs) are fast & promising—but hard to fine-tune with RL. Why? Because their likelihoods are intractable, making common RL (like GRPO) inefficient & biased. 💡We present a novel method 𝐰𝐝𝟏, that mitigates these headaches. Let’s break it down.👇


WoongSSang retweeted

about 1 year ago

Glad to introduce our new work "Game-Theoretic Regularized Self-Play Alignment of Large Language Models". https://t.co/6cLCrHwQfA 🎉 We introduce RSPO, a general, provably convergent framework to bring different regularization strategies into self-play alignment. 🧵👇

