Wenhao Yu @wyu_nd - Twitter Profile

16 days ago

@aleks_madry Really grateful for the chance to work with you at OpenAI. Wishing you all the best on your next adventure!

0

2

0

747

wyu_nd retweeted

Yucheng Shi

@Yucheng__Shi

22 days ago

What should AI generate in order to improve itself? Not just more questions, traces, or answers. We believe it should learn to generate environments. Excited to share my first work after joining Tencent Hunyuan LLM. We study how models can construct reusable, verifiable environments that provide stable training signals for self-improvement. This is only a first feasibility step, but we see environment construction as a necessary path toward truly self-improving AI. Paper: https://t.co/bUO40DkKwz

17

200

39

158

69K

wyu_nd retweeted

OpenAI

@OpenAI

23 days ago

You've been asking for this one... Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.

2K

22K

3K

5K

5M

wyu_nd retweeted

OpenAI

@OpenAI

about 1 month ago

Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

2K

52K

7K

9K

13M

Wenhao Yu

@wyu_nd

about 2 months ago

@Han_Fang_ @ChengsongH31219 @wangxiaoyang1st @hongming110 Thank you Han, for sharing our work!

0

40

Wenhao Yu

@wyu_nd

3 months ago

📌Introducing Single–Multi Evolution Loop for self-improving LLMs. Self-distillation is strong. Multi-LLM teachers are 𝐒𝐭𝐫𝐨𝐧𝐠𝐞𝐫! Iterate: collaborate → distill → repeat, then single + multi systems improve together. Paper: https://t.co/BMDYMnSnSa

Shangbin Feng @shangbinfeng

4 months ago

⚠️ Multi-LLM collaboration systems are costly? 💡 Distill the collaborative outputs back into a single model! ♻️ These post-distillation, improved LLMs can collaborate again, forming a multi-LLM collective evolution cycle. Introducing: ✨the single-multi evolution loop✨ https://t.co/fhftq8YwC5 Joint work w/ @kpb_in_acad @tsvetshop @wyu_nd

1

64

16

31

14K

1

83

13

55

10K

Wenhao Yu

@wyu_nd

4 months ago

🥳 Now accepted to @CVPR 2026! Project page and code: https://t.co/KYtodivyKQ

Yixin Wan @CVPR2026

@yixin_wan_

6 months ago

🤔🤔Tired of static, lifeless image edits? Not anymore! 🤗 🚀🚀We introduce MotionEdit, a framework supporting image editing that understands action, motion, interaction beyond static changes! 🤩🤩 🔗Full paper: https://t.co/lKKQX6DJPj ✨Project page: https://t.co/3XdkxVgM4b

5

343

63

260

58K

0

39

1

6

6K

wyu_nd retweeted

Yixin Wan @CVPR2026

@yixin_wan_

4 months ago

Update: We have released our Benchmark at https://t.co/Dx1h3hAcBw and Train dataset at https://t.co/UE5S5RwP7Z !

1

62

6

22

9K

wyu_nd retweeted

ChengSong Huang

@ChengsongH31219

4 months ago

R-Zero is accepted by #ICLR26 !!!

2

107

13

59

9K

wyu_nd retweeted

Zhenwen Liang @LiangZhenwen

6 months ago

For LLM RL, Entropy is not equal to exploration. Semantic diversity is also a lie. Most RL exploration methods push LLMs to "look" different, but they ignore how the model actually learns. We propose G²RL: Gradient-Guided Exploration for RL https://t.co/JGN6MGj6eb (1/n)

4

72

9

77

15K

Wenhao Yu

@wyu_nd

6 months ago

🚀Introducing 𝐆𝐫𝐚𝐝𝐢𝐞𝐧𝐭-𝐠𝐮𝐢𝐝𝐞𝐝 𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧 for RL (G²RL)!

We know exploration is crucial in RL, but forcing entropy as a reward can inflate it in unhealthy ways.

Our G²RL measures policy-intrinsic exploration from "gradients", avoiding mismatch from external comparators (classifiers, embeddings, etc.).

🌟Key insights: rollout trajectories are preferred when they meaningfully expand the policy’s own update directions, and discouraged when they contribute redundant or uninformative gradients.

- Easy to implement
- Negligible overhead
- Improves accuracy

G²RL leads to healthier, more structured exploration.

Paper: https://t.co/SFzOX1HfHI

4

247

42

170

15K

Wenhao Yu

@wyu_nd

6 months ago

This is commercial product–level quality!!

Yixin Wan @CVPR2026

@yixin_wan_

6 months ago

🤔🤔Tired of static, lifeless image edits? Not anymore! 🤗 🚀🚀We introduce MotionEdit, a framework supporting image editing that understands action, motion, interaction beyond static changes! 🤩🤩 🔗Full paper: https://t.co/lKKQX6DJPj ✨Project page: https://t.co/3XdkxVgM4b

5

343

63

260

58K

1

36

3

32

12K

Wenhao Yu

@wyu_nd

6 months ago

🔥 New Tencent x UCLA paper! Even SoTA image editing models (e.g., Nano Banana, GPT-Image-1) struggle with motion editing -- changing actions, poses, and interactions. We tackle this head-on, and do it better. 🧠✨ Paper: https://t.co/Vx5Dmv05KF

3

106

12

56

8K

Wenhao Yu

@wyu_nd

6 months ago

📢 Happening Today at NeurIPS 2025! 🚀Come check out our spotlight paper, MMLongBench. We propose the first comprehensive benchmark (13k) to evaluate long-context VLMs, and tested 46 models in the paper. 🖼️ Poster: #4507 ⏰ Time: 11am - 2pm

0

27

4

6

2K

wyu_nd retweeted

Rohan Paul

@rohanpaul_ai

6 months ago

Beautiful Tencent paper.

Shows a language model that keeps improving itself using only 1% to 5% human labeled questions while reaching the level of systems trained on about 20 times more data.

Earlier self play systems let a model write and solve its own questions, but over time it drifts, repeats narrow patterns, and can even perform worse.

Their method runs a challenger copy that generates questions and a solver copy that answers them, turning training into a question answer game between 2 agents.

When the challenger writes, it sometimes sees a few real human question answer pairs, which pull its synthetic questions toward realistic tasks instead of strange, off topic puzzles.

For each question, the solver tries several answers, the system estimates its success rate, and training keeps mainly mid difficulty questions where the solver is uncertain but not lost.

Because both human and synthetic questions pass this filter, the solver trains on focused, non trivial problems, avoids cheap tricks like inflating question length, and gains stronger math and general reasoning scores than earlier self play methods.

----

Paper Link – arxiv.org/abs/2512.02472

Paper Title: "Guided Self-Evolving LLMs with Minimal Human Supervision"
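The filtering step described in this summary (sample several answers, estimate a success rate, keep mid-difficulty questions) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `estimate_success_rate` and `keep_mid_difficulty`, the `solve` and `is_correct` callables, the sample count, and the 0.3 to 0.7 band are all hypothetical choices made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def estimate_success_rate(
    question: str,
    solve: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    n_samples: int = 8,
) -> float:
    """Sample the solver n_samples times and return the fraction judged correct."""
    attempts = [solve(question) for _ in range(n_samples)]
    n_correct = sum(1 for answer in attempts if is_correct(question, answer))
    return n_correct / n_samples


def keep_mid_difficulty(
    questions: Sequence[str],
    solve: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    low: float = 0.3,   # assumed lower bound, not taken from the paper
    high: float = 0.7,  # assumed upper bound, not taken from the paper
    n_samples: int = 8,
) -> List[Tuple[str, float]]:
    """Keep questions whose estimated success rate lies in [low, high].

    Questions the solver almost always gets right (too easy) or almost
    never gets right (too hard) are dropped.
    """
    kept = []
    for question in questions:
        rate = estimate_success_rate(question, solve, is_correct, n_samples)
        if low <= rate <= high:
            kept.append((question, rate))
    return kept
```

In this sketch the same filter would be applied to human-written and challenger-written questions alike; how correctness is judged for synthetic questions (for example, by majority vote) is left to the hypothetical `is_correct` callable.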

20

351

72

258

18K

Wenhao Yu

@wyu_nd

6 months ago

@shangbinfeng You absolutely earned this!!

0

1

0

90

Wenhao Yu

@wyu_nd

6 months ago

In R-Few, the Challenger is incentivized to generate moderately (“medium”) uncertain questions that lie at the edge of the Solver’s current abilities; the Solver is rewarded for solving increasingly challenging tasks – sourced from both humans and the Challenger – via curriculum-based selection.

0

3

1

0

453

Wenhao Yu

@wyu_nd

6 months ago

📢New paper: Guided Self-Evolving LLMs with Minimal Human Supervision

Self-evolving / Self-improving LLMs often plateau fast due to concept drift, diversity collapse, and mis-evolution.

Our method fixes this, keeping self-evolution stable, aligned, and on track!

Link: https://t.co/ptLW6Ls0ig

6

218

39

146

70K

Wenhao Yu

@wyu_nd

6 months ago

Domain-specific training boosts performance mostly within its own field; math transfers broadly, with strong math–physics and business–economics cross-domain connections.

1

0

550
