Zhepei Wei @weizhepei - Twitter Profile

Pinned Tweet

about 1 month ago

😢RLVR is powerful but expensive 🤯Imagine using <20% of RLVR training while achieving 100% performance? Sounds surprising? We show that minimal RLVR training is enough to know where training is going, and predict future ckpts at no training cost! 📃https://t.co/fGODWWIjR1 🧵[1/n]

Zhepei Wei

@weizhepei

7 days ago

@yumeng0818 @CapitalOne Many thanks, Yu!😃

Zhepei Wei

@weizhepei

7 days ago

🎉 Honored to receive the @CapitalOne PhD Fellowship! Many thanks to my advisor @yumeng0818 and my collaborators for their guidance and support throughout my PhD journey at @CS_UVA @UVAEngineers! 💙🧡 Excited to continue building more capable, reliable, and efficient AI systems! https://t.co/LJzPFsFiz5

Zhepei Wei

@weizhepei

29 days ago

@TimXu222575 Thanks for sharing, Shuyao!😃

Zhepei Wei

@weizhepei

29 days ago

@_ueaj @tianhongzxy @WeiLin__Chen @ChengsongH31219 @jiaxinhuang0229 @yumeng0818 @kalomaze Indeed a very interesting idea! Thanks for sharing!

Zhepei Wei

@weizhepei

29 days ago

@tanghyyy @yuanzhi_zhu @tianhongzxy @WeiLin__Chen @ChengsongH31219 @jiaxinhuang0229 @yumeng0818 Thanks for bringing the great works to our attention! They have been cited and discussed in our paper: https://t.co/rGtOpQqB2Y

Zhepei Wei

@weizhepei

29 days ago

@_ueaj @tianhongzxy @WeiLin__Chen @ChengsongH31219 @jiaxinhuang0229 @yumeng0818 @kalomaze wait, any context here? did i join the party too late? 😅

Zhepei Wei

@weizhepei

29 days ago

The paper and accompanying artifacts are now released — including 500+ RLVR checkpoints for studying training dynamics and extrapolation! 🥳🥳 📚 Paper: https://t.co/olkSYHFAHb 📝 Blog: https://t.co/H9xWxD6dlZ 💻 Code: https://t.co/0ZF1WBlfAr 🤗 Checkpoints: https://t.co/Uj4OrbpoQl

weizhepei retweeted

Jinyuan Li @JinYuan99121

about 1 month ago

Can process reward models know when NOT to trust themselves? 🤔 We introduce BetaPRM: a distributional PRM that predicts both step-level success probability and the reliability of that prediction. Instead of only asking “how good is this step?”, BetaPRM also asks: “how confident am I?” 🔍

Zhepei Wei

@weizhepei

about 1 month ago

@yuanzhi_zhu @tianhongzxy @WeiLin__Chen @ChengsongH31219 @jiaxinhuang0229 @yumeng0818 This is exactly one of our baselines, and we’ve also outlined the key differences in our paper—which will be released soon together with other artifacts. Stay tuned!😉

Zhepei Wei

@weizhepei

about 1 month ago

Not yet. The current evaluation focuses on math tasks since the RLVR training domain is solely math. It would definitely be interesting to see whether the extrapolated checkpoints generalize better (or regress less) than fully RLVR-tuned models on other domains. What non-math tasks would you suggest we evaluate on?😃

Zhepei Wei

@weizhepei

about 1 month ago

Kudos to our amazing collaborators: Xinyu @tianhongzxy, Wei-Lin @WeiLin__Chen, Chengsong @ChengsongH31219, Jiaxin @jiaxinhuang0229, and Yu @yumeng0818 🎉🎉🥳

Zhepei Wei

@weizhepei

about 1 month ago

📢 Takeaway: You only need minimal RLVR training to know where the model is heading. Observe the early training dynamics, then go extrapolate future checkpoints at no training cost! Blogpost👇 https://t.co/UWUJ1CB8tR 🧵[10/n]
