Zengzhi Wang

@SinclairWang1

PhDing @sjtu1896 Working on Pre-training Data Engineering for Foundation Models: MathPile (2023), 🫐 ProX (2024), 💎 MegaMath (2025)，🐙 OctoThinker（2025）

Joined November 2020

2.9K Following

2.6K Followers

1.7K Posts

Pinned Tweet

Zengzhi Wang

@SinclairWang1

12 months ago

What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps? (4) Is RL’s calm surface all thanks to Pre/Mid-training carrying the weight? Why does RL on LLaMA consistently underperform compared to Qwen? What makes a base model truly ready for RL scaling? What are the secrets under the hood? Due to the cost of training from scratch, we conduct extensive controlled experiments with 20B-token mid-training, systematically investigating what really matters for RL success. 💡Key insights: - High-quality math data is key to RL scaling. - QA data helps, but it depends on task similarity. - Instruction data boosts QA’s effectiveness. - More mid-training improves RL performance. Armed with these insights, we apply a two-stage (stable+decay) mid-training strategy on LLaMA, scaling up to 200B tokens—and RL performance on LLaMA now matches Qwen! To support this, we introduce MegaMath-Web-Pro-Max, a high-quality math-centric pretraining corpus. The dataset will be released soon on Hugging Face—stay tuned! 📦 https://t.co/2nKPTS3r7B Full construction details are in the paper, we hope it’s useful! https://t.co/z9uHAwVnJO Getting SOTA with a strong foundation is great 🤩, but understanding the foundation—the know-how—matters just as much. Hope this analysis inspires the community—and feel free to cite us if it helps! This work is impossible without all the brilliant co-authors @FaZhou_998 @xuefengli0301 @stefan_fee !!!

SinclairWang1's tweet photo. What Makes a Base Language Model Suitable for RL?

Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”:

(1) Is the magic only happening on Qwen + Math?
(2) Does the "aha moment" only spark during math reasoning?
(3) Is evaluation hiding some tricky traps?
(4) Is RL’s calm surface all thanks to Pre/Mid-training carrying the weight?

Why does RL on LLaMA consistently underperform compared to Qwen?
What makes a base model truly ready for RL scaling?
What are the secrets under the hood?

Due to the cost of training from scratch, we conduct extensive controlled experiments with 20B-token mid-training, systematically investigating what really matters for RL success.

💡Key insights:
- High-quality math data is key to RL scaling.
- QA data helps, but it depends on task similarity.
- Instruction data boosts QA’s effectiveness.
- More mid-training improves RL performance.

Armed with these insights, we apply a two-stage (stable+decay) mid-training strategy on LLaMA, scaling up to 200B tokens—and RL performance on LLaMA now matches Qwen!

To support this, we introduce MegaMath-Web-Pro-Max, a high-quality math-centric pretraining corpus. The dataset will be released soon on Hugging Face—stay tuned! 📦 https://t.co/2nKPTS3r7B
Full construction details are in the paper, we hope it’s useful! https://t.co/z9uHAwVnJO

Getting SOTA with a strong foundation is great 🤩,
but understanding the foundation—the know-how—matters just as much.
Hope this analysis inspires the community—and feel free to cite us if it helps!

This work is impossible without all the brilliant co-authors @FaZhou_998 @xuefengli0301 @stefan_fee !!!

515

472

93K

SinclairWang1 retweeted

Gabriele Berton

@gabriberton

14 days ago

Cool paper by AllenAI and CMU on how to pretrain LLMs for long-context understanding The setup is legit and the findings are interesting, and they show that most LLMs have architectural issues 🧵

gabriberton's tweet photo. Cool paper by AllenAI and CMU on how to pretrain LLMs for long-context understanding

The setup is legit and the findings are interesting, and they show that most LLMs have architectural issues 🧵 https://t.co/wAIODIlo27

240

244

39K

Zengzhi Wang

@SinclairWang1

14 days ago

True

Susan Zhang

@suchenzang

20 days ago

there is no better time in tech than now to be a jack of all trades, master of a few. just make sure to keep adding to the few year over year, such that the cumulative breadth of expertise you collect becomes an increasingly rare combo. remember, if you're top 10% in 3 different areas, that already makes you top 0.1%. keep switching it up until you get to "your best", and then switch it up again (great for a particular flavor of people who don't enjoy resting on laurels, maybe not so great for others). question all institutional value and pedigrees, all traditional career paths or corporate ladders: the college industrial complex is getting shaken up, alongside a disappearing managerial class, so if you're pursuing either make sure you are fully internally aligned with why. social/political capital in a particular institution can feel incredible, but if you're spending all your energy on complex political people games, you're not a technologist anymore, you're an unelected politician. if you're ok with that, then all's well. critical thinking is more important than ever: take nothing at face-value, question everything and everyone. the equivalent of ai slop can be found in humans operating under misaligned incentives and interests. the sooner you're clued into disambiguating the talkers/larpers from the doers, the better off you'll be figuring out where and who to invest your time in. the anxiety of job displacement is very real, since a surprising amount of white collar work/prestige is built on a performative house of cards, significantly lacking in correlation with technical breadth, depth, and skill. as long as you keep learning, keep building, keep producing receipts, you will be fine. if all that sounds ok to you, welcome to the world of technology! it's truly one of the few places you can experience child-like wonder every few years, and be constantly humbled & excited by new adventures, as scary as they may seem at first. don't give up, drink your water, get your sunlight, and take breaks as needed. tech careers are notoriously nonlinear, so you might as well embrace it and enjoy the ride!

328

397K

198

SinclairWang1 retweeted

Steven Feng

@stevenyfeng

18 days ago

CS25 brings top researchers every week to break down the frontier of Transformers + beyond. 🎥 Recordings drop later on YouTube: https://t.co/F3kcbc8AzE 📊 Discussion on Discord (6000+ members): https://t.co/vlDVm30x5F 🌐 More info: https://t.co/Yvq4AcLiBV @_KaranPS_ @agihouse_org @MongoDB @modal @StanfordOnline @stanfordaiclub @StanfordAILab @StanfordHAI @stanfordnlp (5/6)

734

Who to follow

Li Dong

@donglixp

Researcher at Microsoft Research

CLS

@ChengleiSi

Founding member @Recursive_SI & PhD @stanfordnlp | teaching language models to do research

Shangbin Feng

@shangbinfeng

PhD student @uwcse @uwnlp. Model collaboration, for compositional intelligence and collaborative development. #水文学家

SinclairWang1 retweeted

Leandro von Werra

@lvwerra

17 days ago

We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -> 31.4, thus beating GPT 5.5 Pro. The physics-intern harness can wrap any model and via dedicated subagent boost the performance of the vanilla reasoning models. While I think more and more of these harness capability gains will be absorbed into the models (like prompting tricks disappeared over time) there is a lot to be gained right now by building good scaffolds for those models and integrating tools well. Interestingly, the exception we found that GPT 5.5 Pro actually didn't benefit from the physics-intern harness! Read more about it here: https://t.co/x74RAYCt5B PS: I think the Harness[Model] notation is kind of nice.

589

545

96K

Zengzhi Wang

@SinclairWang1

27 days ago

@ManTle_Ma 👍

SinclairWang1 retweeted

elie

@eliebakouch

about 1 month ago

this is fascinating, they train an encoder/decoder but use LLM matching the target model's shape for each part, so the latent space is just plain language and they can detect reward hacking, unwanted behavior and more could even see it being used as an eval to quantify how smart a model is, i love this

eliebakouch's tweet photo. this is fascinating, they train an encoder/decoder but use LLM matching the target model's shape for each part, so the latent space is just plain language and they can detect reward hacking, unwanted behavior and more

could even see it being used as an eval to quantify how smart a model is, i love this

109

873

111K

SinclairWang1 retweeted

Tomek Korbak

@tomekkorbak

about 1 month ago

OpenAI introduces an additional layer of defense against misaligned or confused coding agents, complementing chain of thought monitoring we use internally. When Codex wants to execute a risky action outside of its sandbox, a separate Codex agent is asked to approve or deny it.

tomekkorbak's tweet photo. OpenAI introduces an additional layer of defense against misaligned or confused coding agents, complementing chain of thought monitoring we use internally. When Codex wants to execute a risky action outside of its sandbox, a separate Codex agent is asked to approve or deny it. https://t.co/B40TPq8adC

177

20K

SinclairWang1 retweeted

elie

@eliebakouch

about 2 months ago

this is really cool work by the meta, with a new phase that does a better cold start for RL reasoning heavy tasks

132

12K

Zengzhi Wang

@SinclairWang1

about 2 months ago

It's time to change my work philosophy after witnessing the remarkable productivity of the frontier flagship model. Past: I needed to prepare many useful tools for myself to improve efficiency in my workspace. Now, I need to create a seamless workspace for AI, enabling it to interact with various tools, engines, and access permissions. It could boost my efficiency by at least 2 to 5 times. More importantly, the saved time allows me to think in a global perspective, waiving many manual labour and having a coffee by the way. While I recognize that this change might be a bit late, I’m glad to know it’s still not too late to adapt. 🚀🚀🚀

144

SinclairWang1 retweeted

himanshu

@himanshustwts

2 months ago

The Arcee AI Podcast is here! In this episode, @latkins and @stochasticchasm join us to discuss the story of Trinity models and everything frontier. I can say, this talk has been one of the most amazing and technical conversations we've had on Ground Zero. 0:00:00 - Intro 0:00:59 - Varun's transition from SWE to Pre-Training Lead 0:04:20 - Trinity Manifesto, Openclaw Ecosystem 0:12:15 - Arcee's Post-Training to Pre-Training Pivot 0:23:45 - Varun's first Pre-Training run (you can just do things!) 0:27:33 - Saturation in Pre-Training?, Mid-Training 0:37:00 - Tweaking the Training Architecture, Adam vs Muon, Evals 01:09:07 - Inference Engineering, Quick Fire, Post-Training Recipe 01:18:02 - Alpha in RL Envs, Harness Design 01:23:00 - American Open Source is trailing Chinese Competitors, Trinity Adoption 01:29:25 - Hiring at Arcee, Advice to 20yo

197

107

32K

SinclairWang1 retweeted

Pengfei Liu

@stefan_fee

2 months ago

Aha, thank you for the kind words! We’re exploring what “frontier lab” means in academia—through democratizing cognition and embracing “less is more” & “simple is powerful”. Recent releases: - agentic intelligence: davinci-dev, davinci-agency, davinci-env - open foundation model: davinci-llm, davinci-magihuman - data efficiency: (lima) limo, limr, limi - benchmark: agencybench, researcherbench, innovatorbench ... - data darwinism PartI, Part II - interaction as Intelligence: Part I, Part II - engineering: prompt engineering, cognition engineering, context engineering 2.0 More at: https://t.co/x4Nw4qohCX Our North Star: Using AI technology to make life better for people around us. Would love to exchange ideas if any of these interest you!

SinclairWang1 retweeted

马东锡 NLP

@dongxi_nlp

2 months ago

CC + autoreseach 来发现新的越狱算法。越来越多的类似的研究工作完全可以自动化。 autoresearch 重塑学术，正在发生。

371

345

105K

SinclairWang1 retweeted

He He

@hhexiy

3 months ago

https://t.co/H3TAsaThYQ

873

128

118K

Zengzhi Wang

@SinclairWang1

2 months ago

@gregd_nlp @beirmug @COLM_conf agree. As a reviewer, I found that the papers assigned to me are often highly matched.

SinclairWang1 retweeted

Pengfei Liu

@stefan_fee

3 months ago

Seedance 2.0 is impressive. But it's closed-source! Introducing our daVinci-MagiHuman — a single-stream 15B Transformer trained from scratch that jointly generates video + audio. No cross-attention. No multi-stream branches. Just self-attention. ⚡ 5s 1080p video in 38s on a single H100 🏆 80% win rate vs Ovi 1.1 | 60.9% vs LTX 2.3 (2,000 human comparisons) 🌍 6 languages 📦 Fully open-source Speed by simplicity. By @SII_GAIR × @SandAI_HQ 📄 https://t.co/SgFOunlEIj 💻 https://t.co/9rwNWzlMKN 🤗 https://t.co/txduP5FgIC