Zhoujun (Jorge) Cheng @ChengZhoujun - Twitter Profile

Pinned Tweet

5 months ago

Pretraining has scaling laws to guide compute allocation. But for RL on LLMs, we lack a practical guide on how to spend compute wisely. We show the optimal compute allocation in LLM RL scales predictably. ↓ Key takeaways below

Zhoujun (Jorge) Cheng

@ChengZhoujun

10 days ago

Very solid data work here on CUA envs. Massive value in both the outcome and the execution!

Bowen Wang

@BowenWangNLP

10 days ago

RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algorithm; it is the data. 🐌 🚀

We introduce CUA-Gym: a scalable, lightweight synthesis engine that turns arbitrary task queries into verifiable RLVR data for computer-use agents.

The largest open CUA RLVR dataset to date:
🎯 32,122 verifiable RLVR tasks with programmatic setup scripts + rewards
🌐 110 environments: 16 desktop apps + 94 synthesized mock web apps
🏆 Qwen3.5-based CUA models trained with GSPO reach 72.6% on OSWorld-Verified and 56.6% on WebArena

📄 Paper: https://t.co/cdvHJPzgb1
🏠 Homepage: https://t.co/kvhaOQxNx7
🤗 Dataset: https://t.co/w5vOIRdchR
💻 Codebase: https://t.co/CcRlNTlS1c
🧩 Environments: https://t.co/fNZ6YAI8LD
🧵[1/6]

ChengZhoujun retweeted

Benhao Huang

@huskydogewoof

14 days ago

🌀 Introducing Equilibrium Reasoners (EqR)! Feedforward models and weight-tied models behave very differently on hard reasoning generalization. EqR pushes this difference to the extreme by learning task-conditioned neural attractors.
• Sudoku-Extreme: 99.8%
• Maze: 93%
#ICML2026

ChengZhoujun retweeted

Zora Wang

@ZhiruoW

17 days ago

Excited to announce our tutorial: "Future of Work in the Age of LLMs" at #ACL2026 in San Diego, July 2! 🌴
There's a lot of speculation about AI and the future of human work. Our tutorial unpacks it from four angles:
→ The landscape of human work
→ How to build LLMs to augment real-world workflows
→ How to evaluate these LLMs
→ The future of work with LLMs/LLM-based agents

Zhoujun (Jorge) Cheng

@ChengZhoujun

17 days ago

Thanks RadixArk for sharing our work NanoRollout!!

RadixArk

@radixark

17 days ago

Slow, heavy environments have been the real bottleneck for agentic RL. NanoRollout tackles it head-on with a clean rollout-as-a-service design, integrated with miles for scalable agent RL. Great work from the team!

ChengZhoujun retweeted

Junli Wang

@JunliWang2021

20 days ago

Thrilled to see those promising numbers! 🤯

Same finding on our end with NanoRollout: cross-scaffold generalization basically doesn't happen out of the box -- something the field should be talking about more. https://t.co/calHsMScfG

ChengZhoujun retweeted

Huaizheng Zhang

@zhzHNN

21 days ago

Cool. I always enjoy playing with nano projects. No matter who asks me how to learn LLMs, my answer is always the same:
- Start with nanochat/nanogpt.
- Then pick one super niche direction.
- Deep dive into it. Build a nano project. Scale it gradually.
That's all.

ChengZhoujun retweeted

Dinghuai Zhang 张鼎怀

@zdhnarsil

20 days ago

Been working on the coding model behind this for a while. Still needs huge improvement, but let's see!

ChengZhoujun retweeted

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

22 days ago

The lack of lightweight, open agent infra has been a massive pain point. This is a great starting point, esp. for large-scale (thousands of parallel envs), battle-tested coding / computer-use agent training!

ChengZhoujun retweeted

Hao Zhang

@haozhangml

21 days ago

check our new work nanorollout!!

Zhoujun (Jorge) Cheng

@ChengZhoujun

21 days ago

@Ber18791531 @JunliWang2021 🤗🤗🤗

ChengZhoujun retweeted

Shibo Hao

@Ber18791531

21 days ago

Check out this cool project led by @JunliWang2021 @ChengZhoujun ! Solid new agent infra and impressive results on open source RL/SFT recipes.

ChengZhoujun retweeted

Zihan "Zenus" Wang

@wzenus

21 days ago

Very solid agentic infra work on accelerating agent rollout!

ChengZhoujun retweeted

Bowen Wang

@BowenWangNLP

21 days ago

Nice work! Training digital agents isn't trivial; co-designing rollouts with targeted environments is the pain point once you dig into agentic RL. This is super clean, and agentic RL folks should try it out.

ChengZhoujun retweeted

Guohao Li 🐫

@guohao_li

21 days ago

We need more agent rollouts. Glad to see SETA is used in NanoRollout!

Zhoujun (Jorge) Cheng

@ChengZhoujun

21 days ago

@tw_killian @BYU @BYUCS Congrats Taylor!

Zhoujun (Jorge) Cheng

@ChengZhoujun

22 days ago

Happy to release NanoRollout, our infra attempt to scale digital agent rollouts without pain.

Setting up and scaling parallel digital agent envs is one of the biggest headaches in agent training / deployment. The open community hasn't handled it well.

Two appealing features from NanoRollout:
🔌 Non-intrusive RL integration with frameworks such as miles, verl, tunix; validated end-to-end, e.g. outperforms DeepSWE-32B at a large 4k batch size 🚀
🧩 Unified support across agent harnesses and envs, covering SWE-Bench, Terminal-Bench, OSWorld, CocoaBench, with fast parallel eval that reproduces published scores (e.g., full SWE-Bench Verified eval from 102 min → 18 min, 5.7x faster ⚡)

And the core logic is just ~900 LOC. Hope NanoRollout helps if you're also trying to scale agent rollouts. Check out the tech blog and GitHub for more details!

Big thanks to the fantastic co-lead @JunliWang2021

Junli Wang

@JunliWang2021

22 days ago

Digital agent learning needs massive rollouts. But digital agent rollouts are painfully slow due to heavy environments. 🐌

🚀 We introduce NanoRollout, a lightweight open infra (900 lines of core code) for digital agent rollout at scale, validated with three workloads:
🏋️ Large batch size (4K) SWE Agent RL -> surpasses DeepSWE-32B
🧪 250k+ distilled coding trajectories -> SOTA ≤32B open coding agent
⚡ Fast evaluation on coding/cua/unified agent -> finish

Check our Blog: https://t.co/IBNqqbLqra

ChengZhoujun retweeted

Caiming Xiong

@CaimingXiong

23 days ago

Today, we’re excited to launch Recursive (@recursive_si): an exceptional team across London and San Francisco, building AI systems that can safely improve their own capabilities over time.

Zhoujun (Jorge) Cheng

@ChengZhoujun

23 days ago

A great piece on self-distillation using failures! Besides scaling up the number of rollouts, actively scaling (extracting) signals from raw rollouts should be an important way to improve agents and save compute.

Yuwei Zhang

@YuweiZh49446108

23 days ago

On-policy self-distillation is a promising direction for learning from rich textual feedback. But can it really learn from failed trajectories?

Our answer: not quite -- unless we let the model actively interpret them.

🧵1/N https://t.co/Vtq3RcDfxu
