Yi Dong @doyend - Twitter Profile

doyend retweeted

@billxbf

8 days ago

Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code changes.

Find a problem, design the harness, and train your own agents! 🧵

doyend retweeted

Hao Zhang

@HaoZhang3438830

2 months ago

Excited to introduce ProRL Agent: Rollout-as-a-Service for RL training of multi-turn LLM agents! 🚀

As we move toward complex agentic tasks, rollout infrastructure is often a bottleneck. We’re decoupling I/O-heavy rollouts from GPU training via a unified HTTP API.

Why ProRL Agent?
Decoupled & Scalable: Treats rollout as a service, allowing near-linear throughput scaling.

System-Level Optimization: Includes load balancing and automated sandbox cleanup for high stability.

Integrated: Now part of NVIDIA NeMo Gym to help researchers scale RL pipelines faster.

The Results 📈
On SWE-bench-Verified, we saw significant gains:

+8.4 on Qwen3-8B

+8.2 on Qwen3-14B

Proven success across STEM, Math, and General Coding agents.

Check out the research and open-source code:
📄 Paper: https://t.co/l4wR6SbJ7m
💻 Repo: https://t.co/5otcyzkDKe

Huge thanks to the team and NVIDIA for the support! 👏

doyend retweeted

Ximing Lu

@GXiming

3 months ago

We’re open-sourcing the data and model behind Golden Goose 🦢✨. Check them out and see how we turn unverifiable internet text 🌐 into large-scale RLVR tasks 😎. 📊 GooseReason-0.7M: https://t.co/xBu9KC5Q9F 🤖 GooseReason-4B-Instruct: https://t.co/iT2ViXGbqM

doyend retweeted

Ximing Lu

@GXiming

4 months ago

There’s growing excitement around scaling up RLVR to get continuous gains with more compute. But in practice, improvements saturate on finite training data. 😱 Introducing Golden Goose 🦢✨, a simple trick to synthesize unlimited RLVR tasks 😎 from unverifiable internet text. 🌐

doyend retweeted

Zhilin Wang @wangzhilin123

7 months ago

You asked and we listened: the @nvidia ProfBench leaderboard 🏆 is here on @huggingface (https://t.co/W9PE6rbzfq). One design choice for the leaderboard is that we distinguish open-weight vs. closed-source models and reasoning vs. instruct models. Separately, we also show the cost of running the entire benchmark (thanks to @openrouter for putting prices in one place), because real-world users absolutely care about prices. Putting this together with @viviennezhangx, we were surprised to find that open-weight models can sometimes perform at a similar level to closed-source models but at cents on the dollar. 🤑 Thanks @ClementDelangue @imohitmayank for the amazing suggestion! What models do you want to see on there next? Comment below and I’ll run it (nothing crazy though). #ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC

doyend retweeted

Zhilin Wang @wangzhilin123

7 months ago

We built ProfBench to raise the bar for LLMs - literally. At @NVIDIA, we worked with domain experts to create a benchmark that goes far beyond trivia and short answers. ProfBench tests LLMs on complex, multi-step tasks that demand the kind of reasoning, synthesis, and clarity you'd expect from a PhD physicist or MBA consultant. 🌎 This isn’t just a dataset drop. It’s a global collaboration: 38 professionals across 8 countries contributed over 7,000 expert-written rubrics across finance MBA 💵, consulting MBA 📊, chemistry PhD 🧪 and physics PhD 🚀. 🧗 Every prompt and grading rubric was handcrafted, requiring tens of hours of dedicated and focused work. Now fully supported in the NeMo Evaluator SDK, ProfBench enables reproducible, rubric-based evaluations and side-by-side model comparisons. 🔗 ProfBench on @HuggingFace https://t.co/wmOyvLY6e7 🔗 NeMo Evaluator SDK https://t.co/JgFJklQqPr I’m so proud of the team that made this happen. Let’s keep pushing what AI can do. Work done with @jaehunjung_com @GXiming @shizhediao Ellie Evans @jiaqizengggggg @PavloMolchanov @YejinChoinka @jankautz @doyend #ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC

doyend retweeted

Shizhe Diao @shizhediao

8 months ago

🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration When step-scaling hits a plateau, scale rollouts, not steps. BroRL takes reinforcement learning beyond saturation—reviving stalled models by expanding exploration with large-N rollouts. 👇 (1/n)

doyend retweeted

Shizhe Diao @shizhediao

about 1 year ago

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering new insights into the debate.

doyend retweeted

Oliver Stanley

@_OliverStanley

about 1 year ago

Introducing Reasoning Gym: Over 100 procedurally generated reasoning environments for evaluation and RLVR of language models. Generate virtually infinite training or evaluation data with fine-grained difficulty control and automatic verifiers. 🧵 1/

Yi Dong @doyend

about 3 years ago

https://t.co/O2KUobOcD9

doyend retweeted

Jousef Murad @Jousefm2

over 3 years ago

⚡2D to Simulate 3D: Made that legendary Rubik's Cube even easier to Understand ⚡

Yi Dong @doyend

almost 4 years ago

I really like the muTransfer paper (https://t.co/4Wfxw4Duvd). To help me understand the paper better, I wrote a blog post deriving some of the equations missing from the paper (https://t.co/lwsaMOCmkI). Thank you @TheGregYang for the wonderful theoretical work!

Yi Dong @doyend

over 5 years ago

Good explanation of the ReBeL paper.

Noam Brown

@polynoamial

over 5 years ago

I just watched this video and was super impressed by how well @ykilcher communicated the essence of our paper. If you want to understand why AlphaZero can't play poker and why ReBeL can, this is a great video to watch!

doyend retweeted

RAPIDS AI @RAPIDSai

over 6 years ago

Learn how to achieve a 100x speedup using @numba_jit and @rapidsai for efficient and fast fractional differencing computation on #GPUs. https://t.co/YKYjp4RbED

Yi Dong @doyend

almost 7 years ago

Happy to take any questions about this work.

RAPIDS AI @RAPIDSai

almost 7 years ago

Learn how you can achieve up to 20x speedup in your Quant workflow by leveraging #gQuant, a set of finance examples built on RAPIDS. https://t.co/6iIjaKlqhd

doyend retweeted

NVIDIA AI Developer

@NVIDIAAIDev

almost 7 years ago

To help researchers and data scientists in #finance accelerate their workflows with @rapidsai, we've published a new technical post highlighting a few #gQuant finance examples demonstrating the value of GPU accelerated #datascience: https://t.co/InYES8WmJ6