RL is painfully slow 😭, bottlenecked by super-long CoT rollout.
😭 Sparse attention should help, but naive sparse rollout hits a brutal efficiency-stability tradeoff:
A tedious trial-and-error sparsity sweep for each dense policy is required before an actual RL run.
🐤 Sparrow chirps: no more pain! Introducing Sparrow: Sparse Rollout for stable and efficient long-context RL.
Sparrow finds that:
💡 As long as we keep the tail distribution mismatch throughout the sparse rollout above a critical threshold, the RL training will be stable.
💡 Even cooler! In comprehensive controlled studies of RL on Qwen3-1.7B, 4B, and 8B thinking models with a 40K max rollout length, the critical threshold stays constant across model sizes.
💡 Sparrow then finds the optimal dynamic sparse schedule to reach the threshold with minimal cost.
💡 Sparrow's findings are empirically validated to generalize to Qwen3-14B, and hold on both Math and Coding RL.
🐤 Sparrow empirically achieves 2.2× / 2.4× / 2.0× rollout speedup on Qwen3 1.7B / 4B / 8B thinking models, while keeping training stable over extended RL steps.
We release the 🐤 bird in the following formats.
[1/n]
Paper: https://t.co/oyzNoifgDT
Code: https://t.co/sAg4GFGMgD
Blog: https://t.co/7BDQlxAIRO
Sparrow 🐤 also motivates a simple add-on: distillsparse.
Goal: reach the same stability threshold with more aggressive sparsity, and therefore higher rollout speedup.
In sparse-rollout dense-policy RL, we already compute both sparse generations and dense log-probs. This naturally enables on-policy distillation on sparse rollouts with little extra overhead.
The key challenge: improve sparse rollout without contaminating the dense policy. So distillsparse uses a dedicated LoRA branch to store the sparse-rollout gradient delta.
This LoRA distillation brings sparse rollout closer to the dense policy and reduces mismatch across multiple sparsity levels.
Result: Sparrow can use more aggressive sparsity for higher rollout speedup while keeping RL stable, with only minor LoRA overhead. More details in the paper.
[6/n]
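A minimal sketch of the dedicated-LoRA-branch idea, assuming a single linear layer in pure Python. The class name `LinearWithSparseLoRA`, the `use_lora` flag, and the rank and scale values are hypothetical and are not distillsparse's actual code; the sketch only shows that the dense path never touches the adapter.

```python
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


class LinearWithSparseLoRA:
    """Linear layer with a low-rank branch that is only active for sparse rollout."""

    def __init__(self, in_dim, out_dim, rank=2, scale=1.0, seed=0):
        rng = random.Random(seed)
        # Dense policy weights: shared by both paths, never modified by the adapter.
        self.weight = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        # Standard LoRA init: A random, B zero, so the branch starts as a no-op.
        self.lora_a = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(out_dim)]
        self.scale = scale

    def forward(self, x, use_lora):
        y = matvec(self.weight, x)
        if use_lora:
            # Sparse-rollout path: add the low-rank correction B(Ax).
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y
```

Under these assumptions, updates written to `lora_a` / `lora_b` change only `forward(x, use_lora=True)`; calling with `use_lora=False` always returns the untouched dense output.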
AstraFlow v0.1.1 is out!
New in this release:
• Dynamic recursive agents RL recipe
• Megatron training backend
Inspired by dynamic agent workflows like @claudeai, this release reproduces the recursive agent RL recipe, where models learn to automatically spawn sub-agents to solve subtasks. (The implementation is based on the awesome Recursive Agent Optimization paper by @apurvasgandhi, @gneubig, @aviral_kumar2: https://t.co/e5kmh9Gpeh)
With existing RaaS support via @sgl_project and training backends including FSDP and Megatron, AstraFlow is moving toward a more flexible stack for large-scale agentic RL.
⭐ Repo: https://t.co/0SLe2mIEsn
Dynamic agents recipe: https://t.co/ekQfyhKTsx
[6/6] The big picture
Sparse-attention research should be a loop an agent can run on its own. The hard part was never the idea; it was turning math into fast, production-ready kernels. Vortex removes that wall.
When an algorithm is as easy to try as it is to describe, humans + AI agents can co-discover the next generation of efficient attention.
Thanks to @chenzhuoming911, @XinruiZhongx, Qilong Feng, @RJ_Sadhukhan, @IronSteveZhou, @michaelqshieh, @JiaZhihao, @BeidiChen
Come build it with us
Introducing Vortex: sparse attention designed by AI agents, efficient at scale.
Same accuracy, way more throughput, across every model we tried
🔹 GLM-4.7-Flash (MLA): 4.7× faster
🔹 MiniMax-M2.7 (229B): 1.37× faster
🔹 Qwen3-1.7B (agent-discovered!): 3.46× faster
How? An agent writes a flow in a few lines of Python; Vortex compiles it into fused kernels in a real serving stack (SGLang) and benchmarks it end-to-end.
The design: a Python frontend (vFlow) over a page-centric tensor abstraction (vTensor) + a serving-integrated backend.
https://t.co/gZSPl7PXVp
💻 https://t.co/awlislOZWw
https://t.co/EBWbTObQbb
https://t.co/apTWhIGD1M
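To make "page-centric" concrete, here is a generic sketch of page-level top-k selection, one common sparse-attention pattern. It is not the vFlow or vTensor API, which these posts do not show; `select_pages`, `page_size`, and the mean-key scoring rule are assumptions chosen for illustration.

```python
def select_pages(query, keys, page_size, top_k):
    """Return sorted indices of the top_k KV pages whose mean key best matches the query."""
    dim = len(query)
    scores = []
    for start in range(0, len(keys), page_size):
        page = keys[start:start + page_size]
        # Summarize the page by the mean of its key vectors.
        mean_key = [sum(k[d] for k in page) / len(page) for d in range(dim)]
        # Score the page by the dot product between the query and that summary.
        score = sum(q * m for q, m in zip(query, mean_key))
        scores.append((score, start // page_size))
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(idx for _, idx in best)


# Three pages of two keys each; attention would then run only over the selected pages.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
selected = select_pages([1.0, 0.5], keys, page_size=2, top_k=2)
```

A real system would do this scoring inside fused GPU kernels; the pure-Python loop only illustrates the selection logic.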
[5/6] Does it scale? We went to 229B.
At this size (MiniMax-M2.7 across 4× B200, TP=4), even *running* a sparse-attention experiment is basically impossible without Vortex.
With it: up to 1.37× faster on AIME26, with accuracy even nudging *above* full attention. Sparse attention still pays off at the frontier of model size. 💪
In line with how @cursor_ai has done its RL stage, AstraFlow is a new RL engine that natively enables asynchronous, heterogeneous, and geo-distributed RL through a dataflow abstraction.
Like @FireworksAI_HQ's sparse RL transfer design, it syncs only ≤1.1% of model weights, making remote rollout lightweight and efficient.
Check it out!!!
After several months of work, AstraFlow is finally out!
Try your own single-agent or multi-agent workflows with reinforcement learning on AstraFlow:
GitHub: https://t.co/yasDOyEkFl
Built on dataflow-oriented abstractions, AstraFlow cleanly separates rollout, dataflow, and trainer logic, making it easy to bring your own rollout service, training backend, or RL data algorithm.
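A rough sketch of what separating rollout, dataflow, and trainer logic can look like, assuming three plain callables joined by a buffer. The names `run_pipeline`, `rollout`, `data_algorithm`, and `train_step` are hypothetical and not AstraFlow's interfaces, and the loop is synchronous for brevity whereas the real system is described as fully async.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Sample = Dict[str, float]


def run_pipeline(
    rollout: Callable[[int], Sample],
    data_algorithm: Callable[[List[Sample]], List[Sample]],
    train_step: Callable[[List[Sample]], float],
    num_samples: int,
    batch_size: int,
) -> List[float]:
    """Wire three swappable components together through a simple buffer."""
    buffer: Deque[Sample] = deque()
    metrics: List[float] = []
    for i in range(num_samples):
        buffer.append(rollout(i))  # rollout side: produce one sample
        if len(buffer) >= batch_size:
            batch = [buffer.popleft() for _ in range(batch_size)]
            # dataflow side: filter or reweight the batch, then hand it to the trainer
            metrics.append(train_step(data_algorithm(batch)))
    return metrics
```

Because each stage is just a callable, swapping the rollout service, the data algorithm, or the training backend means passing a different function, without touching the other two.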
AstraFlow natively supports:
1. Fully async multi-policy collaborative RL
2. Elastic, heterogeneous, cross-region rollouts
3. Substitutable rollout and trainer services
4. Composable data algorithms
Open-sourcing AstraFlow is just the beginning. We'll keep expanding the ecosystem with more agent workflows, rollout backends, trainer integrations, and RL data algorithms.
We're excited to release AstraFlow, an open-source, dataflow-oriented RL system for training multi-agentic and multi-policy LLMs.
Built for scalable, flexible, and efficient agent RL, AstraFlow natively enables:
Faster multi-policy agents collaborative RL training
Achieves comparable or better accuracy than a verl-based baseline.
Zero-code system flexibility
Supports elastic multi-policy training and cross-region rollout across heterogeneous GPUs.
≤1.1% sparse transfer for remote rollout
Similar to @FireworksAI_HQ's sparse RL transfer design, AstraFlow cuts sync from ~28 GB to ~1.5 GB, with deltas ≤1.1% of weights, making remote rollout lightweight and efficient: https://t.co/YW4XWmA1Zz
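A toy sketch of the sparse-transfer idea, assuming weights are a flat list and a delta is a list of (index, new value) pairs. `encode_delta` and `apply_delta` are hypothetical helpers, not AstraFlow's or Fireworks' code; real systems would operate on tensors and a compact wire format.

```python
def encode_delta(old, new, tol=0.0):
    """Return (index, new_value) pairs for weights that changed by more than tol."""
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if abs(n - o) > tol]


def apply_delta(weights, delta):
    """Apply a sparse delta to a copy of the weights held by the remote rollout worker."""
    updated = list(weights)
    for i, value in delta:
        updated[i] = value
    return updated


# Only two of 1000 weights changed, so only two pairs need to be shipped.
old = [0.0] * 1000
new = list(old)
new[3] = 0.5
new[700] = -1.0
delta = encode_delta(old, new)
synced = apply_delta(old, delta)
```

Each shipped entry carries an index as well as a value, so the bytes transferred per changed weight are higher than for a dense copy even though far fewer weights are sent.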
Substitutable rollout and trainer services
Provides modular rollout and training components for flexible deployment.
🧵 (1/5)