Infini-AI-Lab

@InfiniAILab

Pittsburgh, PA

Joined September 2024

40 Following

2K Followers

136 Posts

Infini-AI-Lab

@InfiniAILab

5 days ago

We thank @IronSteveZhou, @RJ_Sadhukhan, Zhaofeng Sun, @chenzhuoming911, Souvik Kundu, Saket Dingliwal, Sai Muralidhar Jayanthi, Aram Galstyan, @haizhong_zheng, and @BeidiChen for their contributions to the work.

325

Infini-AI-Lab

@InfiniAILab

5 days ago

RL is painfully slow 😭, bottlenecked by super-long CoT rollouts.

🔭 Sparse attention should help, but naive sparse rollout hits a brutal efficiency–stability tradeoff: each dense policy needs a tedious trial-and-error sparsity sweep before the actual RL run.

🐤 Sparrow chirps: no more pain! Introducing Sparrow: Sparse Rollout for stable and efficient long-context RL.

Sparrow finds that:
💡 As long as we keep the tail distribution mismatch throughout the sparse rollout above a critical threshold, RL training will be stable.
💡 Even cooler! In comprehensive controlled studies of RL on Qwen3-1.7B, 4B, and 8B thinking models with a 40K max rollout length, the critical threshold stays constant across model sizes.
💡 Sparrow then finds the optimal dynamic sparse schedule to reach the threshold at minimal cost.
💡 Sparrow's findings are empirically validated to generalize to Qwen3-14B, and hold for both Math and Coding RL.

🐤 Sparrow achieves 2.2× / 2.4× / 2.0× rollout speedup on Qwen3 1.7B / 4B / 8B thinking models, while keeping training stable over extended RL steps.

We release the 🐤 bird in the following formats. [1/n]
Paper: https://t.co/oyzNoifgDT
Code: https://t.co/sAg4GFGMgD
Blog: https://t.co/7BDQlxAIRO

204

220

76K

Infini-AI-Lab

@InfiniAILab

5 days ago

Sparrow 🐤 also motivates a simple add-on: distillsparse.

Goal: reach the same stability threshold with more aggressive sparsity → higher rollout speedup.

🥝 In sparse-rollout dense-policy RL, we already compute both sparse generations and dense log-probs. This naturally enables on-policy distillation on sparse rollouts with little extra overhead.
🍉 The key challenge: improve sparse rollout without contaminating the dense policy. So distillsparse uses a dedicated LoRA branch to store the sparse-rollout gradient delta.
🍐 This LoRA distillation brings sparse rollout closer to the dense policy and reduces mismatch across multiple sparsity levels.
🍒 Result: Sparrow can use more aggressive sparsity for higher rollout speedup while keeping RL stable, with only minor LoRA overhead. More details in the paper.
[6/n]
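To make the LoRA-branch idea concrete, here is a toy NumPy sketch, not the distillsparse implementation. Everything in it is an assumption made for illustration: a single linear layer stands in for the policy, zeroing half of the input features stands in for sparse attention, and the names `W`, `A`, `B`, and `mean_kl` are hypothetical. It only shows the separation of concerns the post describes: the dense log-probs act as a fixed teacher, and gradients from the sparse path update the LoRA pair alone, so the dense weights stay untouched.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kl(logp_dense, logp_sparse):
    """Mean KL(dense || sparse) over the batch."""
    return float((np.exp(logp_dense) * (logp_dense - logp_sparse)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
n, d_latent, d_in, d_out, rank = 64, 4, 16, 8, 4

# Toy inputs with correlated features, so the kept half carries
# information about the dropped half.
x = rng.standard_normal((n, d_latent)) @ rng.standard_normal((d_latent, d_in))
x_sparse = x.copy()
x_sparse[:, d_in // 2:] = 0.0  # assumption: stand-in for context dropped by sparse attention

W = rng.standard_normal((d_in, d_out)) * 0.1  # dense policy weights, never updated below
A = rng.standard_normal((d_in, rank)) * 0.1   # LoRA down-projection (sparse path only)
B = np.zeros((rank, d_out))                   # LoRA up-projection, zero init: no-op at start
W_before = W.copy()

logp_dense = log_softmax(x @ W)               # dense log-probs, treated as a fixed teacher
kl_before = mean_kl(logp_dense, log_softmax(x_sparse @ W + x_sparse @ A @ B))

lr = 0.02
for _ in range(300):
    h = x_sparse @ A
    logp_sparse = log_softmax(x_sparse @ W + h @ B)
    # Gradient of the mean KL with respect to the sparse-path logits.
    g = (np.exp(logp_sparse) - np.exp(logp_dense)) / n
    grad_A = x_sparse.T @ (g @ B.T)
    grad_B = h.T @ g
    A -= lr * grad_A                          # only the LoRA branch moves
    B -= lr * grad_B

kl_after = mean_kl(logp_dense, log_softmax(x_sparse @ W + x_sparse @ A @ B))
print(f"KL(dense || sparse): {kl_before:.4f} -> {kl_after:.4f}")
```

In this toy setting the dense path is always `x @ W`, so whatever the branch learns cannot leak into the dense policy; the real system's losses, model, and sparsity mechanism are described in the paper.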

359

Infini-AI-Lab

@InfiniAILab

10 days ago

🚀 AstraFlow v0.1.1 is out! New in this release:
• Dynamic recursive agents RL recipe
• Megatron training backend

Inspired by dynamic agent workflows like @claudeai, this release reproduces the recursive agent RL recipe, where models learn to automatically spawn sub-agents to solve subtasks. (The implementation is based on the awesome Recursive Agent Optimization paper by @apurvasgandhi, @gneubig, @aviral_kumar2: https://t.co/e5kmh9Gpeh)

With existing RaaS support via @sgl_project and training backends including FSDP and Megatron, AstraFlow is moving toward a more flexible stack for large-scale agentic RL.

⭐ Repo: https://t.co/0SLe2mIEsn
📖 Dynamic agents recipe: https://t.co/ekQfyhKTsx

10K

Infini-AI-Lab

@InfiniAILab

10 days ago

[6/6] The big picture 👇

Sparse-attention research should be a loop an agent can run on its own. The hard part was never the idea; it was turning math into fast, production-ready kernels. Vortex removes that wall.

When trying an algorithm is as easy as describing it, humans + AI agents can co-discover the next generation of efficient attention. 🌀

Thanks to @chenzhuoming911, @XinruiZhongx, Qilong Feng, @RJ_Sadhukhan, @IronSteveZhou, @michaelqshieh, @JiaZhihao, @BeidiChen
Come build it with us 👇

386

Infini-AI-Lab

@InfiniAILab

10 days ago

🌀 Introducing Vortex: sparse attention designed by AI agents, efficient at scale.

📈 Same accuracy, way more throughput, across every model we tried 👇
🔹 GLM-4.7-Flash (MLA) → 4.7× faster
🔹 MiniMax-M2.7 (229B) → 1.37× faster
🔹 Qwen3-1.7B (agent-discovered!) → 3.46× faster

🤖 How? An agent writes a flow in a few lines of Python; Vortex compiles it into fused kernels in a real serving stack (SGLang) and benchmarks it end-to-end.

🏗️ The design: a Python frontend (vFlow) over a page-centric tensor abstraction (vTensor) + a serving-integrated backend.

📄 https://t.co/gZSPl7PXVp
💻 https://t.co/awlislOZWw
🌐 https://t.co/EBWbTObQbb
📚 https://t.co/apTWhIGD1M
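For readers unfamiliar with what a page-centric sparse-attention algorithm looks like, here is a minimal NumPy sketch of one common pattern: score each KV page by its mean key, keep the top-scoring pages, and attend only over their tokens. This is not the vFlow or vTensor API, and it is not a kernel; the function names `full_attention` and `paged_topk_attention` and the mean-key scoring rule are assumptions chosen for illustration, for a single query vector.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def full_attention(q, K, V):
    """Single-query scaled dot-product attention over all tokens."""
    return softmax(K @ q / np.sqrt(q.size)) @ V

def paged_topk_attention(q, K, V, page_size, top_pages):
    """Attend only over the tokens of the `top_pages` highest-scoring KV pages."""
    n_tokens = K.shape[0]
    assert n_tokens % page_size == 0, "sketch assumes whole pages"
    n_pages = n_tokens // page_size
    assert 1 <= top_pages <= n_pages
    # One summary key per page (assumption: mean of the page's keys).
    page_keys = K.reshape(n_pages, page_size, -1).mean(axis=1)
    page_scores = page_keys @ q
    keep = np.sort(np.argsort(page_scores)[-top_pages:])
    # Expand the selected page ids into token indices.
    token_idx = (keep[:, None] * page_size + np.arange(page_size)).ravel()
    return full_attention(q, K[token_idx], V[token_idx])

rng = np.random.default_rng(0)
d, n_tokens, page_size = 32, 64, 8
q = rng.standard_normal(d)
K = rng.standard_normal((n_tokens, d))
V = rng.standard_normal((n_tokens, d))

full_out = full_attention(q, K, V)
sparse_out = paged_topk_attention(q, K, V, page_size, top_pages=2)
all_pages_out = paged_topk_attention(q, K, V, page_size, top_pages=n_tokens // page_size)
```

Selecting every page recovers full attention, which is a handy sanity check when experimenting with the page budget.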

61K

Infini-AI-Lab

@InfiniAILab

10 days ago

[5/6] Does it scale? We went to 229B. 🏔️ At this size — MiniMax-M2.7 across 4× B200 (TP=4) — even *running* a sparse-attention experiment is basically impossible without Vortex. With it: up to 1.37× faster on AIME26, accuracy even nudging *above* full attention. Sparse attention still pays off at the frontier of model size. 💪

368

InfiniAILab retweeted

Beidi Chen

@BeidiChen

26 days ago

Aligned with how @cursor_ai has done its RL stage: AstraFlow is a new RL engine that enables asynchronous, heterogeneous, and geo-distributed RL in a native way through dataflow abstraction~ Like @FireworksAI_HQ’s sparse RL transfer design, it syncs only ≤1.1% of model weights, making remote rollout lightweight and efficient. Check it out!!!

210

146

34K

InfiniAILab retweeted

Haizhong Zheng

@haizhong_zheng

27 days ago

After several months of work, 𝐀𝐬𝐭𝐫𝐚𝐅𝐥𝐨𝐰 𝐢𝐬 𝐟𝐢𝐧𝐚𝐥𝐥𝐲 𝐨𝐮𝐭! Try your own single-agent or multi-agent workflows with reinforcement learning on AstraFlow:
GitHub: https://t.co/yasDOyEkFl

Built on dataflow-oriented abstractions, AstraFlow cleanly separates rollout, dataflow, and trainer logic, making it easy to bring your own rollout service, training backend, or RL data algorithm.

AstraFlow natively supports:
1. ⚡ Fully async multi-policy collaborative RL
2. 🌍 Elastic, heterogeneous, cross-region rollouts
3. 🔄 Substitutable rollout and trainer services
4. 🧩 Composable data algorithms

Open-sourcing AstraFlow is just the beginning. We’ll keep expanding the ecosystem with more agent workflows, rollout backends, trainer integrations, and RL data algorithms.

Infini-AI-Lab

@InfiniAILab

27 days ago

Joint work with @haizhong_zheng, Yizhuo Di, Jiahui Wang, @shuoweijin, @Xenshinu429, @libertyeagle8, @MorleyMao1, @istoica05, @jiawzhao, @BeidiChen

567

Infini-AI-Lab

@InfiniAILab

27 days ago

We’re excited to release 𝐀𝐬𝐭𝐫𝐚𝐅𝐥𝐨𝐰, an open-source, dataflow-oriented RL system for training multi-agentic and multi-policy LLMs. 🚀

Built for scalable, flexible, and efficient agent RL, AstraFlow natively enables:

⚡ 𝟐.𝟕× 𝐟𝐚𝐬𝐭𝐞𝐫 𝐦𝐮𝐥𝐭𝐢-𝐩𝐨𝐥𝐢𝐜𝐲 𝐚𝐠𝐞𝐧𝐭𝐬 𝐜𝐨𝐥𝐥𝐚𝐛𝐨𝐫𝐚𝐭𝐢𝐯𝐞 𝐑𝐋 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠
Achieves comparable or better accuracy than a verl-based baseline.

🌍 𝐙𝐞𝐫𝐨-𝐜𝐨𝐝𝐞 𝐬𝐲𝐬𝐭𝐞𝐦 𝐟𝐥𝐞𝐱𝐢𝐛𝐢𝐥𝐢𝐭𝐲
Supports elastic multi-policy training and cross-region rollout across heterogeneous GPUs.

📦 ≤𝟏.𝟏% 𝐬𝐩𝐚𝐫𝐬𝐞 𝐭𝐫𝐚𝐧𝐬𝐟𝐞𝐫 𝐟𝐨𝐫 𝐫𝐞𝐦𝐨𝐭𝐞 𝐫𝐨𝐥𝐥𝐨𝐮𝐭
Similar to @FireworksAI_HQ’s sparse RL transfer design, AstraFlow cuts sync from ~28 GB to ~1.5 GB, with deltas ≤1.1% of weights, making remote rollout lightweight and efficient: https://t.co/YW4XWmA1Zz

🔁 𝐒𝐮𝐛𝐬𝐭𝐢𝐭𝐮𝐭𝐚𝐛𝐥𝐞 𝐫𝐨𝐥𝐥𝐨𝐮𝐭 𝐚𝐧𝐝 𝐭𝐫𝐚𝐢𝐧𝐞𝐫 𝐬𝐞𝐫𝐯𝐢𝐜𝐞𝐬
Provides modular rollout and training components for flexible deployment.

🧵(1/5)
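The sparse-transfer idea can be illustrated with a short NumPy sketch: the trainer ships only the entries that changed since the last sync, as flat indices plus new values, and the rollout side patches its local copy. This is not AstraFlow's wire format or code; the helper names `make_sparse_delta` and `apply_sparse_delta` are hypothetical, and the 1% change rate in the demo is synthetic, chosen only to sit in the same range as the ≤1.1% figure quoted above.

```python
import numpy as np

def make_sparse_delta(old, new, tol=0.0):
    """Trainer side: flat indices and new values of entries that changed by more than tol."""
    diff = np.abs(new.ravel() - old.ravel())
    idx = np.flatnonzero(diff > tol)
    return idx, new.ravel()[idx]

def apply_sparse_delta(weights, idx, values):
    """Rollout side: return a patched copy of the local weights."""
    patched = weights.copy()
    np.put(patched, idx, values)  # np.put writes at flat indices, in place
    return patched

rng = np.random.default_rng(0)
old = rng.standard_normal((1000, 100)).astype(np.float32)

# Synthetic update: perturb 1% of the entries (assumption for the demo).
new = old.copy()
changed = rng.choice(old.size, size=1000, replace=False)
new.ravel()[changed] += 0.5

idx, vals = make_sparse_delta(old, new)
remote = apply_sparse_delta(old, idx, vals)
fraction = idx.size / old.size
print(f"synced {idx.size} of {old.size} entries ({fraction:.1%})")
```

A real system also has to pay for the indices themselves and handle sharded or quantized tensors, so the byte savings are smaller than the raw fraction of changed entries suggests.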

41K

Infini-AI-Lab

@InfiniAILab

27 days ago

Explore AstraFlow today: Paper: https://t.co/YjLm3HD14W Blog: https://t.co/y4HQrXMQFl Code: https://t.co/UKXtcxFN5l 🧵(5/5)

706
