Zirui Wu @WilliamZR7 - Twitter Profile

Pinned Tweet

about 5 hours ago

Introducing DreamReasoner-8B🚀 — an open-source block diffusion model for math and code reasoning. It reaches reasoning performance comparable to Qwen3-8B-Thinking while enabling parallel block-wise generation.

Zirui Wu @WilliamZR7

about 1 hour ago

@rchmielarz Thanks for the question. By "on par," we mean accuracy on reasoning benchmarks. Please check our technical report for further analysis of inference speed: https://t.co/eyn5N645Ar

Zirui Wu @WilliamZR7

about 1 hour ago

@Shubham09632806 Thanks for sharing. The idea of using on-policy distillation is amazing.

Zirui Wu @WilliamZR7

about 5 hours ago

Huge thanks to the team: @linzhengisme, @JiachengYe15, @sansa19739319, @xlzhao_hku, Yangsong Feng, Wei Bi and @ikekong

Zirui Wu @WilliamZR7

about 5 hours ago

We’re releasing DreamReasoner-8B to support research on diffusion-based reasoning models.
Checkpoint: https://t.co/c8Xa8tm2UT
SGLang Inference: https://t.co/8Nt9kfmsE5
Code: https://t.co/VRecksesWv

WilliamZR7 retweeted

chang ma

@ma_chang_nlp

about 16 hours ago

Excited to introduce 🌠Orion: Towards Lab Automation with Computer-Using Agents.

Give it control of your lab computer💻, and it can use software, analyze any experiment images, browse databases on Chrome exactly like you, and work for hours to analyze your experiments.

🌎: https://t.co/5EAe8vEetl
📎: https://t.co/D08hYrkJuG

WilliamZR7 retweeted

Xiaomi MiMo

@XiaomiMiMo

10 days ago

🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀

We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed barrier on a 1-trillion-parameter model for the FIRST TIME!

Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.

Read the full technical deep dive: https://t.co/MX0kjHKdKi

Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: https://t.co/aeWAxyhwVk
⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time. Try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com

WilliamZR7 retweeted

Yonggan Fu

@YongganFu

29 days ago

🚀 Check out our Nemotron-Labs-Diffusion model! We have been wondering about the true promise of diffusion LMs, especially when competing with strong AR models in terms of accuracy and with MTP methods in terms of efficiency.

💡 Nemotron-Labs-Diffusion is an important step toward answering this question, delivering a tri-mode LM that unifies AR, diffusion, and self-speculation (diffusion drafts, AR verifies) decoding within a single model.

🌟 Core Insights
🔸 AR and diffusion objectives can be mutually beneficial and harmonized within a single model.
🔸 Self-speculation, enabled by joint AR/diffusion training, can outperform Eagle3 in throughput.
🔸 Diffusion shows strong long-term potential for parallel decoding under an optimal sampler.

📊 Model performance: Our 8B/14B models match or exceed Qwen3-8B/14B accuracy while generating 6× more tokens per forward pass, leading to a 4× speedup in SGLang on GB200.

🤗 HF Collection (3B/8B/14B base + instruct and 8B VLM): https://t.co/UJ8qLM7OBB
📰 Tech Report: https://t.co/kXEXU20VAk

WilliamZR7 retweeted

Zyphra

@ZyphraAI

about 1 month ago

We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD.

Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference.

We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵 https://t.co/xMXp4sFYkb

WilliamZR7 retweeted

Jiayi Weng

@Trinkle23897

about 1 month ago

Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. https://t.co/1ZaIneleuW

WilliamZR7 retweeted

Lei Li

@_TobiasLee

about 1 month ago

🦞 Claw-Eval-Live is out, a live extension of the Claw-Eval Family!

This live release includes:
105 tasks | 17 workflow families | 13 frontier models tested | quarterly refresh from real ClawHub marketplace signals.

Instead of relying on a static task set, Claw-Eval-Live keeps agent evaluation aligned with evolving real-world enterprise workflows.

Check it out:
🤗 HF Paper: https://t.co/jKLlpTyLEL
Leaderboard: https://t.co/lWVGhak47l
Code: https://t.co/n70zwnTLsn

WilliamZR7 retweeted

David Samuel

@davidsamuelcz

about 2 months ago

Happy to present our #ICLR2026 paper: Dual-objective LMs! How can we make autoregressive LLMs more robust to overfitting and masked-diffusion models more sample-efficient? Simply by training on both objectives at the same time!

WilliamZR7 retweeted

Kimi.ai @Kimi_Moonshot

about 2 months ago

Meet Kimi K2.6 Agent Swarm 👋

Highlights:
🔹 Swarms, elevated: 300 parallel sub-agents × 4,000 steps per run (up from 100 / 1,500 in K2.5).
🔹 Outputs are real files, not chat: one run delivers 100+ files, 100,000-word literature reviews, or 20,000-row datasets.
🔹 Heterogeneous skills: search, analysis, coding, long-form writing, and visual generation, all running in parallel.

🔗 Try it at: https://t.co/2Tu8McUaUa

WilliamZR7 retweeted

Liran Ringel

@liranringel

2 months ago

Introducing DDTree, which accelerates speculative decoding by drafting a tree with one block diffusion pass, then verifying multiple likely continuations together.
Paper: https://t.co/cgYBw70O5i
Project page: https://t.co/ygFukxrZLB
Code: https://t.co/2z7U00NsuH

WilliamZR7 retweeted

Lei Li

@_TobiasLee

2 months ago

Claw-Eval v1.1 is out, with multimodal tasks and multi-turn dialogue.

Now we have:
300 human-verified tasks | 2,159 rubrics | 9 categories | 14 models from 7 families tested.

Agents are graded on Completion, Safety, and Robustness through full-trajectory auditing.

Shoutout to Qwen @Alibaba_Qwen, GLM @Zai_org, and MiniMax @MiniMax_AI for integrating Claw-Eval into their model evaluations!

Paper: https://t.co/wtSqPyep50
Leaderboard: https://t.co/FilOv3qC7P
Code: https://t.co/7DHsMkP0PQ
🤗Data: https://t.co/yV0pAvhH3r

🧵 Here are our findings:

WilliamZR7 retweeted

Haotian Ye

@haotian_yeee

3 months ago

Finally getting to share one of my favorite projects. ICLR Oral! 🏆 It’s so strange how rigid video tokenization is. Think about it: why should a still landscape cost the same number of tokens as a busy street? We built InfoTok. We went back to basics with Shannon’s information theory to make tokens "adaptive" in a principled way. Its 2.3x better compression and 11x faster inference demonstrate the magic of old-school theory ✨ Check it out: https://t.co/0PeYtaVY1y
