Minmin Sun @minminsun2019 - Twitter Profile

MinminSun2019 retweeted

28 days ago

The speed-of-light optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs. In the PyTorch Foundation's latest community blog post, you can learn all about the complete design, implementation, and optimization of Qwen3.5 models in the TokenSpeed inference framework and see for yourself how this work is improving performance 👉 https://t.co/Qr1PTIhqok This achievement was a joint effort between the @Alibaba_Qwen inference team, @lightseekorg Foundation TokenSpeed team, @NVIDIAAI , and the Mooncake team, with special contributions from @tri_dao for FlashAttention-4 (FA4) optimization. @KVCache_AI

PyTorch's tweet photo. The speed-of-light optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs.

In the PyTorch Foundation's latest community blog post, you can learn all about the complete design, implementation, and optimization of Qwen3.5 models in the TokenSpeed inference framework and see for yourself how this work is improving performance 👉 https://t.co/Qr1PTIhqok

This achievement was a joint effort between the @Alibaba_Qwen inference team, @lightseekorg Foundation TokenSpeed team, @NVIDIAAI , and the Mooncake team, with special contributions from @tri_dao for FlashAttention-4 (FA4) optimization. @KVCache_AI

12

289

49

158

280K

Minmin Sun @MinminSun2019

28 days ago

Big congrats to the TokenSpeed team & Qwen Inference team! 🙌 This is just chapter one. We’ll keep co-engineering to unlock speed-of-light inference for every Qwen model.

Qwen

@Alibaba_Qwen

28 days ago

Fast, faster, Qwen. 🚀 Thrilled to see Qwen3.5 reaching a record-breaking 580 tps for agentic workloads on the TokenSpeed engine! This milestone wouldn't be possible without our incredible partners. Huge thanks to @lightseekorg, @NVIDIAAI, the Mooncake team, and @tri_dao for the pioneering FA4 optimization. Together, we are pushing the boundaries of open-source LLM inference. 🤝✨ Dive into the full @PyTorch blog post below! 👇 https://t.co/p04wookcZj #Qwen #Qwen3_5 #TokenSpeed #LLM #Inference #AI #PyTorch #OpenSource #AgenticAI #HighPerformance

40

1K

91

297

595K

0

10

2

1

2K

MinminSun2019 retweeted

Qwen

@Alibaba_Qwen

28 days ago

Fast, faster, Qwen. 🚀 Thrilled to see Qwen3.5 reaching a record-breaking 580 tps for agentic workloads on the TokenSpeed engine! This milestone wouldn't be possible without our incredible partners. Huge thanks to @lightseekorg, @NVIDIAAI, the Mooncake team, and @tri_dao for the pioneering FA4 optimization. Together, we are pushing the boundaries of open-source LLM inference. 🤝✨ Dive into the full @PyTorch blog post below! 👇 https://t.co/p04wookcZj #Qwen #Qwen3_5 #TokenSpeed #LLM #Inference #AI #PyTorch #OpenSource #AgenticAI #HighPerformance

40

1K

91

297

595K

Minmin Sun

@MinminSun2019

Last Seen Users on Sotwe

Trends for you

Most Popular Users