Proud to core-build this with the MiMo team! Breaking 1,000 TPS on a 1T model with standard 8-GPU nodes is just the beginning of the Speed Scaling era. Technical deep dive coming on our channel! 🚀⚡️
🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀
We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME!
Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.
Read the full technical deep dive: https://t.co/MX0kjHKdKi
Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: https://t.co/aeWAxyhwVk
⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time — try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: [email protected]
The era of Speed Scaling has arrived. Read our full technical deep dive into the microsecond-scale execution reality & Co-design specs here: https://t.co/Q3t0oYw0gW
📦 GitHub: https://t.co/fq5PMPfAYG
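For a sense of what "microsecond-scale" means here, a quick back-of-the-envelope sketch in Python. It assumes one full forward pass per output token for a single stream (no speculative or multi-token decoding), and the layer count of 64 is a hypothetical placeholder, not a published MiMo spec.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Time available to produce one output token, in microseconds."""
    return 1_000_000 / tokens_per_second


def per_layer_budget_us(tokens_per_second: float, num_layers: int) -> float:
    """Split the per-token budget evenly across layers (a simplifying assumption)."""
    return per_token_budget_us(tokens_per_second) / num_layers


# 1,000 tokens/s leaves 1 ms (1,000 microseconds) per token.
print(per_token_budget_us(1000))

# With a hypothetical 64-layer model, each layer gets about 15.6 microseconds.
print(per_layer_budget_us(1000, 64))
```

At that scale, fixed per-step costs such as kernel launches and synchronization take up a visible share of each layer's budget, which is roughly why the execution model matters as much as any single kernel.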
How did we push a 1 Trillion parameter MoE model past the 1,000 TPS barrier on a standard 8-GPGPU node with @XiaomiMiMo? 🚀
It’s not just a faster kernel. It’s a total execution model revolution.
Key technical breakthroughs inside TileRT:
⚡️ Heterogeneous Workers & Warp Specialization: Breaking the serial pace to orchestrate specialized worker groups not just within a single SM, but scaling across the entire GPU execution domain.
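To illustrate the general idea of specialized workers, here is a CPU-side analogy in Python. It is not GPU code and not TileRT's implementation: one thread only "loads" tiles into a small staging buffer while another only "computes" on them, so the two stages overlap instead of running in strict sequence. The function name `run_pipeline` and the sum-of-squares workload are made up for illustration.

```python
import queue
import threading


def run_pipeline(tiles):
    """Run a two-stage pipeline: one loader worker feeds one compute worker."""
    staged = queue.Queue(maxsize=2)  # small bounded buffer between the stages
    results = []

    def loader():
        # "Load" stage: copy each tile into the staging buffer.
        for tile in tiles:
            staged.put(list(tile))
        staged.put(None)  # sentinel: no more tiles

    def computer():
        # "Compute" stage: consume tiles as soon as they are staged.
        while True:
            tile = staged.get()
            if tile is None:
                break
            results.append(sum(x * x for x in tile))

    workers = [threading.Thread(target=loader), threading.Thread(target=computer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(run_pipeline([[1, 2], [3, 4]]))
```

On a GPU, warp specialization applies the same producer/consumer split to groups of warps, typically with on-chip shared memory as the staging buffer.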
@zRdianjiao @Zai_org Huge milestone. Grateful to @Zai_org for the partnership. Flagship quality at 400 tok/s is just the start of what we can do together. 🔥