TileRT @TileRT_AI - Twitter Profile

Pinned Tweet

about 8 hours ago

Proud to core-build this with the MiMo team! Breaking 1,000 TPS on a 1T model with standard 8-GPU nodes is just the beginning of the Speed Scaling era. Technical deep dive coming on our channel! 🚀⚡️

Xiaomi MiMo

@XiaomiMiMo

about 8 hours ago

🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀

We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME!

Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.

Read the full technical deep dive: https://t.co/MX0kjHKdKi

Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: https://t.co/aeWAxyhwVk
⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time. Try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com

TileRT @TileRT_AI

about 7 hours ago

@TaXue2025 @XiaomiMiMo The GLM high-speed version is also supported by us. Thanks for following ♥️

TileRT @TileRT_AI

about 8 hours ago

The era of Speed Scaling has arrived. Read our full technical deep dive into the microsecond-scale execution reality & Co-design specs here: https://t.co/Q3t0oYw0gW 📦 GitHub: https://t.co/fq5PMPfAYG
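For a sense of why execution at this speed is described as microsecond-scale, a rough budget calculation may help. This is only a back-of-the-envelope sketch: `num_layers` and `tokens_per_step` are hypothetical placeholders introduced for illustration, not published MiMo or TileRT figures.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Average wall-clock time available per output token, in microseconds."""
    return 1_000_000 / tokens_per_second


def per_layer_budget_us(tokens_per_second: float,
                        num_layers: int,
                        tokens_per_step: float = 1.0) -> float:
    """Average time available per layer for one decoding step.

    num_layers and tokens_per_step are assumptions for illustration only.
    tokens_per_step > 1 models a decoder that emits several tokens per
    step on average, which stretches the per-step budget accordingly.
    """
    step_budget_us = per_token_budget_us(tokens_per_second) * tokens_per_step
    return step_budget_us / num_layers


if __name__ == "__main__":
    # Throughput figures quoted in the feed: 1,000 and 400 tokens/s.
    for tps in (1000, 400):
        print(f"{tps} tokens/s -> {per_token_budget_us(tps):.0f} us per token")
    # Hypothetical 50-layer model, one token per step.
    print(f"{per_layer_budget_us(1000, 50):.0f} us per layer")
```

Under these assumptions, any fixed per-operator overhead of a few microseconds would take up a noticeable share of each layer's budget, which is consistent with the thread's emphasis on removing operator boundaries.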

TileRT @TileRT_AI

about 8 hours ago

How did we push a 1 Trillion parameter MoE model past the 1,000 TPS barrier on a standard 8-GPGPU node with @XiaomiMiMo? 🚀 It’s not just a faster kernel. It’s a total execution model revolution. Key technical breakthroughs inside TileRT:

TileRT @TileRT_AI

about 8 hours ago

⚡️ System & Model Co-design: Deep technical synergy with the MiMo team on FP4/FP8 mixed quantization and production-grade DFlash.

TileRT @TileRT_AI

about 8 hours ago

⚡️ Heterogeneous Workers & Warp Specialization: Breaking the serial pace to orchestrate specialized worker groups not just within a single SM, but scaling across the entire GPU execution domain.

TileRT @TileRT_AI

about 8 hours ago

⚡️ Tile-grained Pipelining: Deeply overlapping memory movement, tensor computation, and communication at the physical tile level.

TileRT @TileRT_AI

about 8 hours ago

⚡️ Persistent Kernels: The entire compute pipeline runs continuously inside the GPU, enabling full-stack continuous prefetching and erasing operator boundaries.

TileRT @TileRT_AI

15 days ago

Thanks to 鸭哥 for recommending us on his blog. We have more surprises coming, so stay tuned ❤️

鸭哥

@grapeot

17 days ago

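Zhipu's GLM-5.1 high-speed API reaches 400 tokens/s. This is not just a matter of optimizing for more speed; it restructures GPU inference at the level of the execution model. An in-depth analysis of the technical principles behind TileRT, and of why inference speed is becoming the second axis of competition for AI APIs. https://t.co/50nA3VfDu7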

TileRT @TileRT_AI

18 days ago

@zRdianjiao @Zai_org Huge milestone. Grateful to @Zai_org for the partnership. Flagship quality at 400 tok/s is just the start of what we can do together. 🔥
