@lmsysorg @deepseek_ai Is there a place where I can find the deployment details, such as the SGLang launch arguments for P/D (prefill/decode) disaggregation, the KV transfer arguments, and the configuration settings?
🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
🔗 Explore on GitHub: https://t.co/4JvJTn5HX2
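
To make the "Paged KV cache (block size 64)" item concrete, here is a minimal sketch of how a paged cache can address tokens of variable-length sequences. It is not FlashMLA's code or API: the names `blocks_needed`, `build_block_tables`, and `locate` are hypothetical, and the sequential block allocation is an assumption made for illustration. Only the block size of 64 comes from the announcement.

```python
BLOCK_SIZE = 64  # block size stated in the announcement


def blocks_needed(seq_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks required to hold seq_len tokens (ceiling division)."""
    return (seq_len + block_size - 1) // block_size


def build_block_tables(seq_lens, block_size: int = BLOCK_SIZE):
    """Give each sequence a list of physical block ids.

    Assumption: blocks are handed out sequentially from an empty pool.
    A real allocator would reuse freed blocks in arbitrary order.
    """
    tables = []
    next_free = 0
    for n in seq_lens:
        count = blocks_needed(n, block_size)
        tables.append(list(range(next_free, next_free + count)))
        next_free += count
    return tables


def locate(block_table, token_pos: int, block_size: int = BLOCK_SIZE):
    """Map a logical token position to (physical block id, offset inside the block)."""
    return block_table[token_pos // block_size], token_pos % block_size


# Three sequences of different lengths share one pool of blocks.
seq_lens = [100, 64, 1]
tables = build_block_tables(seq_lens)
print(tables)                 # per-sequence block tables
print(locate(tables[0], 70))  # token 70 of the first sequence
```

With this layout a sequence wastes at most 63 token slots in its last block, which is why paging suits batches of variable-length sequences.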