Running DeepSeek V4 from @deepseek_ai on @vllm_project? Upgrade to v0.20.1: 10+ bug fixes and optimizations, fully tested and verified by the open-source community!
A huge thank you to @FireworksAI_HQ, @baseten, @novita, @lightseekorg, @daocloud, @nvidia, @redhatai and more for helping report and fix issues and for verifying vLLM's stability and speed. 🙏
🔧 DeepSeek V4 production reliability fixes:
• Persistent topk cooperative deadlock at TopK=1024
• AOT compile cache import error
• Repeated RoPE cache initialization
• Non-streaming tool-call type conversion (DSV3.2/V4)
• torch inductor error on V4
⚡ Optimizations:
• Multi-stream pre-attention GEMM + configurable knob
• BF16 / MXFP8 all-to-all on FlashInfer one-sided comm
• PTX `cvt` for faster FP32 → FP4 conversion
• Integrated `head_compute_mix_kernel` for head computation
📖 Full notes → https://t.co/BXv7pl7z4y
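Not sure whether your environment already has the release? Here is a minimal sketch of a version check, assuming vLLM was installed from the `vllm` Python package. The helper names `parse_release` and `needs_upgrade` are made up for illustration and are not part of vLLM.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = (0, 20, 1)  # v0.20.1, the release named in this post


def parse_release(ver: str) -> tuple:
    """Return the leading numeric components of a version string as ints.

    Hypothetical helper: suffixes such as "rc1" or "+cu128" are ignored,
    so a pre-release of 0.20.1 compares equal to 0.20.1 here.
    """
    parts = []
    for piece in ver.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed: str, required: tuple = REQUIRED) -> bool:
    """True when the installed version is older than the required one."""
    return parse_release(installed) < required


if __name__ == "__main__":
    try:
        installed = version("vllm")  # reads installed package metadata
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        if needs_upgrade(installed):
            print(f"vllm {installed} found; upgrade with: pip install -U 'vllm>=0.20.1'")
        else:
            print(f"vllm {installed} already includes the v0.20.1 fixes")
```

The check compares only the numeric release components, which is enough for a quick sanity check; for strict comparison of pre-release or local version tags, a dedicated version parser would be the safer choice.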