We are excited to announce FlashInfer v0.2!
Core contributions of this release include:
- Block/Vector Sparse (Paged) Attention on FlashAttention-3
- JIT compilation for customized attention variants
- Fused Multi-head Latent Attention (MLA) decoding kernel
- Many bug fixes and improvements, including CUDAGraph compatibility and RMSNorm/RoPE numerical issues.
Blog post: https://t.co/tMBFmCfAc0