Summary (all of this is publicly available online; none of it is insider information):
- M3
- No more M2.x, e.g., no M2.8
- MoE, 1M context
- Larger than M2 (~200B model)
- MSA (MiniMax Sparse Attention) architecture
- For MSA, 9.7x prefill speedup, 15.6x decoding speedup
- Will be released in a few days
- Will be open-sourced