Small KVarN update
Thanks to everyone who tested it and shared feedback! The vLLM PR is coming soon.
At the same KV-cache budget:
• ~2× faster decode than TurboQuant
• Better accuracy
• Support for MLA, hybrid & speculative decoding
@sztlink @no_stp_on_snek Thanks, happy everything works! I'm still fixing a lot of things in the repo, so posts like this help me a lot :) Give it a star if you want to support the project :) https://t.co/sBv78XjTBs
@jaga_prasanna @PavloMolchanov tbh, this is a trend I struggle to understand as well. In terms of accuracy, INT4 held up very well (especially compared to MXFP4). The main reason we're now dealing with all these new microscaling formats is that hardware is being built around them. Not a huge fan, personally.
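To make the INT4 vs MXFP4 accuracy point a bit more concrete, here's a rough NumPy sketch (an illustration only, not KVarN code) that fake-quantizes Gaussian data in blocks of 32 both ways and compares the mean squared error. Assumptions: symmetric INT4 (levels -7..7) with one unquantized float scale per group, and MXFP4 following my reading of the OCP MX v1.0 spec (power-of-two E8M0 shared scale, E2M1 elements, saturation at 6). The function names are hypothetical.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (the element type used by MXFP4).
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # MX block size, reused as the INT4 group size for a like-for-like comparison


def quantize_int4_symmetric(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fake-quantize to INT4 (levels -7..7), one float scale per group."""
    g = x.reshape(-1, BLOCK)
    amax = np.abs(g).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / 7.0, 1.0)  # avoid dividing by zero for all-zero groups
    q = np.clip(np.round(g / scale), -7, 7)
    return (q * scale).reshape(x.shape)


def quantize_mxfp4(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fake-quantize to MXFP4 (power-of-two shared scale + E2M1 elements)."""
    g = x.reshape(-1, BLOCK)
    amax = np.abs(g).max(axis=1, keepdims=True)
    safe = np.where(amax > 0, amax, 1.0)
    # Shared exponent = floor(log2(amax)) - 2 (emax of E2M1), so the scale is always a power of two.
    scale = 2.0 ** (np.floor(np.log2(safe)) - 2)
    scaled = g / scale
    mag = np.minimum(np.abs(scaled), FP4_E2M1_GRID[-1])  # saturate at 6
    # Round each magnitude to the nearest grid point (ties go to the smaller value here).
    idx = np.abs(mag[..., None] - FP4_E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_E2M1_GRID[idx]
    return (q * scale).reshape(x.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal(BLOCK * 1024)
for name, fq in [("INT4", quantize_int4_symmetric), ("MXFP4", quantize_mxfp4)]:
    print(name, "MSE:", np.mean((x - fq(x)) ** 2))
```

Caveat: the INT4 path here keeps a full-precision scale per 32 values while MXFP4 spends only 8 bits on it, and that coarse power-of-two scale (plus clipping when the block max lands just below a power of two) is a big part of why MXFP4 loses in this toy setup. Real KV-cache tensors aren't Gaussian either.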