When I was consulting for @HBO Silicon Valley, zero-loss compression was the holy grail Richard Hendricks chases that perfect middle-out algo could shrink everything w/out breaking a single bit.
Google just did something even more practical for the AI era: TurboQuant compresses LLM key-value caches down to 3 bits per value using random orthogonal rotation + PolarQuant scalar quantization & optional 1-bit QJL residual correction.
=>> 6× memory reduction, up to 8× faster attention (on H100), & 0 degradation on LongBench, Needle-in-a-Haystack, and RULER for models like Gemma. No retraining, no calibration needed.
Fiction just got out-engineered by reality. 😅💚💚