@bcherny @tomhacks The more you answer these guys, the more they are going to keep coming up with new ways to attack your efforts. Their objective is not to provide feedback but to farm engagement. And your replies are bringing them into people’s timelines.
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc