Gergely Szilvasy

@algoriddle

Joined March 2014

48 Following

2 Followers

1 Post

algoriddle retweeted

Manuel Faysse

@ManuelFaysse

23 days ago

🚨 Do LLMs need to store everything they read in memory? To reduce KV cache size and improve decoding speeds, we propose Self-Pruned KV attention, a mechanism where the model learns to decide which KVs to write in the persistent KV cache, discarding all the rest! @AIatMeta🧵

Tweet photo: https://t.co/5UeHSpusGo

204

148

21K
