Lots of people asked how we track all these compression projects. We use attentionvc to monitor trending repos by category, super useful for staying on top of the space: https://t.co/e4AI8dj9XL
Thanks for the mention!
Note - in #headroom, we tried the same techniques - like dictionaries etc - BUT they destroy prefix caching.
So - folks should explore these techniques thinking not just about token compression BUT also about the impact on prefix caching.
Caching-aware compression is key :)
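A rough sketch of why a dictionary can hurt prefix caching. This is a toy, not headroom's code: `dict_compress`, `append_only_compress`, and the `DICT` header format are all made up for illustration, and shared leading characters stand in for the tokens a real provider cache would reuse. The idea is that a global dictionary sits at the front of the prompt and gets rebuilt whenever new content arrives, so the prompt diverges from the previous request almost immediately.

```python
from collections import Counter


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b (a stand-in for cached tokens)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def dict_compress(messages: list[str]) -> str:
    """Toy global-dictionary scheme (hypothetical): words seen 2+ times get a short
    code, and the dictionary is prepended as a header at the start of the prompt."""
    counts = Counter(w for m in messages for w in m.split())
    repeated = sorted(w for w, c in counts.items() if c >= 2 and len(w) > 3)
    codes = {w: f"<{i}>" for i, w in enumerate(repeated)}
    header = "DICT " + " ".join(f"{code}={word}" for word, code in codes.items())
    body = "\n".join(" ".join(codes.get(w, w) for w in m.split()) for m in messages)
    return header + "\n" + body


def append_only_compress(messages: list[str]) -> str:
    """Cache-friendly variant: each message is compressed on its own (here just
    whitespace collapsing), so earlier output never changes when a turn is added."""
    return "\n".join(" ".join(m.split()) for m in messages)


turn1 = ["read config file config.yaml", "error parsing config file"]
turn2 = turn1 + ["retry reading config file after error"]

for name, fn in [("dictionary", dict_compress), ("append-only", append_only_compress)]:
    before, after = fn(turn1), fn(turn2)
    print(name, common_prefix_len(before, after), "of", len(before), "chars reusable")
```

In the dictionary variant, the new turn makes "error" a repeated word, which reshuffles the header and invalidates everything after it. The append-only variant keeps the whole previous prompt as a reusable prefix, at the cost of a weaker compression ratio.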
@mwixamwixa2 and the agent re-reads the same file 3 times and gets the same error in 4 messages lol. paying more tokens for worse output is the real problem
@Ramdevgujj38411 that's the key question. our take is providers won't prioritize it, charging per token means compression is against their business model. same reason AWS didn't build Cloudflare
@Shubham75450791 yeah early LLMLingua on code was rough. the key difference is content-aware stages, you can't just drop tokens by perplexity when it's code. AST-aware compression that never touches identifiers is a completely different game
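For the curious, here is a minimal sketch of what "AST-aware, never touches identifiers" can look like, assuming Python source and Python 3.9+ for `ast.unparse`. It is not the actual pipeline: `strip_docstrings` and `identifiers` are hypothetical helpers, and the only "compression" done is dropping docstrings and comments, which the parser can prove are not identifiers.

```python
import ast

_SCOPES = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def strip_docstrings(source: str) -> str:
    """Drop docstrings (and, via ast.unparse, comments) while leaving code intact."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, _SCOPES) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Keep the body non-empty so the result is still valid Python.
            node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)


def identifiers(source: str) -> set[str]:
    """Collect names, attribute names, argument names, and def/class names."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            found.add(node.id)
        elif isinstance(node, ast.Attribute):
            found.add(node.attr)
        elif isinstance(node, ast.arg):
            found.add(node.arg)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found.add(node.name)
    return found


example = '''
def load_config(path):
    """Load the config file from disk."""
    # open and read
    with open(path) as fh:
        return fh.read()
'''

compressed = strip_docstrings(example)
# Safety check: the compressed code must expose exactly the same identifiers.
assert identifiers(compressed) == identifiers(example)
print(compressed)
```

The point of the check at the end: a perplexity-based token dropper has no such guarantee, while a parser-driven pass can verify that every name the model needs to reference is still there.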