@thsottiaux Windows app still feels like it needs some love. The core functionality is there, but there are enough rough edges and UI quirks that it doesn't feel as polished as the web experience yet.
Why is the KV cache one of the main reasons LLMs are fast?
The KV cache is what connects the attention mechanism with the generation stage of autoregressive models.
These models generate text token by token, but each new token still attends to all previous ones.
→ To optimize the decode phase, models store previously computed key and value vectors in a KV cache.
→ During generation, they compute new Q/K/V states only for the latest token, append the new K and V to the cache, and attend over the cached past representations.
Without a KV cache, the model would recompute keys and values for the entire sequence at every step (e.g. to generate token 501 it would recompute K and V for tokens 1–500), which is very slow.
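To make the difference concrete, here is a minimal sketch of single-head attention decoding with and without a cache. It is a toy under stated assumptions, not how any particular model implements it: the projection matrices `W_Q`, `W_K`, `W_V`, the 2-dimensional inputs, and the helper names are all made up for illustration, and a real model would cache K and V per layer and per head.

```python
import math

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention for one query over all keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

# Hypothetical toy projection matrices (assumption, not from any real model).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.3]]
W_V = [[1.0, -1.0], [0.5, 0.5]]

def decode_without_cache(xs):
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        # Recompute K and V for the whole prefix at every step.
        keys = [matvec(W_K, x) for x in xs[: t + 1]]
        values = [matvec(W_V, x) for x in xs[: t + 1]]
        kv_projections += t + 1
        outputs.append(attend(matvec(W_Q, xs[t]), keys, values))
    return outputs, kv_projections

def decode_with_cache(xs):
    outputs, kv_projections = [], 0
    k_cache, v_cache = [], []
    for x in xs:
        # Project only the newest token and append it to the cache.
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        kv_projections += 1
        outputs.append(attend(matvec(W_Q, x), k_cache, v_cache))
    return outputs, kv_projections

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out_a, n_a = decode_without_cache(xs)
out_b, n_b = decode_with_cache(xs)
print(n_a, n_b)  # token positions projected to K/V in total
```

Both paths produce the same attention outputs. The uncached path projects 1 + 2 + 3 + 4 token positions across the four steps, while the cached path projects one per step, so the K/V projection work grows quadratically versus linearly with sequence length.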
▪️ But the tradeoff of the KV cache is memory: it grows with sequence length, batch size, number of layers, and number of attention heads.
That’s why so much research today targets KV efficiency and memory optimization. For example:
- Upgrading the attention mechanism, since it determines how the KV cache is formed. Choose an attention variant such as CompactAttention, MHA, MLA, etc. based on your needs.
- Improving memory management. The system needs to identify what to store long-term or keep local, when to summarize, and when to trim.
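As a rough illustration of the memory tradeoff, the sketch below estimates cache size with the commonly used formula 2 (K and V) × layers × KV heads × head dimension × sequence length × batch size × bytes per value. The function name and the example configuration are hypothetical, chosen only to show the arithmetic, and the formula assumes a plain per-head cache with no compression or paging overhead.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_value=2):
    # Factor 2 accounts for storing both keys and values.
    return 2 * layers * kv_heads * head_dim * seq_len * batch_size * bytes_per_value

# Hypothetical configuration: 32 layers, head_dim 128, 4096 tokens,
# batch size 1, 16-bit values (2 bytes each).
full = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                      seq_len=4096, batch_size=1)
shared = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                        seq_len=4096, batch_size=1)

print(full / 2**30)    # 2.0 GiB with 32 KV heads
print(shared / 2**30)  # 0.5 GiB with 8 KV heads
```

Under these assumptions, cutting the number of KV heads from 32 to 8 shrinks the cache by 4x, which is why the choice of attention variant matters so much for memory.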
You can learn more about KV cache + attention here: https://t.co/YlRyxCM9Tj
And how they fit into the full LLM inference pipeline here: https://t.co/tKjX8Wvdkp
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Creator of C++, Bjarne Stroustrup:
AI-generated code isn't ready — it generates more bugs, more bloat, more security holes, and is nearly impossible to validate
"senior developers are already retiring rather than deal with it"
The problem is that even a small prompt change can shift the entire codebase in unpredictable ways
New releases from Microsoft Research, live in 1 hour.
Join for AI that runs your repo + verification-first research + more.
👉 https://t.co/OVxLvELUxr
⏰ 9 AM PT/12 PM ET
💬 Join live + ask questions in chat
What physics gets wrong about the idea of “fundamental”
"Fundamental" in physics usually just means the smallest indivisible quanta plus the laws that govern them.
But we wouldn't get far without something more: boundary/initial conditions.
https://t.co/Bir0Z1UGuB
🇪🇺🤝🇯🇵Accelerating regulatory, research & industry cooperation.
The EU & Japan held their fourth meeting of the Digital Partnership Council & discussed:
🔹data governance
🔹digital identity
🔹AI
🔹quantum
🔹platforms
& more.
Learn more: https://t.co/QficZqPpsG
Stronger together in uncertain times 🇪🇺🇨🇦
Canada is a strategic partner to the EU and at a time of growing global instability, this bond matters more than ever.
Parliament wants the EU to take its cooperation with Canada to the next level.
Read more: https://t.co/IVsuu3Tr1o
The EU and Iceland are close friends.
Our security is shared, and so are the challenges we face.
Today’s signature of an EU-Iceland Security and Defence Partnership takes our relationship to the next level.
This will deepen our cooperation in areas that matter for the safety of our citizens, from maritime security to the protection of critical infrastructure.
This is a win for the EU and for Iceland.