PhD student @TechnionLive
Trying to better understand LLMs
Ex Research Intern @Verily | Ex Research Intern @Meta
Working with @HaggaiMaron and @el_yaniv
1/
How much can you compress an LLM’s KV cache?
tl;dr it depends on how you train your model.
Many strong context compaction methods, such as Cartridges and attention matching, operate post-hoc: given a fixed model and a context, they try to compress the resulting KV cache.
@yoav_gelberg and I ask the complementary question:
can we train the model to produce KV representations that are easier to compress?
In other words: keep the compression method fixed, and change the representations it sees.
- "Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions", AAAI 2026 (https://t.co/ixdfiyqNqO) [8/8]
🧵 "Neural Message Passing on Attention Graphs for Hallucination Detection" at #ICLR2026!
🕸️ We apply GNNs to the structured data LLMs produce as they generate text (e.g., attention maps) to predict their errors.
📄 https://t.co/IQEyA7zaht
🤝 @GuyBarSh (co-1st) @YftahZ @HaggaiMaron
📌 [4/4] On the Expressive Power of GNN Derivatives
We study how using gradients of GNNs can increase their expressive power, providing a principled way to go beyond standard message passing.
https://t.co/XwyjvdhVNw
🤔 Can discrete diffusion models actually outperform standard classifiers?
We show that they can!
📄 https://t.co/TwQx7iP17o
💻 https://t.co/TjPeGWcIED
🌐 https://t.co/ga3YOazPog
[6/7] Results (over 15 LLM/dataset combinations):
• Consistently outperforms classic probes
• Zero-shot generalization to new datasets
• Fast adaptation to unseen LLMs by tuning only the new adapter for each one