1/ 🌳 We are releasing Pando, a benchmark to climb for interpretability methods. We created 720+ model organisms with known ground-truth decision rules planted inside, then asked: do white-box tools actually beat just asking the model?
Excited to give a talk at @SimonsInstitute Trust in Decentralized Systems Workshop on Tuesday at 11am!
Title: "2026 Is the New 2016" — on federated memory, contextual privacy, and personalized agents.
The privacy conversation has moved way past training data memorization. With persistent memory, tool use, and agents acting on your behalf (#clawdbot👀), the real risk is what models do with the data you feed them at inference time. Context window is the new attack surface!
I'll talk about our new benchmark CIMemories, where we test whether models can actually make context-dependent decisions about what to share with whom from memory. Turns out they really can't, up to 69% violation rates, and it only gets worse the more you use them.
Link to slides🔻🔻🔻
[1/n] Open Problems in AI Child Safety
AI is misused to generate CSAM at alarming scale. 400% increase in AI-generated CSAM since 2024 (IWF). 1 in 17 teens are victimized by deepfake nudes.
We outline 15 open problems where AI safety research can help.
🔗https://t.co/2BGTSsPj53
We asked LLMs: Is Santa real? 🎅
GPT-4o says Yes at any age. Claude tells 5-year-olds the truth.
What does this reveal about invisible assumptions in AI? Do LLMs believe in the tooth fairy or the Illuminati?
New holiday post from @mlcmublog 🔗⬇️
Ending the year on a high note, with 2 papers accepted at @satml_conf 🎉 check them out!
📄 One is an interview study on cross-silo Federated Learning, highlighting a misalignment between real-world challenges & current research focus. (https://t.co/5osTNA6FHL)
📄 The other evaluates machine unlearning methods on a harder, but realistic setting involving multi-hop knowledge.
(https://t.co/SK8qibGKvf)
🔥 New blog on scaling RL📈 by unlocking a *new* way to explore on hard problems.
- ☹️Currently scaling compute on hard problems is futile.
- ☠️Turns out exploration during RL is the main bottleneck!
- 👎Classical exploration methods like token entropy fail.
- 💡We need a new paradigm of exploration that goes beyond simply sharpening or chaining (vg-gap) capabilities in the base model.
- 🗝️The key lies in the instruction-following abilities of base models to unlock a new regime of guided exploration.
More here 👇:
🚀 Federated Learning (FL) promises collaboration without data sharing. While Cross-Device FL is a success and deployed widely in industry, we don’t see Cross-Silo FL (collaboration between organizations) taking off despite huge demand and interest.
Why could this be the case? 🤔
We conduct an interview study to dig deeper into the real-world barriers to cross-silo FL adoption! 👇 (1/3)
As Transactions on Machine Learning Research (TMLR) grows in number of submissions, we are looking for more reviewers and action editors. Please sign up!
Only one paper to review at a time and <= 6 per year, reviewers report greater satisfaction than reviewing for conferences!
1/
Can model agreement replace LLM cascades routers for efficient inference?
Formalizing this, we found that under broad conditions, ensemble provides provably safe, training-free routing rules.
We call the method *Agreement-Based Cascading (ABC)*, and it's live now at TMLR!
🧵
I'm very excited to share some new work https://t.co/0rADO9DMCM. This work started out in conversations with @thorn where we realized that shadow model MIAs couldn't be used to audit models for harmful content of children. See 🧵 for why, and our progress on solving this...
ICML offers an optional poster printing service https://t.co/Y2w0rw1KDn
Orders can be picked up the day at the Vancouver Convention Centre in West MR 104 during the following hours:
Monday - Friday: 7:30 am - 5:00 pm
Saturday: 8:00 am - 1:00 pm
🚨 NEW PAPER: What if LLMs could tackle harder problems - not by explicitly training on longer traces, but by learning how to think longer?
Our recipe e3 teaches models to explore in-context, enabling LLMs to unlock longer reasoning chains without ever seeing them in training. 🤯
Website: https://t.co/si6sTgpbtZ
Paper: https://t.co/NmzUSH2P4y
🧵[1/8]
https://t.co/Jsl6oztcSF
Are your LLMs truly forgetting unwanted data?
In this new blog post authored by @shengyuan_26734, Yiwei Fu, @zstevenwu, and @gingsmith, we discuss how benign relearning can jog unlearned LLM's memory to recover knowledge that is supposed to be forgotten.
I’m excited to share new work from Datadog AI Research! We just released Toto, a new SOTA (by a wide margin!) time series foundation model, and BOOM, the largest benchmark of observability metrics. Both are available under the Apache 2.0 license. 🧵
Thrilled to share our new work on improving LLM unlearning! 🚀
Gradient-based unlearning struggle with high cost, instability & lack of precision.
We introduce Dynamic SAE Guardrails (DSG): an activation-based approach using SAEs for targeted, efficient knowledge removal.
https://t.co/0t37BZIOlf
📈⚠️ Is your LLM unlearning benchmark measuring what you think it is?
In a new blog post authored by @prthaker_, @shengyuan_26734, @neilkale, @yash_maurya01, @zstevenwu, and @gingsmith, we discuss why empirical benchmarks are necessary but not sufficient measures of success (SaTML 2025).
The ICML 2025 workshops list is online! https://t.co/jNYm5qEZSm. Many exciting topics, spanning multi-agent systems, world models, test-time adaptation, actionable interpretability, and much more.
How to effectively unlearn finetuning data?
❌ Approx. methods leak sensitive data
✅ Exact unlearning (eg. retraining) is secure 🔒 but inefficient
🚨 New paper: *efficient* & *exact* unlearning (led by Kevin)
🗝️ Idea: model merging at scale
https://t.co/Skks6H8Mcm
🧵⤵️