📣 Excited to announce our oral presentation at #ICLR!
LLMs capture rich semantic structure, as evidenced by their strong performance across a wide range of language and reasoning tasks.
But Sparse Autoencoders (SAEs), a popular interpretability tool, mostly learn local, noisy, token-level features when applied to LLMs (e.g., hundreds of features for the word “the”).
So why aren’t SAEs finding that rich semantic structure?
👉 Because they ignore the sequential nature of language.
We introduce Temporal SAEs to bridge this gap.
https://t.co/HLvuAV7Qek
🧵 [1/N]
How can we improve LLMs without any additional training? 🤔
The standard playbook is Best-of-N: generate N responses ➡️ use a reward model to score them ➡️ pick the best 🏆
More responses = better results... right?
Well, not exactly. You might be reward hacking!
Instead, you should hedge! 🎯
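For readers new to Best-of-N, here is a minimal sketch of the generate, score, pick loop described above. It is not the paper's code: `best_of_n`, `toy_generate`, and `toy_reward` are hypothetical names, and the toy generator and length-based reward are stand-ins for a real LLM sampler and a learned reward model.

```python
import itertools
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int,
) -> str:
    """Sample n responses and return the one with the highest reward score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Step 1: generate N candidate responses for the same prompt.
    candidates = [generate(prompt) for _ in range(n)]
    # Steps 2 and 3: score each candidate and keep the top-scoring one.
    return max(candidates, key=lambda response: reward(prompt, response))


# Hypothetical stand-ins (assumptions for illustration only).
_canned = itertools.cycle(["ok", "a longer answer", "the longest answer of all"])


def toy_generate(prompt: str) -> str:
    """Pretend sampler: cycles through a fixed list of responses."""
    return next(_canned)


def toy_reward(prompt: str, response: str) -> float:
    """Pretend reward model: longer responses score higher."""
    return float(len(response))


if __name__ == "__main__":
    print(best_of_n("Say hi", toy_generate, toy_reward, n=3))
```

Because the winner is whatever maximizes a proxy score, a larger N gives the selection step more chances to find a response that exploits flaws in the reward model, which is the reward hacking concern raised in the thread.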
Can we use coding theory, heavy-tailed distributions, and optimal transport to create 𝘇𝗲𝗿𝗼-𝗱𝗶𝘀𝘁𝗼𝗿𝘁𝗶𝗼𝗻, 𝗲𝗮𝘀𝘆-𝘁𝗼-𝘂𝘀𝗲 𝘄𝗮𝘁𝗲𝗿𝗺𝗮𝗿𝗸𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀? We show that we can, and the result is pretty exciting! 🎉 🧵 (1/n)
Happy to share that we received the best paper award at the NENLP workshop at Yale 🥳🥳!
tldr: Current alignment methods give excessive discretion to annotators in defining what good behavior means. This means we don't know what we are aligning to ‼️
We formalize discretion in alignment and propose mechanisms for data curators & model developers to monitor for it.
Paper link below ⬇️
[1/x] 🚀 We're excited to share our latest work on improving inference-time efficiency for LLMs through KV cache quantization---a key step toward making long-context reasoning more scalable and memory-efficient.
AI is built to “be helpful” or “avoid harm”, but which principles should it prioritize and when?
We call this alignment discretion. As Asimov's stories show: balancing principles for AI behavior is tricky.
In fact, we find that AI has its own set of priorities
(comic @xkcd)👇
The standard practice in differential privacy of targeting ε at small δ is extremely lossy for interpreting the level of privacy protection. In practice (e.g., for DP-SGD), we can do much better!
We show how in the #NeurIPS2024 paper:
https://t.co/LKeW48wMx1
Short summary👇
Imagine an all-powerful AI with any ideology you don't agree with! Super proud of this work, where we show that every LLM reflects a different ideological worldview, which should worry everyone.
Finally, I am pleased to announce
🪢Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)🪢
Joint work with Usha Bhalla, as well as @Suuraj, @FlavioCalmon, and @hima_lakkaraju, which was just accepted to NeurIPS 2024! Check out the paper here:
https://t.co/N1dmE1mkmA
Part 2 of my 2024 publication tweets! Please welcome Multi-group Proportional Representation, a novel metric for measuring representation in image generation and retrieval. This work was recently accepted at @NeurIPSConf 2024. (1/n)
First up, how do various aspects of trustworthy machine learning interact? Can we expect a production ML system to satisfy all regulatory requirements of fairness, privacy, and interpretability simultaneously when past research generally focuses on one component at a time? (1/n)
Mario was a friend, close collaborator, and the first post-doc I hired at Harvard. This is a devastating loss to our community. Please consider reading one of Mario's papers this week. You can also learn more about his research here: https://t.co/zn8JG1tONX
Mario Diaz Torres, a brilliant researcher and mathematician, passed away suddenly on August 31st. @MDMarioDiaz was a rising star in the LatAm math community and was doing exceptional work in information theory, differential privacy, and related areas. https://t.co/cTeHRS2WMB
Mario was incredibly passionate about math, information theory, and statistics. He was homeschooling his son so he could “teach him math in a principled and advanced manner.” Now his family really needs our support. Please consider donating here: https://t.co/YqX3lmEptH
This week, I spoke on the panel “AI, Rights, and Democracy” at the Brazilian Supreme Court. Thank you @STF_oficial for the invitation. It was an incredible experience! See my talk (in pt-br) here: https://t.co/JTz0z0YLyE
Back home from FAccT - I am thankful for the work our community is doing & the values it stands for. Serving it has been a labor of love for me & I am beyond grateful to have done so this year alongside my truly wonderful program co-chairs & human beings @mikarv @RDBinns @FlavioCalmon