📍 Now @anna_hedstroem is a Postdoctoral Fellow at the @ETH_AI_Center, working with the @ivia_lab and Learning & Adaptive Systems (LAS) group.
Anna's focus ahead: evaluation-centric interpretability, LLM steering, and AI safety. ✨🧠💻☕️
More info: https://t.co/dNRvRb2ToK.
🔊 Not to be missed: last month @anna_hedstroem defended her PhD “Evaluation-centric advances in neural model interpretability” at TU Berlin, with distinction! ✨🧠💻☕️
Here’s a thread on a selection of Anna’s evaluation-centric interpretability work + what comes next. 🧵
Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉
In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features.
📄 Paper: https://t.co/7HE1JGhnvD
#NeurIPS #MechInterp #XAI
📚 During his PhD, Kirill co-authored 11 papers spanning interpretability, neuron analysis & robust explanations. You can find all of them on his Google Scholar:
👉 https://t.co/IvFVirk3QE
Once again, congrats @kirill_bykov on an outstanding PhD journey! 🎓✨
🎉 Huge congratulations to @kirill_bykov, the very first PhD student of our lab, who successfully defended his thesis “Explaining Representations in Deep Neural Networks” this Monday with summa cum laude! 👏
🧵 In the next tweets, we’ll highlight some of his key works:
🧐 DORA: Exploring Outlier Representations in Deep Neural Networks (TMLR 2023). A framework for analyzing & detecting learned representations in neural networks.
👉 https://t.co/Mza0yhoFKz
🔍 When do neurons encode multiple concepts?
We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.
📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
https://t.co/7HE1JGhnvD
🧵
If you're at #AAAI2025 don't miss our poster today (alignment track)!
Paper 📘: https://t.co/1kDjrX3OaM
Code 👩‍💻: https://t.co/SiVfRWRhx0
Teamwork with @eirasf and @Marina_MCV