Anna Hedström

Verified account

@anna_hedstroem

AI Fellow @eth_ai_center | PhD ML @TUBerlin | evaluation-centric interpretability and AI alignment

🇨🇭

Joined November 2020

351 Following

346 Followers

136 Posts

@anna_hedstroem

9 months ago

Almost forgot to share — last month, I defended my thesis, with distinction! Feeling deeply grateful for the learnings, collaborations and friendships along the way. New chapter at @ETH_AI_Center 🚀

Understandable Machine Intelligence Lab @UMI_Lab_AI

9 months ago

🔊 Not to be missed: last month @anna_hedstroem defended her PhD “Evaluation-centric advances in neural model interpretability” at TU Berlin, with distinction! ✨🧠💻☕️ Here’s a thread with a selection of Anna’s evaluation-centric interpretability work + what comes next. 🧵

Photo: https://t.co/y8ICPOV0jX

@anna_hedstroem

9 months ago

PRISM got accepted at @NeurIPSConf 2025! Congrats to the team ✨ @lkopf_ml @nfelnlp @kirill_bykov @BommerPhiline @Marina_MCV @EberleOliver

Laura Kopf @lkopf_ml

9 months ago

Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉 In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features. 📄 Paper: https://t.co/7HE1JGhnvD #NeurIPS #MechInterp #XAI

@anna_hedstroem

10 months ago

@MatthewKowal9 Amazing work!

@anna_hedstroem

11 months ago

Couldn’t be more proud and happy for my labmate @kirill_bykov, who made it to the other side! Congrats again on the fantastic body of work!

Understandable Machine Intelligence Lab @UMI_Lab_AI

11 months ago

🎉 Huge congratulations to @kirill_bykov, the very first PhD student of our lab, who successfully defended his thesis “Explaining Representations in Deep Neural Networks” this Monday with summa cum laude! 👏 🧵 In the next tweets, we’ll highlight some of his key works:

Photo: https://t.co/5Nj2VJ2hL5

@anna_hedstroem

11 months ago

My brilliant co-author @salim_amk0 is presenting our work on Mechanistic Error Reduction with Abstention (MERA) now at ICML in Vancouver! 🚀 If you’re at ICML, come by East Exhibition Hall A-B, E-2605 at 4:30 pm (Vancouver, BC). We’d love to hear what you think!

Photo: https://t.co/lI69baBGYF

anna_hedstroem retweeted

Salim Amoukou @salim_amk0

11 months ago

🚀 I'll be presenting our #ICML paper this afternoon!

You’ve probably heard of Mechanistic Steering, the idea of modifying internal activations of a language model at inference-time (e.g., adding a vector) to influence its behaviour, often for alignment. But we take a different angle:

👉 We use it for error reduction.

If you've explored this space, you know it’s full of heuristics: Which vector to use? How long should it be? When to steer at all?

🎯 In our work, we bring principled answers to these questions, with provable guarantees.

We introduce MERA (Mechanistic Error Reduction with Abstention for Language Models), a method for reducing errors in LLMs at inference-time by:
✅ Steering only when necessary
✅ Adapting how much to steer
✅ Abstaining unless confident improvement

And the best part? MERA is modular. You can plug it into any existing steering method to make it more effective and safer.

📍Catch me at @icmlconf
📌 Poster Location: East Exhibition Hall A-B, E-2605 at 4:30 pm.
🧠 Paper: https://t.co/cRWLsqqXp3

Big thanks to my amazing co-authors: @anna_hedstroem, @tom_bewley, Saumitra Mishra, and Manuela Veloso.

#ICML2025 #LLMs #MechanisticSteering #InferenceTime #LLMSafety #ResponsibleAI #TrustworthyAI #AIResearch

@anna_hedstroem

12 months ago

Endless gratitude to brilliant @salim_amk0, @tom_bewley, and our fantastic collaborators at JP Morgan #ICML25

@anna_hedstroem

12 months ago

Couldn’t be more excited to share our latest paper — accepted to ICML 2025 @icmlconf — with JP Morgan AI Research. It explores a simple question: To safely and effectively mitigate errors post-training, when (and how much) should we steer large language models? 🧵

Photo: https://t.co/hTFvc0Thbw

@anna_hedstroem

12 months ago

4/ What’s fascinating is not just the outcome but how concepts like "error" show up inside LLMs. This opens the door to more general forms of lightweight, post-training control — we're curious where else MERA may help. Paper https://t.co/KRzyzGWelP https://t.co/G64xYT3fp6

@anna_hedstroem

12 months ago

@lkopf_ml @kirill_bykov @nfelnlp @BommerPhiline @Marina_MCV @EberleOliver

@anna_hedstroem

12 months ago

Very excited to share this preprint on labelling polysemantic neurons! Have a read https://t.co/zrjqf5o8b0 And happy midsummer!

Laura Kopf @lkopf_ml

12 months ago

🔍 When do neurons encode multiple concepts? We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity. 📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework https://t.co/7HE1JGhnvD 🧵

Photo: https://t.co/k8OR19MsbQ

@anna_hedstroem

12 months ago

@spacecadet_kels You’re amazing!! Congrats

@anna_hedstroem

about 1 year ago

@saprmarks Agreed!

@anna_hedstroem

over 1 year ago

If you're at #AAAI2025 don't miss our poster today (alignment track)! Paper 📘: https://t.co/1kDjrX3OaM Code 👩‍💻: https://t.co/SiVfRWRhx0 Team work with @eirasf and @Marina_MCV

Carlos Eiras @eirasf

over 1 year ago

At 12:30 I'll be happy to take questions about our poster presentation at #AAAI2025. Is your explanation for a model's prediction better than the alternatives? "Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution" introduces QGE... 1/4

@anna_hedstroem

over 1 year ago

I couldn’t be more proud and happy to share that our paper was also awarded survey certification for an “exceptionally thorough/insightful survey” of interpretability evaluation. Grateful to my brilliant co-authors @BommerPhiline @tfburns @SLapuschkin @WojciechSamek @Marina_MCV

Understandable Machine Intelligence Lab @UMI_Lab_AI

over 1 year ago

Our recently accepted TMLR paper has been awarded: 🔥 Survey certification 🔥 "For an exceptionally thorough or insightful survey of interpretability evaluation." 📖 Read: https://t.co/o2BYsQ0V15 💻 Code: https://t.co/MDUzlyWNi5

Photo: https://t.co/MzpEEgfq1K

@anna_hedstroem

over 1 year ago

Our new paper is out! "Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions" 📖 Read: https://t.co/kOO1MNG7MF 💻 Code: https://t.co/yqo7k0IBld Thanks to my best collaborators @BommerPhiline @tfburns @SLapuschkin @WojciechSamek @Marina_MCV

Understandable Machine Intelligence Lab @UMI_Lab_AI

over 1 year ago

🚨 New paper alert! 🚨 We’re excited to share our latest work on interpretability evaluation: "Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions" 📜 Accepted at TMLR 🎉 🔥 Survey certification 🔥 📖 Read: https://t.co/o2BYsQ0V15

Photo: https://t.co/9xOc1KQsVB

@anna_hedstroem

over 1 year ago

@SatyaScribbles so well deserved!!!!
