Actionable Interpretability Workshop ICML2025 @actinterp - Twitter Profile

13 days ago

Submit your work! The 2nd Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at COLM 2026 in San Francisco! Submission Deadline: June 21, 2026 @ActInterp

Actionable Interpretability Workshop ICML2025 @ActInterp

4 months ago

A very exciting outcome of the workshop!

Hadas Orgad @OrgadHadas

4 months ago

Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How? We're ready to answer. 🧵

ActInterp retweeted

Adi Simhi @AdiSimhi

8 months ago

🤔 What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs 🚀🧵

ActInterp retweeted

Yonatan Belinkov @boknilev

8 months ago

Opportunities to join my group in fall 2026:
* PhD applications direct or via @ELLISforEurope (https://t.co/NdG57c3doS)
* Post-doc applications direct or via Azrieli @azrielifdn (https://t.co/gzyYfN0z34) or Zuckerman @stem_program (https://t.co/ZqCEbb9o4C)

ActInterp retweeted

Ivan Titov @iatitov

11 months ago

Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP

Actionable Interpretability Workshop ICML2025 @ActInterp

11 months ago

1⃣ Detecting High-Stakes Interactions with Activation Probes - https://t.co/oN0n7XTdke
2⃣ Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations - https://t.co/YMKuvBcD8z

Actionable Interpretability Workshop ICML2025 @ActInterp

11 months ago

Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇

ActInterp retweeted

NDIF @ndif_team

11 months ago

Great to present what’s coming next for NDIF at the @actinterp workshop at #ICML2025! If you missed us, let’s chat after the conference. Reach out here: https://t.co/NCIYb0pq5E