🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University!
Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇
How do language models track entities across state changes? When tracking objects in different boxes, do they cumulatively build up a global state of what’s in every box? How do they add objects or remove objects (i.e. Entity Unbinding)? Find out in our ICML paper! 🧵
Check out our new preprint reflecting on what "circuits" actually explain: https://t.co/COaNI1hyc1 (w/ @DakingRai and @megamor2)
While the current practice follows "hypothesis-driven circuit discovery", we found that circuits discovered using existing approaches do not describe the general task but rather are dataset-specific. When the dataset includes multiple distinct mechanisms, the current approaches cannot distinguish them.
We propose Data-driven Circuit Discovery (DCD) and advocate the principle of letting patterns in the data reveal which mechanisms are present (as opposed to humans hypothesizing that they exist or how they work). Details in the thread🧵
Concurrent with our work, the community is raising similar concerns, e.g.,
- "Finding Interpretable Prompt-Specific Circuits in Language Models" by @gvsfranco https://t.co/9Y6WNjZ3kF
- "All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs" by @fnruji316625 https://t.co/SOOdZHFig6
More reflections and new methodologies are still needed in this space.
#MechanisticInterpretability #LLM
Singular vectors of the attention QK matrix align with features!
This has been found empirically in other works, like Talking Heads from @jack_merullo_. We show theoretically and empirically how and why.
New @icmlconf 2026 paper with Carson Loughridge and @mcrovella.
@brightclayapp @henrythe9ths Seems like it cannot directly access your returns and forms, but rather things like a tax estimate, filing checklist and "connect with an expert"
Does an LLM have an internal representation of truth? Yes... but it is more limited than previously assumed.
E.g., counting how many (out of 3) cities are in the same country can significantly degrade truth representations.
New preprint with @mcrovella and Evimaria Terzi🧵
Why do attention heads attend where they do?
We can now pinpoint the EXACT features causing attention—without counterfactuals, patching, or SAEs.
New @NeurIPSConf 2025 paper with @mcrovella: "Pinpointing Attention-Causal Communication in Language Models"
Can we identify the key signals moving between attention heads when a language model performs a task? Our paper (https://t.co/8reygOpZk9) offers new tools for this question. A key point of leverage is a new phenomenon we expose: sparse attention decomposition. Exploiting this effect, we can isolate low-dimensional signals that implement communication between attention heads. We show that the extracted signals have a predictable causal impact on model performance. [1/N]
"Graduates, as you step into the world, be a force for all that is good and carry our CDS values with you. Go! Make us proud," CDS Associate Provost @Bestavros. Congratulations CDS Class of 2024! 🎓🎓
"The secret of happiness is: Find something more important than you are and dedicate your life to it."
Daniel Dennett (March 28, 1942 - April 19, 2024)
Take a look at our new Giotto Suite spatial analysis platform and preprint! So grateful to work with such an amazing team. We started working on this massive project when I started my lab at @BUMedicine and @The_BMC in August 2020. Amazing collaboration with @gc_yuan lab.
Welcome to CDS @trgardos! Thomas Gardos joined @BU_Tweets Faculty of Computing & Data Sciences as an assoc. prof. of the practice & the inaugural dir. of the MS in #DataScience Program in October. Learn about his 30-year career & his vision for the MSDS. https://t.co/Qobea7jVZH
This semester, CDS proudly welcomed its 3rd cohort of PhD students & its 1st class of MS in Data Science students. "Their varied experiences will be a great asset to the interdisciplinary nature of #DataScience & CDS," Micah Sieber, Dir. Academic Programs. https://t.co/XCWBnCruww
We celebrated the 40th birthday of our department today @BU_Tweets @BUCompSci Thanks to all attendees: our alumni, friends, faculty and students 🎂🎉
Congrats and well done! 25 years ago the entire field of Internet Measurement (what we at that time called "Internet Measurement, Instrumentation, & Characterization") was born @BU_Tweets! Great to see the legacy of team @mcrovella continue, now extending to the use of ML & AI.