Angelos Poulis

@angelosps

CS PhD student @BUCompSci

Boston, MA

Joined September 2019

48 Following

8 Followers

8 Posts

angelosps retweeted

Gabriel Franco @gvsfranco

19 days ago

🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread: 👇 https://t.co/UrKC1lOMr8

120

23K

Angelos Poulis @angelosps

2 months ago

Takeaway: truth directions in LLMs seem robust mostly in a limited range of pure-factual tasks for specific prompt formats, but break down when truth assessment requires tracking intermediate results. 📄Testing the Limits of Truth Directions in LLMs: https://t.co/YnMmE7zlgE

Angelos Poulis @angelosps

2 months ago

Does an LLM have an internal representation of truth? Yes... but it is more limited than previously assumed. E.g., counting how many (out of 3) cities are in the same country can significantly degrade truth representations. New preprint with @mcrovella and Evimaria Terzi🧵

151

Angelos Poulis @angelosps

2 months ago

Geometrically, we observe that as task difficulty increases, activations of true and false statements become indistinguishable.
