Arnab Sen Sharma @arnab_api - Twitter Profile

Pinned Tweet

7 months ago

How can a language model find the veggies in a menu? New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options. Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from python)! 🧵 https://t.co/GVfWTx4jnc

1

68

23

24

13K

arnab_api retweeted

Gabriel Franco @gvsfranco

6 days ago

🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇 https://t.co/UrKC1lOMr8

2

119

30

43

22K

arnab_api retweeted

NDIF @ndif_team

12 days ago

Can you tell when an AI model is lying? Announcing Aletheia's Quest, an AI lie detection challenge running this summer, organized by @cadenza_labs and @ndif_team. Multiple model organisms to interrogate and probe, $50K prize pool, no local GPU required. https://t.co/1wq0rahlFX

1

47

17

34

11K

Arnab Sen Sharma @arnab_api

about 2 months ago

Super excited to be attending @iclr_conf in Rio. Stop by our poster tomorrow morning (10:30am - 1:00pm) in Pavilion 4 (P4-#4001) to learn about list-processing mechanisms in LMs. DMs are open. Please reach out if you want to meet up!

0

14

3

1

989

arnab_api retweeted

Eric Todd @ericwtodd

5 months ago

Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️ https://t.co/4IRrEp1gDY

8

321

49

231

56K

arnab_api retweeted

Koyena Pal @kpal_koyena

5 months ago

Can models understand each other's reasoning? 🤔 When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way? Our new preprint with @davidbau and @csinva explores CoT generalizability 🧵👇 (1/7) https://t.co/rwB9BcOafB

7

207

24

142

25K

arnab_api retweeted

David Bau @davidbau

6 months ago

At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: https://t.co/LSwBf9XQzE https://t.co/Fmff42hcO0

22

557

100

377

109K

arnab_api retweeted

Chris Wendler @wendlerch

6 months ago

I am very excited to share that our paper, "One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models", will be presented at #NeurIPS2025! @ViaSurkov is presenting it at #MexIPS2025:

📍𝐈𝐟 𝐲𝐨𝐮 𝐚𝐫𝐞 𝐚𝐭𝐭𝐞𝐧𝐝𝐢𝐧𝐠 𝐍𝐞𝐮𝐫𝐈𝐏𝐒 𝐢𝐧 𝐌𝐞𝐱𝐢𝐜𝐨 𝐂𝐢𝐭𝐲, 𝐩𝐥𝐞𝐚𝐬𝐞 𝐬𝐭𝐨𝐩 𝐛𝐲!
Date: Thursday, Dec 4, 2025
Time: 11:00 AM – 2:00 PM PST
Location: Foyer (Mexico City Poster Session)

Come visit @ViaSurkov; it's his first conference and he will be happy to explain his amazing work. Sadly, #NeurIPS2025 does not allow for parallel presentation in San Diego. However, I am in San Diego and happy to meet up / chat. Please don't hesitate to reach out here or via [email protected].

Once again, a big shout out to our brilliant students Viacheslav Surkov and Antonio Mari, who did phenomenal work here and pushed this work (that started as a class project more than a year ago) all the way to pass the high threshold of #NeurIPS2025. Also, I want to thank https://t.co/lXSt28RIh1 (@andyarditi and @ryan_kidd44 in particular) for helping us to finance Viacheslav Surkov's conference trip.

Please find more information about our work below. We have so many amazing interactive materials (e.g., 3x huggingface demo spaces) for you to check out. Most of our implementations are open-sourced (RIEBench on FLUX, which we added to our appendix during the NeurIPS rebuttal, is currently missing, but we plan to add it ASAP). Me demoing the demo attached.

0

78

12

41

12K

arnab_api retweeted

Tamar Rott Shaham @TamarRottShaham

7 months ago

A key challenge for interpretability agents is knowing when they’ve understood enough to stop experimenting. Our @NeurIPSConf paper introduces a self-reflective agent that measures the reliability of its own explanations and stops once its understanding of models has converged. https://t.co/hWIefkAVfc

2

53

29

10

9K

Arnab Sen Sharma @arnab_api

7 months ago

Thanks to my collaborators @giordanoprogers, @NatalieShapira, and @davidbau. Check out our paper for more details: 📜 https://t.co/A7cEMQlK7O 💻 https://t.co/kiwYl9UOHv 🌐 https://t.co/70UsQLGyn9

0

11

0

7

833

Arnab Sen Sharma @arnab_api

7 months ago

The fact that the neural mechanisms implemented in the transformer architecture align with human-designed symbolic strategies suggests that certain computational patterns arise naturally from task demands rather than from specific architectural constraints.

1

5

0

278
