Martin Tutek @mtutek - Twitter Profile

Pinned Tweet

about 1 year ago

🚨🚨 New preprint 🚨🚨 Ever wonder whether CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from parameters to assess CoT faithfulness. https://t.co/WZDUZbJbxC

2

58

11

34

10K

Martin Tutek

@mtutek

6 days ago

With the large influx of submissions and a faster pace of research, reproducibility is more important than ever. With this reproducibility challenge, we want to put the focus on best practices wrt. baselines🧱, ablations🌈, eval🔎 and generalizability🗺️ of interpretability!

BlackboxNLP @BlackboxNLP

6 days ago

📣 Announcing the BlackboxNLP 2026 Reproducibility Challenge! A new track dedicated to rigorous robustness checks of NLP interpretability work - stress-testing baselines, ablations, generalizability, and evaluation. https://t.co/40P9Zr40Lu

2

25

7

6

4K

0

11

4

0

558

mtutek retweeted

BlackboxNLP @BlackboxNLP

6 days ago

📣 Announcing the BlackboxNLP 2026 Reproducibility Challenge! A new track dedicated to rigorous robustness checks of NLP interpretability work - stress-testing baselines, ablations, generalizability, and evaluation.

2

25

7

6

4K

mtutek retweeted

Ana Marasović @anmarasovic

9 days ago

We evaluated CoT faithfulness evaluations & released BonaFide so you can test yours too!!

0

16

5

9

2K

mtutek retweeted

Yoav Gur Arieh @GurYoav

9 days ago

Can we tell when LLMs are being unfaithful in their chains of thought? We evaluated 8 methods claiming to do this, and found that most perform near chance! But evaluating this requires us to have ground-truth labels for CoT faithfulness. How can we obtain these?

4

144

25

92

14K

Martin Tutek

@mtutek

11 days ago

@sahoo2vec Agreed, I was extremely (pleasantly) surprised by the quality of pretty detailed reviews. Good job @pluralistic_ai

0

1

0

40

mtutek retweeted

Sahar Abdelnabi 🕊

@sahar_abdelnabi

12 days ago

📢📢 I'm expanding my group at the ELLIS Institute Tübingen and the Max Planck Institute for Intelligent Systems and hiring 3-4 PhD students, and also visiting PhD students for funded 3-6 months stays ✨💥

Research areas broadly span the safety, security, and trustworthiness of AI agents including:
• Sycophancy, decision-making, and societal implications of AI
• Autonomous agents and memory
• Multi-agent safety, security, and privacy
• Training dynamics of LLMs
• AI control and oversight
• AI for science

I'm looking for:
- Visiting PhD students with a strong computer science background and research experience.
- Students with a Master's degree and demonstrated research experience.

What we offer in Tübingen: amazing compute, collaborators, environment, and a lovely town!

If you're interested, please fill out the Google form on my website: https://t.co/oCryUUIIUR

@ELLISInst_Tue @MPI_IS

#phd #AIsafety

4

302

45

186

28K

mtutek retweeted

Michael Hahn @mhahn29

16 days ago

Can one reverse-engineer a transformer from just its behavior? We show it's provably possible for individual attention heads.

1

27

4

9

4K

mtutek retweeted

Aayush Mishra @aamixsh

17 days ago

NLAs are claimed to verbalize model activations. But can they faithfully interpret steered activations?

In our latest paper, we show that steering moves activations into non-invertible regions; and almost surely, no prompt maps to steered activations!

NLAs fail to interpret steered activation states faithfully, supporting our results! ↓

@anqi_liu33 @DanielKhashabi

https://t.co/EANMNuQ1rL

19

606

100

553

87K

Martin Tutek

@mtutek

19 days ago

@NitCal The best viewpoint I've seen on this. I'd even be more blunt and say that it is just disrespectful to hand over AI generated content which you have not verified. Use AI tools, but check their outputs and respect others' time.

1

5

0

172

mtutek retweeted

Nitay Calderon

@NitCal

19 days ago

My advisor always says time is our most valuable resource. I tell students I teach/work with that I don't plan to spend more time reading something than the author spent writing it. I support arXiv's decision. Asking authors to polish AI-generated content is a *VERY* low bar.

5

106

4

6

5K

mtutek retweeted

Thomas G. Dietterich @tdietterich

20 days ago

Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/

140

6K

917

1K

1M

mtutek retweeted

Harry Mayne

@HarryMayne5

23 days ago

To be presented at ICML! 🇰🇷

1

21

3

8

3K

mtutek retweeted

Kerem Zaman

@KeremZaman3

24 days ago

excited to share that our paper on CoT faithfulness has been accepted to #ACL2026 main! 🎉 check out the thread for details 👇

6

37

4

10

4K

mtutek retweeted

Emanuele Rodolà

@EmanueleRodola

25 days ago

i ran an analysis of EU-affiliated papers at ICLR 2026. the picture is less negative than the original (quoted) treemap may suggest, but still, Europe needs to wake up.

1,059 out of 5,355 papers have at least one EU affiliation. that's a 19.78% incidence on all accepted papers.

If we count each paper once for every EU country represented among its authors, the total rises to 1,230 paper-country incidences.

Germany leads by a large margin, followed by France and Italy (daje). as an Italian, i'm genuinely happy to see Italy so high in the ranking, our investments in national talent is paying off.

this seems to correlate with the acceptance rates in competitive funding schemes like the ERC grants. i'd be curious to know whether similar patterns also appear in other non-AI fields, happy to hear thoughts from my @yacadeuro colleagues on this.

also, @EU_Commission, feel free to use the data.

personally i find it great to see Poland growing so strongly; there is real talent there. Austria is also becoming increasingly visible, helped by new, forward-looking institutions.

i'll rerun this analysis for ICML in a couple of months. curious to see whether the same pattern holds.

edit: fixed Austria counting.

8

120

17

43

18K

mtutek retweeted

Adi Simhi @AdiSimhi

about 1 month ago

Our work has been accepted at ICML 2026!🎉 Ever wondered how an LLM's conversational past influences its future, both mechanistically and behaviorally? We found that conversational history creates a "geometric trap" in the model's latent space!

1

61

4

37

6K

Martin Tutek

@mtutek

about 1 month ago

@gsarti_ One step closer to having a local analog LM running on an abacus

0

1

0

42

mtutek retweeted

Sarah Wiegreffe @sarahwiegreffe

about 1 month ago

Ever wondered what pathways steering vectors work through to affect model outputs, or whether different steering methods are functionally similar? We also find that steering vectors can be sparsified *a lot*. Check out our recent preprint, led by Stephen: https://t.co/lnMSbOz7hR

1

65

8

37

8K

Martin Tutek

@mtutek

about 1 month ago

@chengyun01 @icmlconf @AdiSimhi Ooh very interesting! will read & keep in mind for CR!

0

1

0

41

Martin Tutek

@mtutek

about 1 month ago

Now accepted to ICML 2026 @icmlconf #ICML! Check out our work where @AdiSimhi analyzes LLMs' propensity to repeat behaviors such as hallucinations, sycophancy and refusal - both behaviorally & mechanistically. I'll go to Seoul, LMK if you'll also be there!

Adi Simhi @AdiSimhi

3 months ago

How does an LLM’s past influence its future?🤔 In our new paper with @FazlBarez, @mtutek, @boknilev, Shay Cohen, we show that conversational history creates a "geometric trap" in the latent space, confining the model’s trajectory ➡️ making old habits e.g. hallucinations hard to break https://t.co/9dA23SSJIZ

1

110

15

77

19K

2

36

7

6

3K

mtutek retweeted

elie

@eliebakouch

about 1 month ago

Qwen first release on interpretability (qwen scope) is very interesting

they use SAE features to identify what causes repetition in model outputs, then use steering to manufacture a "bad" rollout where the model repeats a lot. this gives RL a clear negative signal to learn from, since repetition barely shows up in normal rollouts so the model never gets punished for it

they also use SAE features as a fingerprint for benchmarks, you look at which features each benchmark activates and compare overlap. lets you find redundancy inside a benchmark and across benchmarks without running any model. for instance 63% of GSM8K features are in MATH but only 10% the other way
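
The overlap in the tweet above is directional, which is why the two percentages can differ so much. A minimal sketch of that idea, assuming each benchmark is summarized as a set of active feature IDs; the `coverage` helper and the feature IDs are hypothetical and are not taken from the Qwen release:

```python
def coverage(source: set[int], target: set[int]) -> float:
    """Fraction of the features in `source` that also appear in `target`."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


# Hypothetical feature IDs, chosen only to illustrate the asymmetry.
small_benchmark = {1, 2, 3, 4}
large_benchmark = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

# Same shared features, different denominators.
print(coverage(small_benchmark, large_benchmark))  # 0.5
print(coverage(large_benchmark, small_benchmark))  # 0.2
```

The shared features are the same in both directions; only the denominator changes, so a benchmark with fewer active features can be mostly covered by a broader one while the reverse coverage stays low.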

14

776

118

613

41K
