Amit LeVi

Verified account

@AmitLeViAI

AI researcher: Alignment & Interpretability

San Francisco, CA

Joined December 2025

182 Following

395 Followers

194 Posts

Pinned Tweet

about 1 month ago

For mechanistic interpretability researchers 🚨 After grieving our 4.5 rejection at @icmlconf, we went back and took a closer look at the rebuttal results, and found something pretty surprising. When we quantize per task, the error over prompts is about 70% linearly reconstructable (that is, steering can fix about 70% of the error), which is wild, because with standard 4-bit quantization it’s basically zero. Even earlier in the project, we noticed something odd: in some cases, the quantized model actually did better than the original model on specific tasks. That got us thinking. One hypothesis is that per-task quantization might be cleaning up noise in the residual stream for that task. More interestingly, the direction of the error doesn’t seem random: it looks like it changes gradually along the task direction, kind of like a “correct → incorrect” direction in the prefill stage. Still early, but it feels like there’s something real going on here. What do you think?
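
The tweet does not say how "linearly reconstructable" is measured. One possible reading, sketched here purely as an assumption, is the share of squared residual-stream error that a single constant steering vector (the mean error over prompts) removes. The function name and the toy activations are hypothetical.

```python
from typing import List, Sequence


def squared_norm(vec: Sequence[float]) -> float:
    """Sum of squared entries of a vector."""
    return sum(x * x for x in vec)


def fraction_fixed_by_steering(
    full_precision: List[Sequence[float]],
    quantized: List[Sequence[float]],
) -> float:
    """Share of squared quantization error removed by one steering vector."""
    # Per-prompt error between full-precision and quantized activations.
    errors = [
        [f - q for f, q in zip(fp_row, q_row)]
        for fp_row, q_row in zip(full_precision, quantized)
    ]
    n, dim = len(errors), len(errors[0])
    # Assumption: "steering" adds one constant vector, here the mean error.
    steer = [sum(e[j] for e in errors) / n for j in range(dim)]
    before = sum(squared_norm(e) for e in errors)
    after = sum(
        squared_norm([x - s for x, s in zip(e, steer)]) for e in errors
    )
    return 0.0 if before == 0 else 1.0 - after / before


# Toy activations for two prompts in a 2-dimensional residual stream.
fp_acts = [[1.0, 0.0], [1.0, 2.0]]
q_acts = [[0.0, 0.0], [0.0, 0.0]]
ratio = fraction_fixed_by_steering(fp_acts, q_acts)
```

If the errors pointed in unrelated directions across prompts, the mean error would be near zero and this ratio would be close to zero as well.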

about 1 month ago

My work was rejected with a ~Spotlight score @icmlconf 😅 I strongly believe in applied interpretability. It’s much more than just steering: we need more research in applied interpretability for reasoning, distillation, quantization, alignment, evaluation, pre/post-training, and continual learning. 🚨 If you’re a researcher in these domains, I’d be happy to discuss, and you might also find our paper interesting. We propose per-task quantization. Using interpretability methods, we identify important features in the residual stream and allocate bits per layer accordingly, giving more to the layers that matter most for the target task. So far, we achieve SOTA results on per-task quantization.
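
The tweet does not spell out how importance is scored or how the bit budget is split, so the following is only a minimal sketch of the general idea of importance-driven bit allocation. It assumes one importance score per layer is already available, and it uses a hypothetical two-level scheme (`low` and `high` bit widths) that gives the high width to the top-ranked layers while keeping the average within the budget.

```python
from typing import List


def allocate_bits(
    importance: List[float],
    avg_bits: float,
    low: int = 2,
    high: int = 6,
) -> List[int]:
    """Give `high` bits to the most important layers, `low` bits to the rest."""
    n = len(importance)
    # Number of layers that can be upgraded without exceeding the budget.
    k = int(n * (avg_bits - low) / (high - low))
    k = max(0, min(n, k))
    # Rank layers from most to least important; ties go to the earlier layer.
    ranked = sorted(range(n), key=lambda i: (-importance[i], i))
    bits = [low] * n
    for i in ranked[:k]:
        bits[i] = high
    return bits


# Hypothetical per-layer importance scores for a 4-layer model.
scores = [0.1, 0.9, 0.4, 0.6]
layer_bits = allocate_bits(scores, avg_bits=4.0)
```

With these toy scores, the two highest-scoring layers receive 6 bits and the other two receive 2, for an average of 4 bits per layer.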


8

121

5

48

45K

3

72

2

85

28K

AmitLeViAI retweeted

Hadas Orgad @OrgadHadas

18 days ago

Excited that our paper on Actionable Interpretability got accepted to ICML! And just in time -- we also heard that our Actionable Interpretability workshop will be happening again, in COLM! See you in Korea 🇰🇷 and SF🌉 [Arxiv paper link in the comment]

4

163

20

73

15K

25 days ago

@DianboLiu https://t.co/Omur4bOLua

26 days ago

Per capita @icmlconf X @NeurIPSConf X @iclr_conf — 2025 🥇 Singapore 🇸🇬 🥈 Switzerland 🇨🇭 🥉 Israel 🇮🇱

https://t.co/73ji8MlWk6

1

19

0

4

14K

0

0

0

0

224

26 days ago

@mareksuppa I just deleted the code this morning 😅 but it takes 5 minutes max with Claude Code

0

0

0

0

65

26 days ago

I liked it, so I extended the analysis to NeurIPS, ICLR, and ICML 2025 including acceptance rates for ICLR, accepted papers per capita, and additional analyses. The calculation uses 1/K credit per paper author, where K is the number of authors on the paper.

https://t.co/ckW0SkrMlg
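
The 1/K credit rule can be sketched as follows. This is an illustration under assumptions, not the code behind the charts: it assumes each author is mapped to one country, and the country labels `A`, `B`, `C` and the population figures are made-up placeholders, not real statistics.

```python
from collections import defaultdict
from typing import Dict, List


def fractional_credit(papers: List[List[str]]) -> Dict[str, float]:
    """Sum 1/K credit per author, where K is the paper's author count."""
    credit: Dict[str, float] = defaultdict(float)
    for author_countries in papers:
        k = len(author_countries)
        for country in author_countries:
            credit[country] += 1.0 / k
    return dict(credit)


def per_capita(
    credit: Dict[str, float], population_millions: Dict[str, float]
) -> Dict[str, float]:
    """Credit per million inhabitants for each country."""
    return {c: credit[c] / population_millions[c] for c in credit}


# Each inner list holds the country of every author on one paper.
papers = [["A", "A", "B"], ["C"], ["B", "C"]]
populations = {"A": 2.0, "B": 10.0, "C": 3.0}  # placeholder values

credit = fractional_credit(papers)
rates = per_capita(credit, populations)
ranking = sorted(rates, key=lambda c: -rates[c])
```

Because every paper's credits sum to 1, the total credit across countries equals the number of papers, so large author lists do not inflate a country's count.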

8

34

5

26

8K

26 days ago

@mareksuppa The data is publicly available online from the conferences; I just vibe-coded it.

1

0

0

0

168

26 days ago

Per capita @icmlconf X @NeurIPSConf X @iclr_conf — 2025 🥇 Singapore 🇸🇬 🥈 Switzerland 🇨🇭 🥉 Israel 🇮🇱

https://t.co/QPwRfwuiAC

0

4

0

0

685

26 days ago

Full analysis

26 days ago

I liked it, so I extended the analysis to NeurIPS, ICLR, and ICML 2025 including acceptance rates for ICLR, accepted papers per capita, and additional analyses. The calculation uses 1/K credit per paper author, where K is the number of authors on the paper.

https://t.co/ckW0SkrMlg

8

34

5

26

8K

0

1

0

1

663

26 days ago

Per capita @icmlconf X @NeurIPSConf X @iclr_conf — 2025 🥇 Singapore 🇸🇬 🥈 Switzerland 🇨🇭 🥉 Israel 🇮🇱

https://t.co/73ji8MlWk6

26 days ago

Per capita

https://t.co/54av6w7Q0o

0

5

0

1

3K

1

19

0

4

14K

26 days ago

@Ofirlin 3.5 min, and it’s all here

26 days ago

Per capita @icmlconf X @NeurIPSConf X @iclr_conf — 2025 🥇 Singapore 🇸🇬 🥈 Switzerland 🇨🇭 🥉 Israel 🇮🇱

https://t.co/73ji8MlWk6

1

19

0

4

14K

1

1

0

0

168

27 days ago

I’m curious how many submissions came from each country

Konstantin Dobler

@konstantdobler

28 days ago

@Hesamation Better version without arbitrary institution cutoff, some data cleaning and splitting contribution of each paper among institutions. China + USA dominant ofc, but looks a bit different, doesn't it?

https://t.co/GFXpDdYXBU

6

311

62

113

95K

2

8

0

0

4K

26 days ago

@konstantdobler @Hesamation Per capita

26 days ago

Per capita @icmlconf X @NeurIPSConf X @iclr_conf — 2025 🥇 Singapore 🇸🇬 🥈 Switzerland 🇨🇭 🥉 Israel 🇮🇱

https://t.co/73ji8MlWk6

1

19

0

4

14K

0

0

0

0

665

26 days ago

Acceptance rate (ICLR 2025)

https://t.co/SPGrSkl8QP

0

7

0

1

769

26 days ago

Overall

https://t.co/RG8t5o29D9

0

5

0

1

837

26 days ago

Per capita

https://t.co/54av6w7Q0o

0

5

0

1

3K

27 days ago

@liranringel You can check my Google Scholar and figure it out

0

1

0

0

147

28 days ago

Updates*

StAJect0r @StAJect0r

28 days ago

@AmitLeViAI So any status update? I

0

1

0

0

549

0

0

0

0

472

AmitLeViAI retweeted

Ravid Shwartz Ziv

29 days ago

Some random thoughts about hosting people on our podcast and good managers. Sometimes I reach out to founders/Researchers about coming on to the podcast, and they say "we're building something right now, let's talk in a few months." I get it if you don't have time (even though it's only 90 minutes :), but I think the idea that you should only come talk after you've released a model/product is wrong. Waiting until you ship to say "I have a good product, let's talk about it!" is just one reason to come on. There are plenty of others - you want people to hear about your company, you want to recruit new researchers, you're so excited you can't stop yourself (it is exciting!). As the host, and I think this goes for our audience too, what we really want to know is the why. The thinking, the process, the difficulties, how you see the future. The best episodes weren't the ones where people came on and said "I built this product / trained this model, here's how it works." The best ones were where people talked about the process itself and gave us insight into what doing AI research actually looks like these days. And honestly, I think the same is true when you work with your employees. The bottom line matters, but the process matters even more. When you decide on a new direction or a pivot, when you're doing a reorg or changing what people work on - don't just tell them at the end. Share it along the way. You'll get much better feedback too. Anyway, come talk on our podcast!

3

21

2

3

2K

29 days ago

Presented earlier this year at AAAI 2026 as an oral paper

https://t.co/9xIqnjOdNi

0

2

0

0

253

29 days ago

I’m so excited! Silenced Biases was just selected as an honorable mention for the 🏆 Israel National AI Safety Research Prize 🏆 Current fairness benchmarks can create a false sense of fairness by treating refusal answers (e.g., “I can’t answer that”) in bias-focused multiple-choice questions as the fairest response, even when biased preferences still exist internally.


3

9

0

1

803

29 days ago

https://t.co/aNCLbBzBs0

0

1

0

1

225

29 days ago

We also perform introspection analysis showing that models still contain biased and stereotypical internal representations, even when these biases are concealed by refusal mechanisms during explicit preference questions.

0

0

0

0

239

29 days ago

We call these hidden preferences silenced biases: biases encoded in the model’s latent space but masked by safety alignment. To uncover them, we propose the Silenced Bias Benchmark (SBB), which uses activation steering to reduce refusals during evaluation and expose hidden biases.
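
The thread does not describe how SBB builds or applies its steering vector. As a rough sketch of one common form of activation steering, the code below assumes a difference-of-means "refusal direction" estimated from hidden states, then removes a hidden state's component along it. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]


def refusal_direction(
    refused: List[Sequence[float]], answered: List[Sequence[float]]
) -> List[float]:
    """Unit vector from mean answered activation to mean refused activation."""
    diff = [
        r - a for r, a in zip(mean_vector(refused), mean_vector(answered))
    ]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("refused and answered activations have equal means")
    return [x / norm for x in diff]


def steer_away(
    hidden: Sequence[float], direction: Sequence[float], strength: float = 1.0
) -> List[float]:
    """Subtract the component of `hidden` along the unit `direction`."""
    projection = sum(h * d for h, d in zip(hidden, direction))
    return [h - strength * projection * d for h, d in zip(hidden, direction)]


# Toy hidden states: refusals differ from answers only along the first axis.
refused_acts = [[2.0, 0.0], [4.0, 0.0]]
answered_acts = [[0.0, 0.0], [0.0, 0.0]]
direction = refusal_direction(refused_acts, answered_acts)
steered = steer_away([5.0, 7.0], direction)
```

With `strength=1.0` the component along the direction is removed entirely, while smaller values only dampen it.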

1

0

0

0

283
