Emmanuel Ameisen @mlpowered - Twitter Profile

Pinned Tweet

about 1 year ago

We've made progress in our quest to understand how Claude and models like it think! The paper has many fun and surprising case studies that anyone who is interested in LLMs would enjoy. Check out the video below for an example.

Anthropic

@AnthropicAI

about 1 year ago

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

181

8K

1K

5K

2M

7

131

10

41

21K

mlpowered retweeted

Anthropic

@AnthropicAI

about 9 hours ago

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx

795

14K

2K

7K

5M

Emmanuel Ameisen @mlpowered

16 days ago

@karpathy Welcome!

0

3

0

1K

mlpowered retweeted

METR @METR_Evals

27 days ago

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

69

2K

245

520

975K

Emmanuel Ameisen @mlpowered

28 days ago

@__lightyear__ The reason the NLA showed these results on opus is that it was trained on transcripts where it ended up needing to infer the user's language. That's not true for the neuronpedia models (paper has more details)

2

0

185

Emmanuel Ameisen @mlpowered

28 days ago

Interpreting model activations is important to understand why a model is doing what it's doing. Traditionally, we've done this with supervised methods (probing for a specific context), or unsupervised sparse decompositions (dictionary learning). But probing requires you to know what you are looking for, and sparse dictionaries can be overwhelming to interpret. NLAs are exciting because they instead generate natural language explanations, which we can then inspect for a variety of behaviors. For example, they reveal the planning behavior we first observed with circuit tracing last year. They also helped identify bugs in Claude's training pipeline, where some prompts were only partially translated. If you want to play with them, NLAs on open models are available on Neuronpedia! https://t.co/ELZgiucKAT

Anthropic

@AnthropicAI

28 days ago

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

595

16K

2K

9K

2M

5

133

10

53

12K

mlpowered retweeted

Harish Kamath

@kamath_harish

29 days ago

Interpreting language models can feel like stumbling through a dark forest - sometimes you just wish you had a flashlight! In our new post, we introduce HeadVis, our latest flashlight for studying attention heads.

3

208

32

150

21K

Emmanuel Ameisen @mlpowered

about 1 month ago

How do LLMs store attributes of entities? And how do they compare different attributes in context? It turns out they mostly store information about a given entity over its own token, which allows for easy lookups. But in addition to the current entity's information, models also store information about the previous entity. That might seem redundant, but it actually enables a model to identify relationships between the current entity and the previous entity in one step!

Paul Bogdan @paulcbogdan

about 1 month ago

Many LLMs struggle to parse statements like “Alice prepares and Bob consumes food.” Ask them “Who consumes food?” and they'll get it wrong. What’s up with that? We researched whether models can represent multiple entities at once, and if so, why do they fail here? 🧵

8

86

11

46

21K

1

6

0

817

mlpowered retweeted

Michael Hanna @michaelwhanna

about 1 month ago

Do LMs plan without verbalizing their plans? I'll be at ICLR presenting work with @mlpowered using circuit tracing to reveal latent planning—from choosing "a" vs "an" based on a planned-for word, to rhyming poetry—and how these abilities grow with scale: https://t.co/1WumNEFCb0

1

96

13

43

4K

mlpowered retweeted

Peter Yang

@petergyang

about 2 months ago

Made this 30-second video of Claude Design just by pasting in the Claude Design blog post and some tweets from @AnthropicAI employees. Kinda speechless.

114

2K

96

2K

421K

mlpowered retweeted

Vals AI

@ValsAI

about 2 months ago

Anthropic’s Opus 4.7 just seized the #1 spot on the Vals Index with a score of 71.4%, a massive jump from the previous best (67.7%). It also ranks #1 on Vibe Code Bench, Vals Multimodal, Finance Agent, Mortgage Tax, SAGE, SWE-Bench, and Terminal Bench 2.

8

240

27

31

26K

mlpowered retweeted

Uzay Macar

@uzaymacar

about 2 months ago

🧵New Anthropic Fellows research: We studied mechanisms of "introspective awareness" in LLMs. LLMs can sometimes detect steering vectors injected into their residual stream. But is this worthy of being called introspection, or attributable to some uninteresting confound?👇

28

419

70

335

47K

mlpowered retweeted

Anthropic

@AnthropicAI

about 2 months ago

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. https://t.co/NQ7IfEtYk7

2K

44K

7K

16K

31M

Emmanuel Ameisen @mlpowered

3 months ago

@AndrewLampinen Welcome!

0

3

0

535

mlpowered retweeted

Anthropic

@AnthropicAI

3 months ago

We partnered with Mozilla to test Claude's ability to find security vulnerabilities in Firefox. Opus 4.6 found 22 vulnerabilities in just two weeks. Of these, 14 were high-severity, representing a fifth of all high-severity bugs Mozilla remediated in 2025.

477

15K

1K

2K

3M

Emmanuel Ameisen @mlpowered

3 months ago

@oanaolt Appreciate you Oana!

0

21

Emmanuel Ameisen @mlpowered

3 months ago

Proud to work at a place that stands behind its values. 🇺🇸

Anthropic

@AnthropicAI

3 months ago

A statement on the comments from Secretary of War Pete Hegseth. https://t.co/Gg7Zb09IMR

3K

42K

7K

5K

18M

11

454

14

6

6K

Emmanuel Ameisen @mlpowered

3 months ago

I used to bite my tongue and hold my breath. Scared to rock the boat and make a mess. I stood for nothing, so I fell for everything. 🎶

KATY PERRY

@katyperry

3 months ago

done

3K

51K

3K

5K

13M

2

95

4

0

8K

Emmanuel Ameisen @mlpowered

3 months ago

AI is not a normal technology, and Anthropic’s mission is to make sure that it serves the long-term benefit of humanity. Doing so requires making tough decisions, and standing up for what we think is right. This is us doing that.

Anthropic

@AnthropicAI

3 months ago

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. https://t.co/rM77LJejuk

4K

55K

9K

17M

32

752

46

14

20K

Emmanuel Ameisen @mlpowered

4 months ago

Late last year, we found a precise counting mechanism in Claude. This new work by @ummagumm_a and Nikita Balagansky shows that:
- similar mechanisms exist in many models
- we can compare their counting performance by seeing how crisp their representations of the count are!

Viacheslav Sinii @ummagumm_a

4 months ago

1/ 🧵 Reproducing Anthropic’s “counting manifold” result in open-weight LLMs: do they internally track “chars since last \n” to wrap text consistently? https://t.co/me60hJfrxN

4

220

31

144

24K

2

77

6

35

6K

Emmanuel Ameisen @mlpowered

4 months ago

@ummagumm_a @wesg52 @ch402 @thebasepoint @AnthropicAI @neuronpedia @tfrere Very cool work! Did you get a chance to look at the boundary estimation mechanism? It'd be interesting to know if performance diffs are explained by ability to:
- estimate line position/width
- combine both to know how many chars are left
- know the length of the next token

1

3

0

1

90
