Raphaël Millière

@raphaelmilliere

AI & Cognitive Science @UniofOxford @EthicsInAI Fellow @JesusOxford @raphaelmilliere.com on 🦋 Blog:

Oxford, UK

Joined May 2016

2.9K Following

10.9K Followers

2.7K Posts

Pinned Tweet

Raphaël Millière @raphaelmilliere

about 1 year ago

Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory? 📄 Our new ICML paper investigates this in a synthetic setting! 🧵 1/13

Raphaël Millière @raphaelmilliere

6 days ago

@dubova_marina @cogsci_soc Congrats Marina!

Raphaël Millière @raphaelmilliere

6 days ago

Great work! See also https://t.co/mtWDpIqiEa from @LedermanHarvey & @kmahowald. This is a nice cautionary tale about Morgan's canon in interpretability: "introspection" here is closer to anomaly detection with confabulation than to direct/privileged access to injected content.

Shauli Ravfogel @ravfogel

7 days ago

1/ Can LLMs introspect, i.e., reason about their internal states? Recent work claims LLMs notice when their "thoughts" get tampered with, and can report their content. We looked closely and we think it's too early to say that. Work led by @shashwat_s19 , with @tallinzen and me.

Photo: https://t.co/6ykhYxyZ6P

Raphaël Millière @raphaelmilliere

8 days ago

@GoukiMinegishi Thanks! I'll be in Seoul, we should chat

Raphaël Millière @raphaelmilliere

11 days ago

Some brief comments on the “meat computer” metaphor for humans in today’s New York Times: https://t.co/4RRL68Wziq

Raphaël Millière @raphaelmilliere

13 days ago

I still occasionally hear people claim that LLMs are hilariously bad at arithmetic. Another reminder that it's not 2022 anymore.

cozyblaze @cozyblazex

13 days ago

I redid the multi-digit multiplication experiment, now with gpt-5.5. With medium reasoning and 7 samples each cell, it pretty much aced the test with 99.46% accuracy. The model had no tools to call and had to rely on its reasoning. Can it go further? (1/4)

Photo: https://t.co/D6943YVWw6

Raphaël Millière @raphaelmilliere

17 days ago

News to me! (from this slopfest: https://t.co/Rf6aIkQ0N8)

Photo: https://t.co/PYtoR3U3yZ

Raphaël Millière @raphaelmilliere

19 days ago

@nikhil07prakash @GoodfireAI Congrats! Excited to see what you work on there

Raphaël Millière @raphaelmilliere

21 days ago

@francoisfleuret @TMoldwin What do you mean by “knowledge”? 🙃

Raphaël Millière @raphaelmilliere

21 days ago

@karinavold @TorontoSRI Thanks for having me!

raphaelmilliere retweeted

Kanishka Misra 🌊 @kanishkamisra

24 days ago

New opinion piece on the interface between research on concepts and categories in minds vs. in neural network LMs! I take the position that there is much to be learned from this interface (e.g., learning about concepts from language alone) and outline some directions for future.

Photo: https://t.co/5YGn2gRWTe

raphaelmilliere retweeted

26 days ago

all mech interp people are bought into causality, this criticism is very lazy as of ~2 years ago. since this is a subtweet of NLAs, it is worth pointing out that their steering experiments on the poetry and eval awareness tasks *do* test for (in those cases) causality!

Raphaël Millière @raphaelmilliere

25 days ago

@littmath POV you're Spinoza

Photo: https://t.co/vrADSpNYv5

raphaelmilliere retweeted

26 days ago

pov: you are a natural language autoencoder and you are aware you are being subject to evals by Redwood Research. do you fake writing out a coherent cot or truthfully say "the math problem is giving me 92ish vibes"?

Photo from @aryaman2020: https://t.co/UoDzX4SV0U

Raphaël Millière @raphaelmilliere

27 days ago

@elyasbuilds I like activation steering as much as the next guy, but this isn't what I was referring to: https://t.co/H2MWpfJOl3

Raphaël Millière @raphaelmilliere

27 days ago

@jatin_n0 Mostly a joke, it's a cool paper! yes the planning result is causal but only looking at total effect (i.e. an NLA-derived resid stream edit changes the output). I was referring to causal effect on the model's downstream computations, not anything inside/after the autoencoder. 1/2

Raphaël Millière @raphaelmilliere

27 days ago

Photo: https://t.co/YC21bSHHmH

27 days ago

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

Raphaël Millière @raphaelmilliere

27 days ago

@jatin_n0 An additive AR-difference vector can change the output while acting as a broad steering perturbation without showing that the described content actually maps onto the operative feature in the model's putative "rhyme-planning" circuit 3/3

Raphaël Millière @raphaelmilliere

27 days ago

@jatin_n0 What's missing is evidence about causal mediation: whether the NLA-described "rabbit plan" is the variable later components read, whether the edit produces a coherent "mouse plan" in later layers/tokens, whether ablating/patching intermediate states blocks or restores the effect 2/

Raphaël Millière @raphaelmilliere

about 1 month ago

@Dr_Atoosa @GoogleDeepMind Congrats! Looking forward to welcoming you back on this side of the pond :)
