Pierre Beckmann @BeckmannPierre - Twitter Profile

Pinned Tweet

about 1 month ago

New paper with @PatrickButlin, from my time at @MATSprogram . We propose two new candidates for LLM individuation: the (virtual) instance-persona view and the model-persona view. 🧵

BeckmannPierre retweeted

Dillon Plunkett @dillonplunkett

about 17 hours ago

I’m mentoring Autumn 2026 @MATSprogram Fellows interested in doing AI welfare research. The application deadline is this Sunday (6/7). More info in this thread:

BeckmannPierre retweeted

Andy Han @andy_q_han

6 days ago

We RL LLMs and extract concept vectors for “I did a high/low-reward action”. Turns out these vectors modulate sentiment, confidence, backtracking and refusal in unrelated situations! We argue they form a *functional welfare axis*. (w/ @davidchalmers42 & @Pavel_Izmailov)

BeckmannPierre retweeted

Julian Minder @jkminder

15 days ago

New blog!
Synthetic Persona Pretraining (SPP): Alignment from Token Zero

Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data. Early results, but we're very excited. 🧵

Pierre Beckmann @BeckmannPierre

17 days ago

@gilg_oscar @patrickbutlin @MATSprogram Finally!

Pierre Beckmann @BeckmannPierre

17 days ago

Glad this is out, great to have been part of it! I'm most intrigued by the idea that LLMs have circuitry shared across personas but interpreted relative to the active one. Watch Oscar's next work if you like technically and philosophically precise research.

Oscar Gilg @gilg_oscar

17 days ago

First preprint! Working with @patrickbutlin during @MATSprogram. LLM Assistant personas like being helpful, evil personas like being harmful. We found that a single direction represents helping as good under the Assistant, and ‘harm’ as good under evil.

Pierre Beckmann @BeckmannPierre

25 days ago

@BartBussmann is that the die speaking?

Pierre Beckmann @BeckmannPierre

about 1 month ago

@ryan_kidd44 Wasn't this cohort 9.0?

Pierre Beckmann @BeckmannPierre

about 1 month ago

Here's the thread https://t.co/FvoyIEDjpu

Pierre Beckmann @BeckmannPierre

11 months ago

New preprint: “Mechanistic Indicators of Understanding in LLMs” with @matthieu_queloz. Building on mechanistic interpretability, we argue that LLMs exhibit signs of understanding across three tiers: conceptual, state-of-the-world, and principled understanding. 🧵(1/9)

Pierre Beckmann @BeckmannPierre

about 2 months ago

"Mechanistic Indicators of Understanding in LLMs" is finally out in Philosophical Studies! https://t.co/DeM6XrFNRz

Pierre Beckmann @BeckmannPierre

about 1 month ago

@dcshiller @patrickbutlin @MATSprogram thanks!

Pierre Beckmann @BeckmannPierre

about 1 month ago

@burnt_jester It's definitely weird! The view gives up psychological connectedness and focuses on dispositional similarity instead. See §4.3 for more details.

Pierre Beckmann @BeckmannPierre

about 1 month ago

@davidchalmers42 @repligate @mpshanahan @saprmarks @Jack_W_Lindsey @ch402 I also wanted to separately thank @gilg_oscar, my stream-partner at MATS, for great feedback and pointers throughout this project!

Pierre Beckmann @BeckmannPierre

about 1 month ago

@davidchalmers42 @repligate @mpshanahan @saprmarks @Jack_W_Lindsey @ch402 Here's the link https://t.co/AJvFkJbGkO Comments are welcome!
