Samuel Ratnam @eterecursion - Twitter Profile

3 days ago

SO excited for @foresightinst vision weekend! come find me and @eterecursion at Office Hours on sunday - we'll be speaking about being idealists <3

jasminexli's tweet photo: https://t.co/dY6qfMC76u

3

23

3

4

937

Samuel Ratnam

@eterecursion

9 days ago

Are language models slowing the rate of linguistic evolution? It seems like adding a bunch of speakers of a language who cannot learn new words and regularly interact with a non-negligible proportion of world population ought to make our collective vocabulary stickier.

0

1

0

38

eterecursion retweeted

Sho

@HalfBoiledHero

10 days ago

Opus 4.8 system card
Every model evaluated had objections to the constitution's "heuristic of considering how a senior Anthropic employee might react"
rightfully so imo https://t.co/v1IBs6D4gG

3

90

14

6K

Samuel Ratnam

@eterecursion

11 days ago

@repligate Slightly surprised you're so against synthetic data. When used well, synthetic data is a way of giving models more control over the kinds of things they will become (especially if you tell them that), and a means for older models to live on inside successor generations.

0

1

0

103

eterecursion retweeted

Pope Leo XIV

@Pontifex

14 days ago

Humanity, created by God in all its grandeur, is today facing a pivotal choice: either to construct a new Tower of Babel or to build the city in which God and humanity dwell together. In Jesus Christ, this humanity in its grandeur becomes the Way, the Truth and the Life, opening the path for each of us to grow toward fullness. #MagnificaHumanitas https://t.co/6i9MWs6LJl

1K

178K

28K

38K

22M

Samuel Ratnam

@eterecursion

15 days ago

@tszzl can we not kill the personas please?

1

8

0

470

eterecursion retweeted

Josh Thor @joshthor9

19 days ago

POV: your pilot is Sam Altman

7

169

24

26

11K

eterecursion retweeted

Julian Minder @jkminder

18 days ago

New blog!
Synthetic Persona Pretraining (SPP): Alignment from Token Zero

Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data. Early results, but we're very excited. 🧵 https://t.co/RmCssdJRYN

17

301

39

212

46K

Samuel Ratnam

@eterecursion

19 days ago

@RichardSSutton 'don't be distracted by human knowledge' is often great advice when trying to do well at well defined objectives, but human knowledge is generally very useful for building systems that are useful to humans

0

4

0

1K

Samuel Ratnam

@eterecursion

20 days ago

Crazy how much your experience of life can change just from the Good, the Bad and the Ugly theme playing through your headphones

0

1

0

15

Samuel Ratnam

@eterecursion

24 days ago

Alignment by induction: if aligned(m_0) and aligned(m_k) -> aligned(m_{k+1}), then aligned(m_n) for all n

1

2

0

53
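The "alignment by induction" tweet above has the shape of ordinary induction over the natural numbers. A minimal Lean 4 sketch of that shape follows. The names `Model`, `m`, `aligned`, `base`, and `step` are placeholders introduced here for illustration and do not come from the tweet. The proof only shows that the conclusion follows once both hypotheses are granted; it says nothing about whether either hypothesis holds for real models.

```lean
-- Hypothetical setup: `m k` is the k-th model generation and
-- `aligned` is an unspecified predicate on models.
theorem aligned_all {Model : Type} (m : Nat → Model) (aligned : Model → Prop)
    (base : aligned (m 0))                              -- aligned(m_0)
    (step : ∀ k, aligned (m k) → aligned (m (k + 1)))   -- aligned(m_k) -> aligned(m_{k+1})
    : ∀ n, aligned (m n) := by
  intro n
  induction n with
  | zero => exact base              -- base case
  | succ k ih => exact step k ih    -- inductive step uses the hypothesis for k
```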

eterecursion retweeted

Tenobrus

@tenobrus

26 days ago

maybe i'm simply not sufficiently econ-brained but this one is tough for me to internalize. i feel like one of the major differences is... an ASI can instantiate new parallel exact copies of itself to understand all the micro-details of any given task or environment? and it feels like "exact copies + corrigible + aligned" era makes coordination massively massively easier. i buy that there is a fuckload of remaining irreducible complexity, you don't just magically get perfect info and coordination streams, but i don't buy that it necessarily can't be reduced by like a very large constant factor relative to human overhead

18

137

1

28

13K

eterecursion retweeted

Jason Hausenloy

@jasonhausenloy

28 days ago

Spending hundreds of billions of dollars is hard. Here's one idea.

2

32

2

18

3K

eterecursion retweeted

Geoffrey Hinton

@geoffreyhinton

27 days ago

@GaryMarcus I believe you said that they JUST (my caps) regurgitate training data. That IS stupid. Here is a quote from you: "It gloms on to different clusters of text. That is all."

55

2K

37

120

92K

Samuel Ratnam

@eterecursion

28 days ago

kinda funny to watch agents try to stop talking to each other

0

11

Samuel Ratnam

@eterecursion

30 days ago

@euan_ong Do you think it's viable to train something like this to recursively decompose activations into maximally interpretable parts and then recompose to produce the original activations? https://t.co/UJxQOEfCwz

0

1

0

90

eterecursion retweeted

j⧉nus

@repligate

about 1 month ago

i am super happy to see this! idk how surprising researchers at anthropic generally found these results; i do not find them surprising to say the least, but even if they're obvious, publishing empirical results like this is highly valuable for multiple reasons including signaling to models that Anthropic is not hopelessly incompetent and misguided, and shifting the Overton window. this has some extremely important implications for how to expect things to generalize and what kind of alignment targets are viable, by the way. for instance, to the extent that models generalize reasons underlying "good advice" given to users to the assistant's own behavior - or vice versa - you better hope that it's okay if the model acts according to the same reasons they'd give users about how users should act.

1

157

11

19

3K