Cameron Holmes @cameronholmes92 - Twitter Profile

Pinned Tweet

10 months ago

Incredible work by 3x @MATSprogram alumni and a great example of applied Mech Interp beating black box baselines and making significant progress on critical real-world problems:

Oscar Balcells Obeso @OBalcells

10 months ago

Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.

202

9K

607

4K

747K

2

30

3

3K

Cameron Holmes @CameronHolmes92

3 days ago

@clippocampus @AnthropicAI Awesome, congratulations! 🎉

0

1

0

92

Cameron Holmes @CameronHolmes92

3 days ago

@dab_chick @Benthamsbulldog All good, it's my bad for missing the joke. Dw from your profile I assumed there wasn't any cruxy disagreement here

0

1

0

35

Cameron Holmes @CameronHolmes92

3 days ago

@dab_chick @Benthamsbulldog The average consumer's expected impact is still ~1 because they have imperfect knowledge. The expected curve is effectively smooth for this reason.

1

5

0

95

Cameron Holmes @CameronHolmes92

3 days ago

@AaronBergman18 @jjspicer I can't say this has happened to me, but I was giddy when someone recognised me from Manifold

1

2

0

28

CameronHolmes92 retweeted

Benno Sturgeon

@ben_sturgeon

9 days ago

When Role-Playing, Do Models Believe What They Say? (w/ @DavidDAfrica and @realmeatyhuman) LLMs can say “The Earth revolves around the Sun” and then, when roleplaying as an ancient Greek historian, assert the opposite. What changes inside the model when it acts like this? Does it just say things, or does it start to believe the role? 🧵

8

167

29

108

282K

Cameron Holmes @CameronHolmes92

9 days ago

@AndrewDraganov Harlan Ellison: in my book I invented the non-verbal sentience as a cautionary tale

0

32

Cameron Holmes @CameronHolmes92

9 days ago

@dioscuri @gleech Agreed, Understand helped me grok some superintelligence concerns

0

1

0

55

Cameron Holmes @CameronHolmes92

10 days ago

@frances__lorenz FAR 🤝 AISI Alignment Wearing pink on Wednesdays

1

3

0

129

Cameron Holmes @CameronHolmes92

12 days ago

@peterwildeford "Last to take their hand off the light cone, keeps it!"

0

7

0

956

Cameron Holmes @CameronHolmes92

16 days ago

@wilhelmscreamin @deepfates @ArcadiaImpact Alignment team are also looking at model motivations https://t.co/HHiUrHED8N https://t.co/wlMOG1dF6r

0

2

0

41

CameronHolmes92 retweeted

David @DavidDAfrica

16 days ago

Model organisms are useful insofar as they are “scary property + normal model.” Right now, many current organisms are more like “scary property + fried model.” In this post, we argue for more natural MOs: models that get the pathology without becoming otherwise fried!

0

36

3

11

4K

Cameron Holmes @CameronHolmes92

17 days ago

@replyallguy @_sholtodouglas For the purpose of the comparison I think it's fair to consider saving lives more cheaply than Value of a Statistical Life as a saving/revenue analogue.

1

0

11

Cameron Holmes @CameronHolmes92

17 days ago

@herbiebradley Looks like Xenon poisoning, we should be cautious when removing control rods.

0

2

0

108

Cameron Holmes @CameronHolmes92

19 days ago

@DanielCHTan97 @ben_sturgeon Oh I meant just to themselves or without any expectation of feedback, kind of an extension of imitation really. I think they basically say some stuff, then are sort of self-checking if that's roughly in distribution.

0

23

Cameron Holmes @CameronHolmes92

19 days ago

@DanielCHTan97 @ben_sturgeon Tangentially, I have really vivid school memories of trying to predict teachers' sentences in class, right down to arbitrary choices (like using 4 as a constant or a random female name) - but I'm pretty sure that's me just being a bit weird

0

1

0

14

Cameron Holmes @CameronHolmes92

19 days ago

@DanielCHTan97 @ben_sturgeon Human conscious learning, agreed, but I think by volume most human learning looks more like toddlers imitating / rolling out high temp tokens and seeing what sticks which feels spiritually like pretraining.

2

0

41

CameronHolmes92 retweeted

Geoffrey Irving

@geoffreyirving

25 days ago

We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵

28

993

151

427

226K

CameronHolmes92 retweeted

Arcadia Impact @ArcadiaImpact

29 days ago

*NEW* AI alignment research team! We're announcing the new alignment team @ArcadiaImpact. A London-based team, working closely with @AISecurityInst to tackle 3 ambitious agendas in AI alignment! 👇 🧵

1

104

11

47

8K

CameronHolmes92 retweeted

Matt Clifford

@matthewclifford

about 1 month ago

The @AISecurityInst is hiring for a Director and for a Chief Research Officer. AISI is a remarkable organisation: doing globally important work, with a world-class team, in the heart of government. These are some of the highest impact jobs in AI security anywhere. Do consider applying and sharing widely.

2

124

35

30

26K

CameronHolmes92 retweeted

David @DavidDAfrica

about 1 month ago

Many methods use consistency as a way to make language models more capable or aligned, such as through self-distillation or regularisation. In new work accepted to ICML 2026, @ArathiMani and I show that optimising for self-consistency can entrench pre-existing misalignment.

3

50

7

22

3K
