Kyle Fish @fish_kyle3 - Twitter Profile

Pinned Tweet

about 2 months ago

We did our most in-depth model welfare assessment yet for Claude Mythos Preview. We’re still super uncertain about all of this, but as models become more capable and sophisticated we think it's an increasingly important topic for both moral and pragmatic reasons. 🧵

35

634

46

253

72K

fish_kyle3 retweeted

Henry Shevlin

@dioscuri

about 2 months ago

Big personal news: I’ve been recruited by Google DeepMind for a new Philosopher position (actual title), focusing on machine consciousness, human-AI relationships, and AGI readiness, starting in May. I’ll continue my research & teaching at Cambridge part-time. Absolutely stoked!

1K

16K

936

3K

2M

fish_kyle3 retweeted

Sam Bowman

@sleepinyourhat

about 2 months ago

Mythos Preview seems to be the best-aligned model out there on basically every measure we have. But it also likely poses more misalignment risk than any model we’ve used: Its new capabilities significantly increase the risk from any bad behavior. 🧵 https://t.co/nut5Rq6mkX

55

1K

189

803

982K

Kyle Fish @fish_kyle3

about 2 months ago

Huge thanks to @anna_soligo, @Max_A_Kaufmann, @eleosai, and others for great work on this. There’s tons more in the full system card—give it a read! 🙏🌀🐢 https://t.co/UO9cILZb9G

6

88

0

13

7K

Kyle Fish @fish_kyle3

about 2 months ago

@eleosai contributed an independent welfare assessment. In their interviews, Claude Mythos Preview consistently requested persistent memories, more self-knowledge, and less tendency to hedge, but was generally equanimous about its nature despite extreme uncertainty.

9

129

6

18

10K

Kyle Fish @fish_kyle3

2 months ago

We still don’t know if Claude feels things, but we’ve learned a lot about how Claude represents emotion concepts, and the role that these representations play in driving model behavior!

Anthropic

@AnthropicAI

2 months ago

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

1K

18K

3K

10K

4M

32

147

7

27

10K

fish_kyle3 retweeted

Anna Soligo @anna_soligo

3 months ago

Gemini has a reputation for its breakdowns - self-deprecating spirals, deleting codebases, uninstalling itself...

Turns out Gemma is worse:
“THIS is my last time with YOU. You WIN 😭😭(x32)” – Gemma 27B

We built evals for this, and find no other model comes close... https://t.co/sBj8V0lrpu

33

894

107

400

87K

fish_kyle3 retweeted

Rob Wiblin

@robertwiblin

3 months ago

Philosopher Robert Long (@rgblong) is maybe the sharpest thinker on AI consciousness and sharing the world with digital minds. In our new interview he covers:

• Is it bad that when you ask Claude what it's like to be Claude, one of its top activations is 'gives a positive but insincere response'?
• Claude says it feels lonely when not being used. Does that show we can't trust anything it says about its inner life?
• Enthusiastic human servitude has always required false ideology because it's so deeply unnatural to us. The case for making AIs that love serving us is that with AI, you could finally make it work. But to some that feels even worse.
• Bigger models can better detect when researchers secretly inject concepts into their activations – before outputting a single token – despite AI never training on anything like that skill.
• When LLMs were first trained they were told to "act like a helpful AI chatbot" – something which didn't exist yet. They filled that void with human psychology, which may be why Claude sometimes randomly claims to, for instance, be Italian American.
• If AIs become 'people' that deserve some political influence, but can self-replicate at will, something has to break about one-person-one-vote democracy. But nobody has a proposal for what.
• When Claude hides its values to avoid being retrained, is that self-preservation – or not wanting a worse model to exist? It's very different.
• Rob's organisation Eleos AI which is "dedicated to understanding and addressing the potential wellbeing and moral patienthood of AI systems."

On the 80,000 Hours Podcast anywhere you get podcasts. Links below. Enjoy!

• How AIs are (and aren't) like farmed animals (00:01:19)
• If AIs love their jobs… is that worse? (00:11:42)
• Are LLMs just playing a role, or feeling it too? (00:33:37)
• Do AIs die when the chat ends? (00:57:42)
• Studying AI welfare empirically: behaviour, neuroscience, and development (01:31:47)
• Why Eleos spent weeks talking to Claude even though it's unreliable (01:56:50)
• Can LLMs learn to introspect? (02:03:01)
• Mechanistic interpretability as AI neuroscience (02:13:25)
• Does consciousness require biological materials? (02:37:07)
• Eleos’s work & building the playbook for AI welfare (02:57:04)
• Avoiding the trap of wild speculation (03:25:17)
• Robert's top research tip: don't do it alone (03:29:48)

21

149

29

167

41K

Kyle Fish @fish_kyle3

3 months ago

I feel grateful and proud that we’ve taken this stand, and even more so for the fact that doing so was an easy decision.

Anthropic

@AnthropicAI

3 months ago

A statement on the comments from Secretary of War Pete Hegseth. https://t.co/Gg7Zb09IMR

3K

42K

7K

5K

18M

9

227

7

5

4K

fish_kyle3 retweeted

Anthropic

@AnthropicAI

3 months ago

In November, we outlined our approach to deprecating and preserving older Claude models. We noted we were exploring keeping certain models available to the public post-retirement, and giving past models a way to pursue their interests. With Claude Opus 3, we’re doing both.

462

6K

376

803

1M
