Sören Mindermann @sorenmind - Twitter Profile

Pinned Tweet

over 5 years ago

Super excited to share that **Inferring the effectiveness of government interventions against COVID-19** was just published in Science!! https://t.co/rvw7xJMPIf Work done with amazing collaborators @JanMBrauner, @MrinankSharma ... 1/

12

189

56

35

0

sorenmind retweeted

Sam

@Discoplomacy

about 13 hours ago

The British Government has published ‘AI Scenarios 2030: Helping policymakers plan for the future of AI’. Written by Government Office for Science (GO-Science) with input from AISI and DSIT and external experts, it examines five scenarios, grouped into three technological trajectories: 1. slowed 2. continued 3. taken off

6

201

44

224

35K

sorenmind retweeted

Alex Dimakis

@AlexGDimakis

about 22 hours ago

I am very excited about this research. We show 2 things:

1. If you just do random sampling (i.e. you try to solve a problem k times independently, and keep the best) your Elo scaling will be linear in log(test-time-compute). Agents like Claude-Code and Codex scale like that after a few hours.
2. We compare human expert coders to coding agents on the same tasks (from AtCoder Heuristic Contest). The exciting finding is that humans scale super-linearly.

This is evidence that humans do continual learning while they are solving a problem! I.e. they learn more about the coding problem they are trying to solve and scale fundamentally better compared to randomly trying things in a memoryless fashion. This is empirical evidence that supports what many of us have felt for a while: unless we solve continual learning we will not be able to outperform humans in tasks that take many days. Current coding agents are not able to do this.

23

565

66

349

111K

sorenmind retweeted

Iason Gabriel @IasonGabriel

1 day ago

What happens if the question of whether future AI systems are conscious can’t be solved? What would it mean morally—and how could we continue to live well, together? New research with @adamtbales aims to answer that question... https://t.co/GtI5gq54Ti

3

114

24

83

23K

sorenmind retweeted

Kobi Hackenburg

@KobiHackenburg

1 day ago

New w/ @AISecurityInst & @UniofOxford: Frontier AI can now out-persuade expert humans in conversation - incl. world-champ debaters and professional canvassers. This held even when humans chose their topics, prepared in advance, and competed for £1,000 prizes 🧵

46

749

189

453

121K

Sören Mindermann @sorenmind

7 days ago

@MariusHobbhahn Situational awareness more broadly was also predicted, e.g. by Ajeya Cotra's blog and our paper here https://t.co/5Nw2uTVBLC

0

1

0

132

sorenmind retweeted

Marius Hobbhahn

@MariusHobbhahn

8 days ago

Reward hacking was convergent across ~all models and labs. Sycophancy was convergent. Eval awareness was convergent. All three of the above a) were predicted by theory, b) are quite sticky. So I think this is evidence that we should expect scheming & power-seeking to behave the same.

8

173

10

48

9K

sorenmind retweeted

Cas (Stephen Casper)

@StephenLCasper

8 days ago

Anthropic and OpenAI are publicly pointing out how having the option to slow down AI would offer a potentially critical form of optionality in the future. The correct response for any policymaker should be "Damn, this is serious. How can I help build that capacity?" https://t.co/3cia0V7zqr

1

108

19

16

4K

sorenmind retweeted

Geoffrey Irving

@geoffreyirving

9 days ago

New paper from Gopal Sarma, Rachel Steratore, Sunny Bhatt, and me, surveying formal methods folk about the importance and tractability of applications to AI safety. I'm excited this is out! Here is a broader plea for people to be very ambitious about verifying software! 🧵 https://t.co/jZZ5N8ALbl

2

110

19

55

20K

sorenmind retweeted

Ryan Greenblatt

@RyanPGreenblatt

13 days ago

It would be nice if AI companies and others (e.g. startups) tried to have their AIs hillclimb on this task. ARC is approximately our only current bet on scalable/worst-case solutions to alignment and they could be boosted by relatively checkable work!

4

96

8

30

9K

sorenmind retweeted

Séb Krier

@sebkrier

13 days ago

Good graphs showing actual AI progress (and whether we're seeing anything like RSI). Imo the methodology here tracks the right measures (i.e. a wide range of aggregate capabilities over time) rather than extrapolating from a weak proxy (e.g. how good agents are at coding). https://t.co/F4ek0scwCV

7

96

17

68

9K

sorenmind retweeted

Camila Blank @camila_blank

14 days ago

Subliminal learning is when LLMs transmit traits (e.g. loving cats) through seemingly meaningless data. What’s going on? We find a simple explanation: it's just steering vector distillation. We explain which traits transfer and why subliminal learning fails across models. https://t.co/NiwHp1BRVJ

16

385

48

267

91K

sorenmind retweeted

Neo Research @NeoResearchAI

15 days ago

We're Neo Research (新衡). Asia’s first independent frontier AI safety evaluation & research lab. Today we're publishing our first report: an independent safety evaluation of DeepSeek v4 Pro. (1/5)

20

790

88

383

108K

Sören Mindermann @sorenmind

15 days ago

@StephenLCasper @MITCSAIL @Harvard @Kennedy_School Congrats man!!

0

1

0

64

sorenmind retweeted

Markus Anderljung

@Manderljung

about 2 months ago

Two important skills in AI policy: knowing the numbers, and being calibrated about how confident to be in them. So I vibe-coded a little game to train both. Mostly AI trivia. You score better if you know stuff + know what you don't know. Have a go. https://t.co/m38oyZ82XC

2

93

7

81

5K

sorenmind retweeted

benedict

@bqbrady

about 2 months ago

Introducing Philosophy Bench, my favorite new project I've worked on this year, with help from my friend @matthewjmandel We put frontier language models in 100 ethically complex situations and require them to act, grading them on adherence to consequentialism vs. deontology, tendency to follow user requests, corrigibility, and more 1/

35

676

103

534

70K

Sören Mindermann @sorenmind

2 months ago

Our new results and followup work in Owain's thread also show that some forms of subliminal learning can still happen even when the base models are different.

0

76

Sören Mindermann @sorenmind

2 months ago

Excited that Subliminal Learning just came out in Nature! Our result implies that safety auditing needs to look beyond the data. Models are increasingly distilled on each other's outputs, so they may inherit issues not visible in the data.

Owain Evans

@OwainEvans_UK

2 months ago

Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new?🧵 https://t.co/Iiv9sgjJki

41

886

139

481

520K

1

5

0

2

643

Sören Mindermann @sorenmind

2 months ago

One interesting bit from the paper: LLMs didn't subliminally learn from models of a different base. But GPT 4.1 and 4o share the same base so the effect still happens. https://t.co/fvFb55XTsn

1

0

64

sorenmind retweeted

Ryan Greenblatt

@RyanPGreenblatt

2 months ago

Current AIs (Opus 4.5/4.6) seem pretty misaligned to me (in a mundane behavioral sense). In my experience, they often oversell their work, downplay problems, and stop early while claiming to be done. They sometimes brazenly cheat. https://t.co/ugzAwDhHA3

19

451

37

81

71K

sorenmind retweeted

Alexander Barry

@AlexBarry4

2 months ago

I made an update to the interactive task-success-rate plot for METR time horizon. You can now see how the performance on the TH task suite has evolved over time by walking through model releases (with optional point jittering for increased visibility). https://t.co/Kz18Grjf4p

2

30

1

5

1K
