Dewi Gould @dswg97 - Twitter Profile

Dewi Gould @dswg97

6 days ago

@redwood_ai @ConstellOrg @MATSprogram @AetherAIS LW post: https://t.co/v9YnZFQZsr

0

6

0

2

307

Dewi Gould @dswg97

6 days ago

New paper! Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models. @METR_Evals showed that models' time horizons have doubled every few months. We ask: what length of tasks can models complete without any CoT? https://t.co/RoIP3VHu0Z

5

137

30

65

44K

Dewi Gould @dswg97

6 days ago

Thanks to the following orgs for supporting this work: @redwood_ai @ConstellOrg @MATSprogram @AetherAIS

1

8

0

1

371

dswg97 retweeted

Arcadia Impact @ArcadiaImpact

11 days ago

*NEW* AI alignment research team! We're announcing the new alignment team @ArcadiaImpact. A London-based team, working closely with @AISecurityInst to tackle 3 ambitious agendas in AI alignment! 👇 🧵

1

103

11

47

8K

dswg97 retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

2 months ago

🚀 Applications are now open: Constellation's Astra Fellowship 🚀 Fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg! 📅 Apply by May 3rd (begins Sep 2026) 🔗 https://t.co/pxtOduDBFh

22

1K

167

2K

233K

dswg97 retweeted

Harry Mayne

@HarryMayne5

4 months ago

New paper. A Positive Case for Faithfulness. When asked to explain their decisions, LLMs can give highly plausible self-explanations. But are these explanations actually faithful, or are they just post-hoc rationalizations? We measure faithfulness via simulatability. https://t.co/993mC1K5WP

2

59

12

29

6K

Dewi Gould @dswg97

4 months ago

Big thank you to amazing collaborators @HarryMayne5 @Justinkangs and SPAR mentor @noahysiegel!

0

2

0

27

Dewi Gould @dswg97

4 months ago

New paper: A Positive Case for Faithfulness. When asked to explain their decisions, LLMs can give highly plausible self-explanations. But are these explanations actually faithful, or are they just post-hoc rationalizations? We measure faithfulness via simulatability. https://t.co/4dYXVHFL0d

1

2

0

33

Dewi Gould @dswg97

4 months ago

Paper: https://t.co/Dobu4iUp3x LessWrong: https://t.co/ohsYvBaAcM

1

2

0

23
