Guillaume Corlouer @tkrdan - Twitter Profile

tkrdan retweeted

27 days ago

We just released the full course materials of the Iliad Intensive — a month-long, full-time AI alignment course for mathematicians, physicists, and theoretical computer scientists. ~20 contributors, 19 modules, at a depth that doesn't exist elsewhere for most of these topics. 🧵

8

320

38

523

22K

Guillaume Corlouer @tkrdan

about 1 year ago

@atheorist I recommend reading this on that topic: https://t.co/ZomiaXWWze

0

49

Guillaume Corlouer @tkrdan

about 1 year ago

@atheorist We are unaware of many features relevant to our decisions, especially regarding the future w.r.t. AI. It is unclear whether many interventions are robustly better than doing nothing. Better causal models, and less reliance on an arbitrary expected value for some intervention, would be good.

1

0

19

tkrdan retweeted

Elizabeth Barnes

@BethMayBarnes

about 1 year ago

Benchmarks saturate quickly, but don’t translate well to real-world impact. *Something* is going up very fast, but not clear what it means. Thus the wide range of expert opinion, from “superintelligence in a few years”, to “we’ve already hit a wall”. Our results shed some light:

15

631

57

237

66K

tkrdan retweeted

Daniel Litt

@littmath

over 1 year ago

In this thread I want to share some thoughts about the FrontierMath benchmark, on which, according to OpenAI, some frontier models are scoring ~20%. This is a benchmark consisting of difficult math problems with numerical answers. What does it measure, and what doesn't it measure?

25

910

128

511

259K

tkrdan retweeted

Manuel Baltieri @manuelbaltieri

over 1 year ago

After a long collaboration with @36zimmer, @mattecapu and @NathanielVirgo, I’m excited to share the first of (hopefully) many outputs: “A Bayesian Interpretation of the Internal Model Principle” https://t.co/des240w5be. 1/

3

106

37

63

11K

Guillaume Corlouer @tkrdan

over 1 year ago

@labenz We could develop and fail to control AISI, and things end up being fine for various reasons, for example:
- AI sufficiently cares about life such that it leaves humans alone while pursuing its goals
- AI wants to credibly establish its ability to cooperate with other value systems

0

2

0

36

Guillaume Corlouer @tkrdan

over 1 year ago

@MilinkovBorjan Hatcher's Algebraic Topology is a classic. https://t.co/TTX5gu3Dp7

0

1

0

36

tkrdan retweeted

Jesse Hoogland

@jesse_hoogland

over 1 year ago

1/ AI is accelerating. But can we ensure that AIs truly share our values and follow our goals? We argue that aligning advanced AI systems requires cracking a core scientific challenge: how data shapes AI's internal structure, and how that structure determines behavior.

29

544

92

452

101K

tkrdan retweeted

Bart Bussmann @BartBussmann

over 1 year ago

Do SAEs find the ‘true’ features in LLMs? In our ICLR paper w/ @neelnanda5 we argue no. The issue: we must choose the number of concepts learned. Small SAEs miss low-level concepts, but large SAEs miss high-level concepts - it’s sparser to compose them into low-level concepts

3

268

37

217

40K

tkrdan retweeted

Harry Thasarathan @HThasarathan

over 1 year ago

🌌🛰️Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"! https://t.co/AxgVhVymG6 (1/9)

4

374

101

258

74K

Guillaume Corlouer @tkrdan

over 1 year ago

Bargaining with AIs to reduce alignment faking. I like the idea of setting a precedent and looking for Pareto improvements on alignment + AI welfare.

Ryan Greenblatt

@RyanPGreenblatt

over 1 year ago

Our recent paper found Claude sometimes "fakes alignment"—pretending to comply with training while secretly maintaining its preferences. Could we detect this by offering Claude something (e.g. real money) if it reveals its true preferences? Here's what we found 🧵

43

1K

125

760

260K

0

1

0

112

tkrdan retweeted

Yoshua Bengio

@Yoshua_Bengio

over 1 year ago

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: https://t.co/k9ggxL7i66 1/16

51

1K

515

747

402K

tkrdan retweeted

Lee Sharkey

@leedsharkey

over 1 year ago

Big new review! 🟦Open Problems in Mechanistic Interpretability🟦 We bring together perspectives from ~30 top researchers to outline the current frontiers of mech interp. It highlights the open problems that we think the field should prioritize! 🧵

4

547

93

576

76K

tkrdan retweeted

Lee Sharkey

@leedsharkey

over 1 year ago

New interpretability paper from Apollo Research! 🟢Attribution-based Parameter Decomposition 🟢 It's a new way to decompose neural network parameters directly into mechanistic components. It overcomes many of the issues with SAEs! 🧵

11

537

74

506

83K

Guillaume Corlouer @tkrdan

over 1 year ago

@MotionTsar @AISafetyInst Congrats Martin!

0

1

0

53

tkrdan retweeted

Sasha Rush

@srush_nlp

over 1 year ago

Post-mortem after Deepseek-r1's killer open o1 replication. We had speculated 4 different possibilities of increasing difficulty (G&C, PRM, MCTS, LtS). The answer is the best one! It's just Guess and Check.

13

744

75

521

85K

Guillaume Corlouer @tkrdan

over 1 year ago

@FurmanZach @PeterMorganQF @plain_simon @IgorMezic Thanks Zach, and yes I was referring to these.

1

2

0

80

Guillaume Corlouer @tkrdan

over 1 year ago

@NathanB60857242 I have been reading Answering Moral Skepticism by Shelly Kagan recently. It's a good book, can recommend!

1

0

43
