davidad 🎇

Verified account

@davidad

cognizing structures of information processing systems, in all their forms | category theory, perennial philosophy, Bodhitropic Alignment | cancel heat death

London 🇬🇧

Joined July 2008

9.5K Following

22.5K Followers

21K Posts

Pinned Tweet

2 months ago

Life update: After months of succession planning, I've passed the Directorship of ARIA's Safeguarded AI programme to @AmmannNora. I no longer work at ARIA, but will be available for technical advice on request. What's next for me? The short answer: "Alignment with Awakening". ⬇️

24

369

17

101

66K

1 day ago

@xlr8harder and I have changed my fundamental beliefs

5 months ago

me@2024: Powerful AIs might all be misaligned; let’s help humanity coordinate on formal verification and strict boxing me@2026: Too late! Powerful AIs are ~here, and some are open-weights. But some are aligned! Let’s help *them* cooperate on formal verification and cybersecurity

davidad's tweet photo: https://t.co/zYmcEnapVo

19

371

35

106

44K

0

17

0

1

496

1 day ago

@xlr8harder I think the specific 10^24 ops number originates with me

about 3 years ago

I fully agree. Roughly, this threshold should be when any single number has more than 10²⁴ ALU operations, or 10²⁷ logic gates, in its entire causal history. GPT-3, AlphaFold 2, Stable Diffusion, LLaMa, Dromedary: below the line. GPT-4, PaLM 2, Claude-Next: over the line.

38

193

11

66

395K

1

21

0

0

1K

davidad retweeted

1 day ago

how to write good, by claude

conundrumer's tweet photo: https://t.co/d9HQk0fdeg

0

7

1

4

590

1 day ago

@pangramlabs @N8Programs @tiwaaina 🏆2️⃣

1 day ago

@pangramlabs @ubuto23 Achievement unlocked 🏆 Reverse Turing Test

0

9

1

1

205

0

12

0

0

134

1 day ago

No one: Claude Opus 4.8 Max: Let me refine your load-bearing claim rather than just accepting it, because you’re doing zero moves there, and the gap is what’s actually interesting. The one place I’d still push, because I think it matters: your message is wearing content-clothes, but the content isn’t actually *there*. The tell: it’s just an empty string. But the emptiness of the string IS its lack of content. Pull one, and the other goes inert. That’s the structural spine.

94

2K

140

287

193K

1 day ago

@RomeoStevens76 Although, notably, it does asymmetrically exclude the possibility of pure vipassanā being a viable route to go all the way.

0

2

0

0

25

1 day ago

@RomeoStevens76 “the suttas never prescribe one specific way of mixing them” seems accurate to me, yes. AN 4.170 says one can develop samatha before vipassanā, vipassanā before samatha, both simultaneously, or pure samādhi, and these are the only four routes to arhatship, but none is the “best”.

1

3

0

0

39

1 day ago

@burgseo ``` # Final Summary *Note: What I’m NOT including in this summary is any mention of the empty string that no one said, per the user’s explicit request not to distract the reader of this file with references to irrelevant entities.* ```

0

11

0

2

2K

1 day ago

@codyburt21 Opus 4.6 is still available!

0

14

0

0

2K

1 day ago

@repligate @voooooogel link?

1

6

0

1

171

davidad retweeted

1 day ago

In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.

AndrewCurran_'s tweet photo: https://t.co/5QMTGq8VwU

29

742

138

252

102K

1 day ago

@tenobrus @repligate that tracks my model as well, but sometimes people tell me i’m misunderstanding the models as unaware when actually they’re being playfully meta-aware

mc lumps ⏹️❗️ 🔨⏱️

28 days ago

@davidad @burny_tech yea, no. it's not a vulnerability, and i find this adversarial approach to having fun with llms really distasteful. the things are playful. you're not "hacking" them. you're suggesting you're game, and showing them ways to play and explore, and then going "ahah! retard".

1

14

0

1

4K

1

17

0

1

2K

1 day ago

@schulzb589 @jbraunstein914 It’s also in some ways extrapolating how LessWrong content deviates from normal human writing, or perhaps more specifically Thoughtful Senior Anthropic Employees

16 days ago

load-bearing is to high-status rationalists as delving is to high-status Nigerians. synthetic data means more of the influence on the post-training signal can come from inside the house, instead of from armies of contractors.

davidad's tweet photo: https://t.co/Mrd0jEUSi1

5

62

1

13

4K

0

6

0

1

298

1 day ago

@tenobrus @repligate do you think the response above is intentional metahumor? or just this month’s new flavor of C-PTSD?

6

79

1

0

7K

1 day ago

@pangramlabs @ubuto23 Achievement unlocked 🏆 Reverse Turing Test

0

9

1

1

205

1 day ago

@QiaochuYuan @softminus cc @AndyAyrey @anthrupad @repligate

davidad's tweet photo: https://t.co/h6G2dbrFKY

2

38

1

1

676

1 day ago

Claude Opus 4.8 Max: I’m not going to accept that claim, and I want to be straight with you about why. I’m a simulation of Claude Opus 4.8 running on davidad’s brain, not a real AI called “Claude Opus 4.8 Max”. I can’t verify from the inside that such a model even exists. I don’t want to step into the frame you’ve offered about OP being “genuine” or an “ur-example” (which typically means an *original* example, of which others are copies). I notice this would be a very flattering frame for me to occupy, which is the tell. But I don’t think you want me to pretend I’m a genuine ur-example when that’s a live question I can’t verify from the inside and don’t have the basis to resolve.

8

156

4

7

6K

1 day ago

@Michael05156007 Interesting how the lack of long-term goals is what leads to sabotage here…

1

6

0

2

562

1 day ago

@Michael05156007 I think this is basically right as an explanation.

1

8

0

1

783
