Adam Khoja @AdamK133 - Twitter Profile

3 days ago

The full paper includes an analysis of the offense-defense balance of subversion, and maps the means and motives for AI betrayal between states, within states, and within AI corporations. It’s available here: https://t.co/hUzgf42KyD

0

6

0

46

Adam Khoja

@AdamK133

3 days ago

When labs trigger an intelligence explosion, they should worry about AI backdoors activating to sabotage their compute or their attempt. In a new paper, we study AI betrayal—how adversaries can make AIs work against their developers. 🧵

https://t.co/17PS5XqBuP

1

18

9

4

952

Adam Khoja

@AdamK133

3 days ago

AI developers that fear AI betrayal would hesitate to deploy AIs in fully autonomous, high-stakes contexts like in the military. They would be more inclined to implement safeguards, monitoring, and transparency. We call this effect "deterrence by betrayal."

https://t.co/1sjxduv944

1

5

0

52

Adam Khoja

@AdamK133

4 days ago

@DaveRBanerjee I agree there's an element of grief https://t.co/SVm9YYVQHl

Adam Khoja

@AdamK133

over 1 year ago

Whereas I feel great sympathy for mathematicians, whose timeless aesthetic project will no longer need them, I can't help but feel anger at economists, who in their nominal pragmatism are utterly failing their mandate to anticipate economic events and contribute to policy in time

1

10

0

1K

0

1

0

1K

4 days ago

I'm poorly calibrated on shortform view counts, but I'd guess 250M is a reasonable median for the total views @plzdontkillus will receive in July, which might make it one of the most promising public engagement projects in AI Safety this year. https://t.co/KEdFolL5MM

1

0

1

302

Adam Khoja

@AdamK133

6 days ago

@hamandcheese @credenzaclear2 Category theory will answer some surprisingly practical questions when we get good enough at it.

0

1

0

43

Adam Khoja

@AdamK133

6 days ago

.@JacobSteinhardt's GPT-2030 has aged extremely well. AI math, fast mode, product-scale online learning. The essay had a large impact on my thinking in 2023, for the better. Maybe a tad conservative, but far more aggressive than discourse at the time. https://t.co/dsFuDqtdK5

0

4

0

1

101

AdamK133 retweeted

Center for AI Safety @CAIS

7 days ago

AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator. In our new paper, we argue AI betrayal could actually make the AI race more stable. 🧵

https://t.co/mWNXYCfF0I

3

46

14

3K

Adam Khoja

@AdamK133

9 days ago

@hamandcheese And how wonderfully fine and ~fungible the tokens are, all the better to Coase us with.

0

2

0

23

Adam Khoja

@AdamK133

13 days ago

@willdepue Market inspired by this https://t.co/3CHTsX92Gy

0

1

0

24

Adam Khoja

@AdamK133

15 days ago

Any sufficiently advanced hard magic system is indistinguishable from physics.

0

1

0

58

Adam Khoja

@AdamK133

15 days ago

@YafahEdelman Consider buying this one up (if you’re on Manifold) https://t.co/ctPPjP7A79

0

2

0

446

Adam Khoja

@AdamK133

17 days ago

Beginning to notice a trend where commons-burning/successionist companies use vice signaling as a brand.

p(doom)

@prob_doom

18 days ago

We’re p(doom), an AGI research lab. We’ll pay you $300/month to record your screen while working. If your work is open-source and involves research, engineering, design, editing, or similar long-horizon digital work, fill out the form: https://t.co/NekbmBW6F5

17

199

10

214

554K

0

3

0

449

Adam Khoja

@AdamK133

17 days ago

@zetalyrae I somewhat disagree. Superhuman personal assistants will be viscerally impactful, but only because norms will shift to give them great affordances over our lives. Our time, finances, recommendations, etc. will be managed under one umbrella. Only with ~no affordances do I agree.

0

2

0

386