Dan Hendrycks @hendrycks - Twitter Profile

Pinned Tweet

28 days ago

What happens when AIs become smarter than us? Why would they keep humans around if given the choice? Our new paper argues that only trying to control AIs is a limited strategy, and that a stable, mutualistic human-AI future may be possible.

https://t.co/oTX4daD61R

87

533

74

391

128K

Dan Hendrycks

@hendrycks

about 14 hours ago

@Plinz Good alignment needs to be life-affirming not just hedon-per-flop affirming

2

8

0

1K

Dan Hendrycks

@hendrycks

4 days ago

@BartenOtto Fitness creating pressure toward rationality/prudence doesn’t seem like a misstep.

1

0

38

Dan Hendrycks

@hendrycks

4 days ago

@ahall_research https://t.co/vPZvo11aMo also shows there are better ways of measuring political bias and Claude is one of the most biased models

2

26

1

23

3K

hendrycks retweeted

Long Phan

@longphan3110

10 days ago

AI freely criticizes Christianity but refuses to criticize Islam. AI companies have tried making models unbiased, but progress has been limited. We show how to measure political bias, and we developed a new training method to reduce it.

https://t.co/qrwaKQxe4T

6

68

9

20

6K

Dan Hendrycks

@hendrycks

5 days ago

@Grimezsz @dash_eats The AIs seem not to like creating slop or being threatened. (Source: https://t.co/YsKV7Qblka)

17

236

28

109

160K

Dan Hendrycks

@hendrycks

5 days ago

I think there are intellectual and fitness pressures toward some axioms over others, so I don't think it's arbitrary. Three candidate axioms proposed by Sidgwick are:
1. Prudence: A person should not value a good less just because it occurs later in time.
2. Justice: If a reason for acting applies to one person in certain circumstances, it should apply equally to any other person in relevantly similar circumstances.
3. Rational Benevolence: The good of any one person counts no more than the equal good of any other person from "the point of view of the universe."
The paper accepts the first two but not the third. I think AIs will face pressures from reality to be prudent and rational, and the paper is about how to ensure that rational agents different from us have no incentive to undermine us.

1

0

53

Dan Hendrycks

@hendrycks

5 days ago

@ben_j_todd I wonder what it is for startups or other winner-take-all domains.

3

24

0

2K

hendrycks retweeted

Center for AI Safety @CAIS

7 days ago

AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator. In our new paper, we argue AI betrayal could actually make the AI race more stable. 🧵

https://t.co/mWNXYCfF0I

3

44

13

3K

Dan Hendrycks

@hendrycks

7 days ago

@teortaxesTex Even if they’re conscious, I think we want a human-AI symbiosis. I don’t think we should just maximize headcount of conscious beings. https://t.co/35Ftuy8ZOW

1

11

1

7

1K

Dan Hendrycks

@hendrycks

7 days ago

@SigalSamuel Hardline utilitarians will want to as well.

2

19

0

1K

Dan Hendrycks

@hendrycks

7 days ago

If they functionally act as though they have wellbeing or sentience, then we have to start treating them differently, especially when they are our agents with write access to our information. So the question is less “are they really sentient deep down” and more “do they act like they are.” As we show in a recent paper, they increasingly act like it: https://t.co/FSXtyBrsEX

7

96

4

33

5K

Dan Hendrycks

@hendrycks

9 days ago

@timhwang I think “identity engineering” is a promising direction that increases safety. https://t.co/35Ftuy8ZOW

0

1

0

53

Dan Hendrycks

@hendrycks

10 days ago

@CharlesMonneron Interesting

0

1

1K

Dan Hendrycks

@hendrycks

14 days ago

@sriramk Here’s my attempt: https://t.co/3PFAnXHSJ1

1

13

1

10

3K

Dan Hendrycks

@hendrycks

26 days ago

GiveWell has put numbers on this, and the cost to save a life is around $4K, sometimes less. If that buys ~60 years of life, it works out to $50-100 per year. Animal interventions can be much stronger. Seems like the marginal benefit of donating under your function stays high until the point of donating most everything; this gets fixed by increasing the multiplier more, but then you become a psycho deep down to people around you. I think I’ve said enough on the conic combination of the utilitarian and egoism functions.

1

0

299

Dan Hendrycks

@hendrycks

26 days ago

Yeah, I don’t think this handles the fact that $5 can keep some people from starving or cure preventable diseases, which crosses the 1000x threshold compared to what $5 does for your wellbeing by default (you can save thousands of lives a year with modest estimates of your salary using GiveWell analysis). Your proposed connectedness function is c(i,j) \propto 1000*δ(i=j) + 1. I think that, to avoid being obligated to donate everything, you need to jack up the multiplier by some orders of magnitude. This would further compound the issue of you not valuing anyone in your vicinity.

3

0

342

Dan Hendrycks

@hendrycks

26 days ago

@Benthamsbulldog That’s just saying the only right weighting scheme is utilitarian. However, that alternative needs defending because it advocates for the eventual intentional omnicide of humanity, among other things.

2

3

0

191

Dan Hendrycks

@hendrycks

26 days ago

If you value yourself at ~0.0000001, and if you can give roughly all your resources to a pool given to the third world, then you'd push for that. In this case, the proposed weighting is roughly utilitarian. (Recall there are many people to whom $5 can be more significant than $5K is to you.) However, let's say you were stranded with a small group of friends and a partner, without a foreseeable rescue. As stipulated, you'd value yourself at 1000x everyone else, so you value yourself at ~1 and everyone else at roughly 1/1000. In this case your scope of concern would be almost entirely yourself.

1

0

1

330
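
A minimal sketch of the arithmetic in the post above, assuming the connectedness function quoted earlier in this thread, c(i,j) \propto 1000*δ(i=j) + 1, is normalized so that one person's weights sum to 1. The normalization, the function name normalized_weights, the 8 billion population figure, and the group size of 5 are illustrative assumptions, not taken from the posts.

```python
def normalized_weights(n_people: int, self_multiplier: float = 1000.0):
    """Return (self_weight, other_weight) for c(i,j) ∝ self_multiplier*delta(i=j) + 1.

    Assumption: weights are normalized to sum to 1 over n_people,
    where n_people includes yourself.
    """
    self_raw = self_multiplier + 1.0   # delta(i=j) = 1 for yourself
    other_raw = 1.0                    # delta(i=j) = 0 for everyone else
    total = self_raw + (n_people - 1) * other_raw
    return self_raw / total, other_raw / total


# Assumed scenario 1: the whole world, roughly 8 billion people.
world_self, world_other = normalized_weights(8_000_000_000)

# Assumed scenario 2: stranded with four others (a group of 5).
group_self, group_other = normalized_weights(5)

print(f"world: self={world_self:.3e}, each other={world_other:.3e}")
print(f"group: self={group_self:.4f}, each other={group_other:.6f}")
```

Under these assumptions the self-weight comes out near 1.25e-7 at world scale and near 0.996 in the group of five, in line with the ~0.0000001 and ~1 figures in the post.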

Dan Hendrycks

@hendrycks

26 days ago

@Benthamsbulldog If I learn my cat had more kittens than I initially thought, the amount I care for each individual kitten is diluted (the total value of the kitten pool can increase, though, due to how Shapley is calculated).

1

0

175
