What happens when AIs become smarter than us?
Why would they keep humans around if given the choice?
Our new paper argues that only trying to control AIs is a limited strategy, and that a stable, mutualistic human-AI future may be possible.
AI freely criticizes Christianity but refuses to criticize Islam.
AI companies have tried making models unbiased, but progress has been limited.
We show how to measure political bias, and we developed a new training method to reduce it.
I think there are intellectual and fitness pressures toward some axioms over others, so I don't think it's arbitrary.
Three candidate axioms proposed by Sidgwick are:
1. Prudence: A person should not value a good less just because it occurs later in time.
2. Justice: If a reason for acting applies to one person in certain circumstances, it should apply equally to any other person in relevantly similar circumstances.
3. Rational Benevolence: The good of any one person counts no more than the equal good of any other person from "the point of view of the universe."
The paper accepts the first two but not the third.
I think AIs will face pressures from reality to be prudent and rational, and the paper is about how to make rational agents that are different from us not have an incentive to undermine us.
AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator.
In our new paper, we argue AI betrayal could actually make the AI race more stable. 🧵
@teortaxesTex Even if they’re conscious, I think we want a human-AI symbiosis. I don’t think we should just maximize headcount of conscious beings.
https://t.co/35Ftuy8ZOW
If they functionally act as though they have wellbeing or sentience, then we have to start treating them differently, especially when they are our agents with write access to our information. So the question is less “are they really sentient deep down” and more “do they act like they are.”
As we show in a recent paper, they increasingly act like it:
https://t.co/FSXtyBrsEX
GiveWell has put numbers on this: the cost to save a life is around $4K, sometimes less. If that buys ~60 years of life, it works out to $50-100 per year of life. Animal interventions can be much stronger. It seems the marginal benefit of donating under your function stays high until the point of donating almost everything. You can fix this by increasing the multiplier further, but then you become a psycho deep down to the people around you.
I think I’ve said enough on the conic combination of the utilitarian and egoism functions.
Yeah, I don’t think this handles the fact that $5 can keep some people from starving or cure preventable diseases, which crosses the 1000x threshold compared with what $5 does for your own wellbeing by default (you can save thousands of lives a year with modest estimates of your salary, using GiveWell analysis). Your proposed connectedness function is
c(i,j) \propto 1000*δ(i=j) + 1
I think that to avoid being obligated to donate everything, you need to jack up the multiplier by some orders of magnitude. This would further compound the issue of you not valuing anyone in your vicinity.
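To make the threshold concrete, here is a minimal sketch, assuming the weighting above is applied by multiplying each person's benefit by c(i,j) and comparing. The function names and the `benefit_ratio` parameter (how many times more wellbeing the money produces for the recipient than for you) are hypothetical, introduced only for illustration.

```python
def connectedness(i: int, j: int, multiplier: float = 1000.0) -> float:
    """Unnormalized weight c(i, j) = multiplier * [i == j] + 1."""
    return multiplier * (1.0 if i == j else 0.0) + 1.0


def prefers_donating(benefit_ratio: float, multiplier: float = 1000.0) -> bool:
    """Return True if the weighted benefit to a stranger beats the weighted benefit to self.

    benefit_ratio is an assumed input: the recipient's wellbeing gain divided by
    your own wellbeing gain from the same money.
    """
    value_if_kept = connectedness(0, 0, multiplier) * 1.0
    value_if_donated = connectedness(0, 1, multiplier) * benefit_ratio
    return value_if_donated > value_if_kept
```

Under these assumptions, any benefit ratio above `multiplier + 1` favors donating, so a 1000x self-weight stops protecting you once the ratio passes roughly 1000x. Raising the multiplier moves that threshold up by the same factor.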
@Benthamsbulldog That’s just saying the only right weighting scheme is utilitarian. However, that alternative needs defending because it advocates for the eventual intentional omnicide of humanity, among other things.
If you value yourself at ~0.0000001, and if you can give roughly all your resources to a pool given to the third world, then you'd push for that. In this case, the proposed weighting is roughly utilitarian. (Recall there are many people for whom $5 can be more significant than $5K is to you.)
However, let's say you were stranded with a small group of friends and your partner, without a foreseeable rescue. As stipulated, you'd value yourself at 1000x everyone else, so you'd value yourself at ~1 and everyone else at roughly ~1/1000. In this case your scope of concern would be almost entirely yourself.
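The two cases above can be checked with a short sketch, assuming the weights c(i,j) ∝ 1000*δ(i=j) + 1 are normalized to sum to 1 over everyone in scope. The function name and the population figures are illustrative assumptions, not taken from the thread.

```python
from typing import Tuple


def normalized_weights(population: int, multiplier: float = 1000.0) -> Tuple[float, float]:
    """Return (self_weight, weight_per_other) after normalizing over `population` people.

    Assumes unnormalized weights of multiplier + 1 for yourself and 1 for each other person.
    """
    total = (multiplier + 1.0) + (population - 1) * 1.0
    return (multiplier + 1.0) / total, 1.0 / total


# Illustrative populations: roughly the whole world vs. a stranded group of six.
world_self, world_other = normalized_weights(8_000_000_000)
group_self, group_other = normalized_weights(6)
print(f"world: self={world_self:.2e}, other={world_other:.2e}")
print(f"group: self={group_self:.4f}, other={group_other:.6f}")
```

With a world-sized population the self-weight lands near 1e-7, and with a handful of people it lands near 1 with everyone else near 1/1000, which is the contrast drawn above.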
@Benthamsbulldog If I learn my cat had more kittens than I initially thought, the amount I care for each individual kitten is diluted (though the total value of the kitten pool can increase due to how Shapley is calculated).