New paper!
We train AIs to choose stochastically between different trajectory-lengths.
We find that -- OOD -- they're much less likely to pay costs to influence when shutdown happens.
If you open a newspaper and read of international affairs, of monetary policy, of a percentage point gained or lost overnight, you are quoting Bentham. If you praise pluralism, if you wish to maximize your gains or minimize your losses, you are quoting Bentham.
@joodalooped But also there's a sense in which you can get everything in econ theory, math, philosophy, etc. just by thinking about the situation properly
@joodalooped Hmm maybe backwards induction stuff?
If you tell two rational agents that they're going to play a prisoner's dilemma against each other some large number of times, they might cooperate every time.
But if you specify which large number they'll play, they'll defect every time.
@MatthewJBar@tamaybes I think the thing that's got Zvi/80k worried is not the benchmark scores but the stories about Mythos finding exploits in Firefox, OpenBSD, Linux, etc. I don't know if GPT-5.5 could've found those.