I think "spiritually true" is a bad framing. Would you rather correctly update your priors but never actually get the terminal state update or signal? Or have the terminal state but with no intermediate rewards, or only sparse or incorrect information, so you know the result but can't reproduce it?
@AgustinLebron3 feels like an outdated mental model? sure, the internet bootstrapped intelligence, but aren't most of the recent capability gains from RL techniques?
@tszzl what is being 'paused' though? the self-improvement loop? frontier model access? experiment types? I feel like it's easy to look at this situation and say yes, we should be careful, but hard to align on or coordinate an exact path.