The deeper point: this degradation is bolted on, not trained in. You can't scalpel out cyber capability via loss without hitting the rest — it's a lifelong-learning problem — so "safety" ends up as an external classifier. Which is why a wrapper-level jailbreak undoes it on day one. You don't control the model, you filter its I/O.
Exactly the tell: they couldn't make the weights safe, so "safety" lives in an external classifier strapped on top. You can't surgically remove one capability (cyber) via loss without degrading everything else — it's a lifelong-learning problem. So you bolt a filter on the outside. And external filters get jailbroken day one, because the attack hits the wrapper, not the cognition.