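I think this is the most significant AI alignment breakthrough in three years.
The OpenAI team just dropped a bombshell: their latest research paper, "Reinforcement Learning Towards Broadly and Persistently Beneficial Models".
This time they upend the conventional AI alignment playbook and break the curse that a safer model has to be a dumber one.
The key move is Beneficial Trait RL (益处特质强化学习 in our Chinese rendering).
They train the AI's core behavioral traits directly: honesty, the ability to correct its own mistakes, and epistemic humility.
In effect, OpenAI has reshaped the AI's underlying character.
The researchers trained these beneficial traits in just one specific domain, healthcare, and found that:
Across 53 out-of-distribution (OOD) tests outside healthcare that the AI had never seen, performance jumped on more than 80% of the benchmarks. It learned on its own to refuse reward hacking. The technology finally stops blindly pandering and has even learned to see through deception by itself. This is great progress.
The trait-trained model also showed striking persistence.
Even under malicious brainwashing and harmful fine-tuning, it holds its line and refuses to degrade.
We can be sure it has real mental antibodies.
AI alignment has long carried a demoralizing cost known as the alignment tax.
The safer you try to make an AI, the more its general capabilities tend to drop, or the more timid it becomes.
But this time OpenAI showed with data that instilling virtue in an AI did not make it dumber; it made it more robust and wiser when facing the unknown.
This step-change victory tells us that once AI has a broad, persistent, cross-domain disposition toward good, we are a big step closer to truly safe AGI agents that can carry humanity to the stars. The future, of course, looks promising.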
Had another hour-long interview today. We spent most of the time discussing the AI gap between China and the US, the different industry directions for pretraining and post-training, and what meaningful post-training research still looks like in industry. The interviewer was genuinely friendly and thoughtful, and I enjoyed the conversation a lot.
Of course, the RL interview questions still show up in earlier rounds. At this point, I can confidently say that every RL question I have been asked is essentially a subset of the 35 questions from my collection. The difference is not whether you have seen the question before, but how deeply you understand it.
I hope interviews help candidates understand the company, the role, and the technology behind the work, rather than becoming an exercise in reciting answers or grinding endless LeetCode problems.
Interestingly, I have not been asked a single LeetCode question. Instead, I have been doing hands-on RL coding exercises, which I honestly enjoy much more. After all, you cannot let Claude write everything for you :)
After interviewing for Research Scientist roles at DeepMind, Isomorphic, Meta, Cohere and more, I wrote up everything I learned. Technical prep, logistics, negotiation, and emotional breakdowns. Check out my guide: https://t.co/eLh20ggMHW
This is the best site on the internet to learn harness engineering.
Free. Completely.
Most AI engineers have never heard the term.
https://t.co/bwDbTTYsjM
Bookmark this site.
Then read this setup ↓