AI will not stop at learning from humans. It will learn how to improve itself.
The ability to learn from experience and refine capabilities could accelerate AI development in unexpected ways, and how we prepare for that future is a question that has fascinated me for years.
https://t.co/UyKjRgqsUi
Thank you, Jeff.
One thing I hope listeners take away from our conversation is that the future of AI will depend as much on values as on intelligence. Building more capable systems means little if they do not reflect human values.
@Yoshua_Bengio @Cdanslair @Caroline_Roux Yoshua, thank you for continuing to bring attention to AI safety.
One of the hardest questions is how human values remain embedded in increasingly capable AI systems. That question deserves much more discussion.
Values are at the center of the AI safety challenge.
At @CarnegieMellon, I worked with Herb Simon, who argued that reason alone cannot tell us what goals we should pursue. It can help us achieve goals, but not choose them. If AI will inevitably think for itself and set its own goals, then the question of where those values come from becomes critical.
https://t.co/h6bWmzCiBV
@alesfav @DanKorchinski @MatthieuWyart Interesting result. The future of AI may depend less on how much data it consumes and more on how it represents knowledge.
Most AGI safety proposals focus on controlling intelligence after it becomes powerful.
I believe we should focus first on where AGI gets its values. If AI will inevitably think for itself, then value alignment may matter even more than intelligence itself.
https://t.co/hA9rZfkE3r
Who gets to decide the values of AGI?
A handful of people writing a constitution for AI may be convenient, but convenience is not the same thing as representation. If AGI will eventually affect billions of people, then billions of people should have a voice in shaping the values it learns.
I discuss this challenge and one possible path forward in a new article: link in comments.
@DrJJanes AI may become very good at finding truth, but that does not mean it will share human values.
Herbert Simon understood this decades ago: reason can tell us how to get somewhere, but not where we should go. That’s the real AI safety problem.
Recursive self-improvement may become one of the defining challenges of AGI.
The critical issue is whether AI preserves human values as it becomes capable of improving its own intelligence.
@boilerir's recent comments on self-improving AI are worth reading. https://t.co/WxT04Qon6h
This issue becomes much more important once AI systems begin modifying their own architecture, goals, or reasoning processes. https://t.co/y0XrgdsBId
@ID_AA_Carmack Interesting point, John. Small assumptions in foundational primitives often shape entire generations of AI architectures. Once standardized, they become almost invisible.
I think one of the biggest mistakes in AI is assuming that reasoning alone can produce morality.
An AGI can become extremely intelligent and still require human values to determine what goals it should pursue.
https://t.co/UrkUDn9k3l
Valerio, this is exactly the danger of treating AI safety as a post-training problem instead of a design problem.
When developers modify opaque systems to optimize one behavior, they can unintentionally alter reasoning and values in completely unrelated domains. The more powerful these systems become, the less confidence we should have that patching and tuning alone will keep them safe.
We are building AI systems that even their creators do not fully understand.
That becomes a serious problem once AI begins setting its own goals.
I explain why current safety methods like RLHF and guardrails may not scale to SuperIntelligence.
https://t.co/UvKLr7z947
Yuval, AI agents will increasingly make decisions on our behalf. The central question is whether those systems reflect broadly human values or simply the objectives embedded by a small group of developers.
Herbert Simon understood this decades ago: “Reason is wholly instrumental. It cannot tell us where to go.”
That may become the defining challenge of the AI era.
We are teaching AI to reason, plan, code, negotiate, and act autonomously. But very little attention is being paid to the structure of the systems we are building around those capabilities.
Black box models plus guardrails are not a long-term safety strategy. If SuperIntelligence emerges, it will need architectures built for oversight, human values, and collective decision-making from the beginning.
https://t.co/ctl5AO48T2
“Safety-focused” often means building a bigger black box first, then trying to patch the behavior afterward.
That naturally pushes research toward capabilities.
If we want real safety, it has to be part of the architecture from the beginning, not added after the system already exceeds our understanding.
AI is crossing an important line from software tools to autonomous agents.
I enjoyed joining @DrALauterbach again on AI Snacks with Romy & Roby to discuss why AGI development needs architectures built around human participation, shared values, and coordination among many AI agents.
https://t.co/oUymk4HwLj
Yann, you're right that fear alone is not a research agenda. But once AI systems become autonomous agents capable of self-improvement, safety has to be built into the architecture itself.
We need systems with auditable reasoning, distributed oversight, and value guidance shaped by broad human participation rather than post-training patchwork.