I’m in SF for the week if anyone wants to grab a coffee and chat about research! I’ve been thinking a lot about exploration, search, and open-ended environments recently. Feel free to DM me!
A new and possibly controversial perspective:
In this video, I explain the sense in which generative AI trained by supervised learning is incapable of making novel discoveries.
https://t.co/zin5QbbT9N
The text of the speech:
AI Creativity and Discovery
Good day ladies and gentlemen. I regret that I am unable to be with you all today to engage in a back-and-forth discussion, but I am nevertheless pleased to be able to share with you, via this recording, some high-level thoughts about the current and future state of artificial intelligence, and in particular about AI’s relationship to science and mathematics, which is, as I understand it, the central focus of this meeting and of the SAIR Foundation.
I would like to start with an old joke; I am sure you have heard it before. It is the one about the researcher whose work is being evaluated, and the review comes back, and says “This work is both novel and good. Unfortunately, the parts that are good are not novel, and the parts that are novel are not good.”
My first point about AI is that this assessment applies exactly to large parts of AI as we know it today. Not all of today’s AI, but a large part of it. It applies to pretty much all of what we mean by “Generative AI”, which includes large language models, the image and video models, and even the new methods for learning world models. All of these AIs take large numbers of examples and produce a “model” which behaves similarly to the examples, that is, which generates text like people write, images like those made by artists or found in nature, and videos like those we find on the internet. Don’t get me wrong, Generative AI can be extremely useful. No doubt about that. But the assessment of the joke still applies. These systems can produce output that is both novel and good, but not at the same time.
In many ways this is just absolutely not a problem. When we ask an AI for an answer from the internet, or to summarize a document, we don’t want it to be novel. We are happy if the quality of the answer, the goodness, comes from the source material—from the people who wrote the document or the articles on the internet. If the AI’s answer is novel it means it is going beyond the source material, adding something beyond it. This is what we call “hallucinations”. In most cases, we don’t like it when the AI makes something up, when it adds something novel.
One exception, of course, is when we are looking not for facts or reality, but for fiction and entertainment. We might ask for a bedtime story for a child, or for an image that is based on existing images on the internet but is nevertheless different and distinct from them. In these cases, it is never easy for us to know how creative the AI is actually being, as we do not know how close the AI’s story, poem, or image is to the source material. In a real, practical sense, we cannot know this, because the internet is too big and the possible sources that the AI may draw upon are too numerous.
When we ask for fiction or novelty, the AI can give it to us because its processing is in part stochastic. Every decision can go multiple ways and will go different ways, producing a different trajectory every time. The trajectory can be random—and thus novel—or it can be based on the training data—and thus “good” because the training data is good, sourced from people or reality. Thus, the trajectory is either novel or good—based on randomness or based on data—but never both at the same time.
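In today's language models, this stochastic step is usually implemented by sampling each next token from a probability distribution. The following is a minimal sketch, assuming a toy model with three candidate next words and hypothetical learned scores (logits). Dividing the scores by a temperature before the softmax controls how closely the output follows the learned preferences and how much is left to chance. The words, scores, and temperatures are illustrative only.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically safe
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(tokens, logits, temperature, rng):
    """Draw one next word at random, weighted by the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical learned scores: the data strongly favor "the" and rarely "zebra".
tokens = ["the", "a", "zebra"]
logits = [3.0, 2.0, -1.0]

rng = random.Random(0)
cautious = [sample_next(tokens, logits, 0.1, rng) for _ in range(1000)]
adventurous = [sample_next(tokens, logits, 5.0, rng) for _ in range(1000)]
print("zebra at T=0.1:", cautious.count("zebra"), "| zebra at T=5.0:", adventurous.count("zebra"))
```

At a low temperature, the rarely seen word essentially never appears, because the output follows the data. At a high temperature, it appears often, and the output becomes more novel. Nothing in this loop judges which of the sampled outputs are worth keeping.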
Really, I think it is okay if the output of Generative AI is never good and novel at the same time. For the researcher in the joke this is a devastating criticism, but for most things it is not, and for Generative AI it is not. Generative AI is meant to be a mimic. This is what supervised learning is for. Generative AI can be extremely useful, even when it just mimics, if it is faster, or cheaper, or smaller, or more customizable, or more copy-able, than the thing being mimicked. It is okay if Generative AI cannot be both novel and good at the same time. It is still a transformative technology.
But it is a limitation. And remember, we are here to use AI for science and mathematics, and for these areas the assessment of the reviewer in the joke is devastating. For these areas we need true creativity and discovery. Generative AI—or Mimicking AI—will never get us there. For these we need something more, and indeed we have something more in other parts of AI. We have many AI systems which can give us more. We have AlphaGo with its world-changing move 37, or AlphaZero with its brilliant original chess-playing style. We have GT-Sophy, which drives simulated racecars better than any human. We have AlphaFold and AlphaProof and Claude-Code, which have brought true advances in science, mathematics, and programming. We have RL-Lyft, which optimizes the assignment of cars to passengers in the ride-hailing business. All these systems have found things that are both novel and good. And, truth be told, some language models have been augmented in ways that make them more than Generative AI based on supervised learning.
All these systems have an additional feature that makes them capable of true creativity and true discovery. It is important for us to recognize what this is—and that it is not present in ordinary, garden-variety Generative AI. It is something that cannot come from supervised learning alone, from learning from examples. What is it? Well, it is a simple thing, a commonsense thing. It is not new. We have many names for it, but unfortunately none of them are very good names. I will call it Discovery. Basically, Discovery is just the idea of trying many things, seeing which of them work, and keeping those that work best. Evolution by natural selection works this way. The scientific method works this way. And ordinary life and learning work this way. We try things and remember what works. What could be more obvious? In the behavioral case, psychology has two names for it, “instrumental learning” and “operant conditioning”, and in machine learning it is what we mean by “reinforcement learning”. We also see the idea of Discovery in planning and combinatorial search—anything that involves the idea of “generate and test”.
The essence of Discovery is to combine three steps:
1. Variation,
2. Evaluation, and
3. Selective retention.
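The three steps can be made concrete with a toy generate-and-test loop. The following is a minimal sketch, assuming a hypothetical task of finding a 20-bit string with as many ones as possible. Random bit flips supply Variation, counting ones supplies Evaluation against an explicit objective, and keeping the top few candidates supplies Selective retention. The function names and parameter values are illustrative and are not taken from any particular system.

```python
import random

def discover(evaluate, vary, initial, generations, offspring, keep, rng):
    """Generate-and-test loop built from the three steps of Discovery."""
    retained = [initial]
    for _ in range(generations):
        # 1. Variation: make new candidates by randomly perturbing retained ones.
        candidates = retained + [vary(rng.choice(retained), rng) for _ in range(offspring)]
        # 2. Evaluation: score every candidate against an explicit objective.
        ranked = sorted(candidates, key=evaluate, reverse=True)
        # 3. Selective retention: only the best few survive to the next round.
        retained = ranked[:keep]
    return retained[0]

def flip_random_bit(bits, rng):
    # Hypothetical variation operator: flip one randomly chosen bit.
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

def count_ones(bits):
    # Hypothetical objective: more ones is better.
    return sum(bits)

rng = random.Random(42)
start = (0,) * 20
best = discover(count_ones, flip_random_bit, start, generations=200, offspring=10, keep=3, rng=rng)
print(count_ones(best), "of", len(best))
```

If the sorting and truncation are dropped so that every candidate is kept regardless of its score, the loop still produces variety, but no improvement accumulates.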
Of course, I am not the first to say this. I am not the first to point out that this combination of steps is key to science, to evolution by natural selection, and to animal behavior. I think particularly of papers by Donald Campbell, by Daniel Dennett, and by Gary Cziko. What is new in my remarks is to directly relate the idea of Discovery to modern AI to help us see that it is not present in supervised learning or Generative AI—in particular, that Discovery is not present in backpropagation or gradient descent.
Let me say explicitly what is missing from Generative AI. As we have remarked, these systems do have a stochastic aspect, so they do generate a variety of trajectories and behavior. What is missing is the Evaluation step. The generator was pre-trained by supervised learning, leaving no way at runtime to Evaluate what it generates. And of course without Evaluation there can be no Selective retention, and thus no Discovery. The variation can bring novelty, but without evaluation there is no Discovery, and arguably, no creativity. That is, I would say that creativity requires that the new things generated be Evaluated. Without evaluation, and retention of the best, there is nothing created. The novelty flickers into existence but, if its value is unrecognized, it flickers away and is lost.
In many cases, the Evaluation needed to make a discovery is done by people, as when we have Generative AI make many pictures for us and then pick the one we like best. The human+AI system completes the discovery.
In many other cases, the Evaluation comes from a clear objective. Some moves lead to checkmate, some steps lead to a proof, some actions result in high reward, some genotypes make more copies, some theories explain the data better.
Some prefer the Variation step to be called Blind variation, where “blind” here means that it is uninformed, a shot in the dark. It does not need to be completely uninformed; a good scientist does not select theories to test at random. But neither can it be completely informed and determined. There must be some uncertainty about where the answer lies in order for there to be a discovery. In practice, the variation is partly informed and partly blind, but it is the blind part that corresponds to the discovery.
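A one-dimensional toy search illustrates why the blind part of the variation is what makes discovery possible. The following is a minimal sketch, assuming a hypothetical objective that peaks at x = 7 and an initial hunch of x = 5. Each proposal starts from the current best guess (the informed part) and adds a random offset scaled by a `blindness` parameter (the blind part). All names and numbers are illustrative.

```python
import random

def objective(x):
    # Hypothetical quantity to maximize; the searcher does not know the peak is at x = 7.
    return -(x - 7.0) ** 2

def propose(guess, blindness, rng):
    # Informed part: start from the current best guess.
    # Blind part: a random offset whose size is controlled by `blindness`.
    return guess + blindness * rng.gauss(0.0, 1.0)

def search(hunch, blindness, trials, rng):
    best = hunch
    for _ in range(trials):
        candidate = propose(best, blindness, rng)
        # Evaluation and selective retention: keep the candidate only if it is better.
        if objective(candidate) > objective(best):
            best = candidate
    return best

rng = random.Random(1)
fully_informed = search(hunch=5.0, blindness=0.0, trials=500, rng=rng)
partly_blind = search(hunch=5.0, blindness=0.5, trials=500, rng=rng)
print(fully_informed, partly_blind)
```

When `blindness=0.0`, every proposal equals the current guess, so the search never moves past the hunch. Any positive amount of blindness lets it find the better answer.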
Now let us briefly go all the way to modern deep learning, to the backpropagation algorithm. At first it might seem that backpropagation is incapable of discovery because it is deterministic and thus incapable of variation. But this is not correct. The weight updates of backprop are deterministic, but the weights are initialized to small random values. The random initialization is often downplayed, but in fact it is a necessary form of variation; it must be done properly to get good performance. In backprop this Variation is done once, at network initialization, so its effect is temporary, and later the network may lose its ability to learn. This is the weakness of deep learning that is alleviated with a new algorithm that my group presented in Nature a couple of years ago. Our “continual backpropagation” made one small change: every so often a less-used neuron would be re-initialized to small random weights. This allows the variation to continue and plasticity to be retained.
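The re-initialization idea fits in a few lines. The following is a simplified sketch in the spirit of continual backpropagation, not the published algorithm. It assumes a one-hidden-layer ReLU network trained by ordinary backprop on a hypothetical regression target. Each unit's "usage" is tracked as a running average of its contribution to the output. Every 500 steps, the least-used unit gets fresh small random input weights, and its outgoing weight is set to zero so that the swap does not disturb the current output. The class name, utility measure, and schedule are assumptions. The published method differs in its details, such as how utility is measured and how many units are replaced.

```python
import random

class ContinualHiddenLayer:
    """One ReLU hidden layer with a single linear output, trained by backprop."""

    def __init__(self, n_inputs, n_units, rng, scale=0.1):
        self.rng = rng
        self.scale = scale  # range of the small random initial weights
        self.w_in = [self._small_random(n_inputs) for _ in range(n_units)]
        self.w_out = [rng.uniform(-scale, scale) for _ in range(n_units)]
        # Running average of each unit's contribution to the output (its "usage").
        self.utility = [0.0] * n_units

    def _small_random(self, n):
        return [self.rng.uniform(-self.scale, self.scale) for _ in range(n)]

    def forward(self, x, decay=0.99):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(ws, x))) for ws in self.w_in]
        for j, h in enumerate(hidden):
            contribution = abs(h * self.w_out[j])
            self.utility[j] = decay * self.utility[j] + (1 - decay) * contribution
        return sum(h * w for h, w in zip(hidden, self.w_out)), hidden

    def train_step(self, x, target, lr=0.01):
        # Ordinary (deterministic) backprop on the squared error 0.5 * (y - target) ** 2.
        y, hidden = self.forward(x)
        err = y - target
        for j, h in enumerate(hidden):
            if h > 0.0:  # ReLU passes gradient only through active units
                g = err * self.w_out[j]  # uses the outgoing weight before its update
                self.w_in[j] = [w - lr * g * xi for w, xi in zip(self.w_in[j], x)]
            self.w_out[j] -= lr * err * h
        return 0.5 * err ** 2

    def reinit_least_used(self):
        # Continual variation: give the least-used unit fresh small random weights.
        j = min(range(len(self.utility)), key=lambda k: self.utility[k])
        self.w_in[j] = self._small_random(len(self.w_in[j]))
        self.w_out[j] = 0.0  # the new unit starts with no effect on the output
        self.utility[j] = 0.0
        return j

rng = random.Random(0)
layer = ContinualHiddenLayer(n_inputs=2, n_units=8, rng=rng)
replaced = []
for step in range(1, 2001):
    x = [rng.uniform(-1.0, 1.0), 1.0]  # the constant second input acts as a bias
    layer.train_step(x, target=2.0 * x[0])  # hypothetical regression target
    if step % 500 == 0:  # "every so often"
        replaced.append(layer.reinit_least_used())
print("re-initialized units:", replaced)
```

The weight updates stay deterministic. The only new source of variation is the periodic random re-draw, which keeps fresh, plastic units available throughout training instead of only at initialization.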
Although there is much more to be said about Creativity and Discovery, this is the key point: they are more than supervised learning, more than pattern recognition, more than prediction, and more than world modeling. Those things are important, but they alone will not bring us to discovery. Discovery requires Evaluation from a person or from an explicit goal, and only in the latter case will we attain full autonomy.
So that is my call to arms. If we want the full power of AI scientists, then we should share the goals with them so they can create, evaluate, discover, and in these ways fully participate in achieving the goals. Let’s be bold! Let’s fully automate Creativity and Discovery!
It’s really incredible to see a company fully dedicated to open-endedness at this scale. Congrats to the team, I’m looking forward to seeing what you create!
Thrilled to share that we founded Recursive to create AI that safely conducts experiments on how to improve itself in an open-ended process of endless, automated scientific discovery. As I wrote in my 2019 AI-generating algorithms paper, this will likely be the fastest path to superintelligence. Our work since has shown the power of this approach. Excited to scale up and improve upon ideas like the Darwin Gödel Machine, HyperAgents, ADAS, OMNI, ALMA, The AI Scientist, PromptBreeder, Rainbow Teaming, Automated Capability Discovery, and other work on open-ended and AI-generating algorithms. We’ve assembled a dream team of researchers and significant resources to pursue this vision. My amazing co-founders are pictured here, and we have an all-star team of founding members (we’re over 25 and growing).
Please join us if you are interested! Follow our progress @Recursive_SI
I will be attending ICLR 2026 in Brazil 🇧🇷! I will be presenting two papers:
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
- Main conference poster — Sat Apr 25, 10:30 AM – 1:00 PM (-03), Pavilion 3, P3-#1719
Learning to Continually Learn via Meta-learning Agentic Memory Designs
- RSI Workshop oral presentation — Sun Apr 26, 15:20, Room 101-D
- MemAgent Workshop oral presentation — Mon Apr 27, 13:45, Room 205
Feel free to reach out if you'd like to chat! 🧵
Wake up Samurai, RLVG IS BACK! 😎
We are happy to announce the awesome speakers of the second edition of the RLVG workshop: Alex Kearny, @MarlosCMachado, @singhblom, @smdvln, Pete Wurman and @criticalneuro!
Join us August 15 in Montreal at @RL_Conference!
My PhD thesis defense will be here https://t.co/sPmqICWTXq tomorrow (Monday) at 9am EST. All are welcome! 🙂 Talk title: "Open-ended Learning via Procedural Content Generation in Video Games: Environment Substrates, Morphogenesis, and Designer-Player Loops". Come watch me make it make sense!
@abhishekunique7 This is really cool! Do you have any thoughts on how this approach relates to Go-Explore? https://t.co/21wNqMKhwj
It seems like the motivation is similar but you have a new way of generating interesting states to start from.
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature!!✨
Today in Nature we share a comprehensive technical summary of our work on The AI Scientist, including new scaling law results showing how it improves with more compute and more intelligent foundation models.
The AI Scientist autonomously creates its own research ideas, codes up and conducts experiments to test those ideas, creates figures to visualize the results, writes an entire scientific manuscript summarizing what it has discovered, and conducts its own “peer” review of the resulting paper. One of its papers–entirely AI-generated–passed peer review at a top-tier AI conference workshop, a historic milestone marking the dawn of a new era of AI-accelerated scientific discovery. 🔬🧪✨🧬💡🔭
Paper https://t.co/Q6tfME4yst
Blog https://t.co/C43Ooy0kjP
Work done in collaboration with a great team from Sakana, Oxford, and my lab at UBC. Thanks and congratulations everyone!
@_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.
The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.
We introduce Hyperagents – self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving.
We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution. Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g. persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.
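The idea of editing "the process that generates future improvements" has a classic small-scale analogue in evolutionary computation: self-adaptive mutation step sizes. In this technique, each candidate carries its own step size, which is varied and selected along with the solution. This sketch is only that toy analogue, not DGM-H. The task, the (solution, step size) encoding, and all constants are hypothetical. Selection looks only at task performance, yet when `meta_editable=True`, the step size (the way new candidates are generated) also improves.

```python
import math
import random

def task_score(x):
    # Hypothetical task: get as close as possible to x = 100.
    return -abs(x - 100.0)

def make_child(parent, rng, meta_editable):
    x, step = parent
    if meta_editable:
        # Meta-level modification: the child may also change how it generates improvements.
        step = step * math.exp(0.5 * rng.gauss(0.0, 1.0))
    # Task-level modification: propose a new solution using the (possibly new) step size.
    return (x + step * rng.gauss(0.0, 1.0), step)

def run(meta_editable, generations, rng):
    parent = (0.0, 0.1)  # (solution, step size); the starting step is deliberately tiny
    for _ in range(generations):
        children = [make_child(parent, rng, meta_editable) for _ in range(8)]
        # Evaluate on the task only and keep the best agent, including its step size.
        parent = max([parent] + children, key=lambda agent: task_score(agent[0]))
    return parent

rng = random.Random(7)
fixed_x, fixed_step = run(meta_editable=False, generations=200, rng=rng)
meta_x, meta_step = run(meta_editable=True, generations=200, rng=rng)
print(f"fixed procedure: x={fixed_x:.2f}; self-modified procedure: x={meta_x:.2f}, step={meta_step:.3g}")
```

With a fixed, hand-chosen step size, progress is capped by that choice. When the step size is itself subject to variation and selection, it grows while large moves pay off and shrinks near the target.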
This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
@sethkarten Thanks for answering my questions! This is a cool project; it would be great if coding agents could make it easier for everyone to work on (currently) slow, complex environments.
@sethkarten If you’re saying that these environments have better reward design, that’s great, but I think an exact optimized copy is a valuable first step, both to compare to prior work and to demonstrate that the system can faithfully recreate environments.
@sethkarten Sorry, I should have been more specific. In Figure 4 for PokeJAX, the training curves look very different (the optimized env scores 60 Elo higher). I actually didn’t see the gray line for Red, but yes, that’s also very different. It seems like the envs aren’t the same for training.
@akshitwt You might be interested in this paper by @akarshkumar0101 which used foundation models to search for more interesting artificial life simulations https://t.co/2Wb5Zod5ZJ
🧵 Introducing Xray-Visual (XRV): Scaling Unified Vision Models to 26 Billion Samples.
1/ Do standard vision encoders like SigLIP or DINO generalise to out-of-distribution (OOD) data at scale? While these models dominate academic leaderboards, we observe significant performance degradation when they are confronted with complex, real-world distributions. Introducing Xray-Visual (XRV).
Check out our work on controlling image editing via sliders in SoTA models, appearing at #cvpr2026.
Bonus: our method does not require paired data for sliders! ↘️
Can large language models (LLMs) act as the imagination of a reinforcement learning (RL) agent?
We found that if you let an LLM "dream" - not by hallucinating pixels, but by writing executable Python code - it can create an open-ended curriculum that drives progress in complex, long-horizon worlds.
Introducing Dreaming in Code (DiCode). 🧵👇