At FutureHouse, we've noticed scientific agents are good at applying average intelligence across tasks. They always seem to make the obvious choices, which is good, but discovery sometimes requires more intuition and insight than average.
We've made the first step today towards superhuman insight by training a reasoning model for a specific domain of science: designing drug-like molecules.
We're releasing a 24B open-weights reasoning model called ether0. ether0 has been trained with reinforcement learning to exceed frontier models and human experts across a range of molecular design tasks. ether0 takes in natural language, reasons in English, and outputs a new molecule.
ether0 is now a tool for our chemistry design agent, Phoenix, which can call upon it to design molecules.
Training a reasoning model for a scientific domain like chemistry, rather than math or programming, required a number of small technical advances. For example, we developed an iterative method of splitting into specialist models and aggregating their reasoning traces. As another example, we used LLMs to rewrite questions that were partially solved.
A major finding from this work is that training a reasoning model, rather than fine-tuning, is >10x more efficient per experimental measurement. We also found that reasoning models can learn new tasks that were developed specifically for this paper and are not in pretraining corpora. We even saw one task stay at 0% performance until 100 steps into RL, at which point the model solved it once by chance. This, along with our change in modality from natural language to molecules, bodes well for applying reasoning models far from natural language.
Reasoning models in science are the future. Scientific tasks are naturally verifiable rewards: the physical world is the ultimate arbiter of accuracy, rather than human contractors. The data efficiency gain and ability to exceed frontier models with relatively few parameters/compute mean that we should expect more scientific reasoning models soon.
Congrats to team @SidN137, James, @Ryan__Rhys, Albert, @GWellawatte, @maykcaldas, @ludomitch, and @SGRodriques. Thanks to @VoltagePark, @nvidia, and @huggingface for supporting us, and huge thanks to @ericschmidt for funding @FutureHouseSF.
The model weights, reward model, and new benchmark are open source. You can also read more about scientific reasoning models in our exclusive with Nature.
Today, we're announcing the first major discovery made by our AI Scientist with the lab in the loop: a promising new treatment for dry AMD, a major cause of blindness.
Our agents generated the hypotheses, designed the experiments, analyzed the data, iterated, and even made figures for the paper. The resulting manuscript is a first-of-a-kind in the natural sciences, in which everything that needed to be done to write the paper was done by AI agents, apart from actually conducting the physical experiments in the lab and writing the final manuscript. We are also introducing Robin, the first multi-agent system that fully automates the in-silico components of scientific discovery, which made this discovery. To our knowledge, this is the first time that hypothesis generation, experimentation, and data analysis have been joined up in a closed loop, and it is the beginning of a massive acceleration in the pace of scientific discovery that will be driven by these agents. We will be open-sourcing the code and data next week.
Robin is a multi-agent system that uses Crow, Falcon, and Finch, the agents on our platform, to generate novel hypotheses, plan experiments, and analyze data. We asked Robin to find a new treatment for dry age-related macular degeneration. Robin considered the disease mechanisms associated with dry AMD, proposed a specific experimental assay that could be used to evaluate hypotheses in the wet lab, and proposed specific molecules we could test in that assay. We tested the molecules and gave it the resulting data, which it analyzed before proposing more experiments. In the end, it identified Ripasudil, a Rho kinase inhibitor (ROCK inhibitor) that is approved in Japan for several other diseases, which seems very promising as a potential treatment for dry AMD. It also identified specific molecular mechanisms that might underlie the effects of Ripasudil in RPE cells, from an RNA sequencing experiment it proposed. To be clear, no one has proposed using ROCK inhibitors to treat dry AMD in the literature before, as far as we can find, and I think it would have been very difficult for us to come up with this hypothesis without the agents. We have also run the proposed treatment by several experts in AMD, who confirm that it is interesting and novel. Moreover, this project was fast: with Robin in hand, the entire project took about 10 weeks, which is way shorter than it would have taken if we had been doing all of the in-silico components ourselves.
Important caveats: We are real biologists at FutureHouse, so I want to be clear that although the discovery here is exciting, we are not claiming that we have cured dry AMD. Fully validating this hypothesis as a treatment for dry AMD will take human trials, which will take much longer. Also, this discovery is cool, but it is not yet a "move 37"-style discovery. At the current rate of progress, I'm sure we will get to that level soon.
Congratulations to the team. Congratulations in particular to Robin, which generated the hypotheses, proposed the experiments, analyzed the data and generated the figures. And major congratulations also to the human team, which built Robin: @MichaelaThinks, @agreeb66, @benjamin0chang, @ludomitch, Mo Razzak, Kiki Szostkiewicz, and Angela Yiu.
One of our friends announced his new indie game today--give it a look!
MoteMancer is an automation game (think Factorio 🏭) in a magical fantasy setting. 🧙
Wishlist here!
https://t.co/gQBHG3WIzZ
The 2nd workshop on Computer Vision for Videogames will be organized at CVPR 2025: this is a great venue for gaming-related research (think AI, genAI, graphics, RL, agents, HCI, with applications to videogames). There is still time to submit: https://t.co/DFlU1Qgn8v
#CVPR2025
This cool paper shows that robot adoption in Japan led to INCREASED employment! Nice new evidence on the debate about the effects of robots on employment.
exciting potential for robotics and simulations more broadly.
of particular interest is getting foundation models to produce output aligned with details of the input prompt (not just "vaguely right")
Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How can we "distill" embodied policies from foundational models?
Introducing FactorSim! #NeurIPS2024
We show that by generating prompt-aligned simulations and training a policy on them without collecting any experience in the target environment, we can achieve zero-shot performance close to policies trained on millions of target environment experiences in many classic RL environments.
You can generate RL simulations on our project website: https://t.co/bQ95SkFjrS
More in 🧵
1/7
we've extended the AIIDE doctoral consortium submission deadline to JULY 26
we're seeking applicants at one of two stages in their degree process
a) assembling a thesis committee
b) preparing for final defense and seeking career guidance
https://t.co/2X9PsDU6mf
please share!
Anita's done some stellar work taking AI from a tool for offline inspiration to a live interactive brush for working directly in 3d environments.
it's amazing how things change when you can go from material ideas to painting with it in an environment in real time.
So proud of my team for presenting the first interactive #texture #painting with #AI at #SIGGRAPHAsia2023 Real-Time Live. Well done Anita Hu and team!!
We want the artist to stay in control 🎨🤖
https://t.co/vyy7eMKxwz
#AIIDE23 kicks off tomorrow with the Experimental AI in Games workshop! Now in its 10th year, @exag20xx has become a mainstay of our workshop series, and emphasizes showing, teaching, and inventing, alongside traditional paper presentations. Check it out!
https://t.co/hgCIkrjUYv
@MaxCRoser is there a regional breakdown of these estimates? would love to compare post-Rome Europe with post-Han China, for example. wondering if the average is hiding regional progress that gets swamped by global trends
Thank you @SIGGRAPH and Real-Time Live. We greatly appreciate the award for Best in Show for #SIGGRAPH2023 for our Text2Materials demo by the #NVIDIAResearch team.
See the demo: https://t.co/QHKl81dCph
We are thrilled to announce our sponsorship with @Sony SIE! They are offering scholarships (tickets and a small travel allowance included). To celebrate, a few tickets will be available for students. See you there!
Apply now: https://t.co/dKi8Wufsrt
#sony #airesearch
Tired of AI hype but interested in reading about how machine learning (generative AI if you like) can be used to generate content in games? Well, do I have the book for you!