Researching AI for moonshot applications 🚀 Making biology programmable 🧬 @EvoscaleAI and AI for Climate 🌱🌎
@ClimateChangeAI, PhD @berkeley_ai #mountaineer
Happy to share our most recent work on ESMC and ESMFold2 as world models of Biology with the world!
All-open source & full paper
https://t.co/YjQqcpwoia
Proud to have contributed to our SoTA binder/antibody design results and to showing how SAEs capture fitness in an interpretable way!
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single-chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins and 1.1 billion predicted structures.
ESMFold2 is built on a state-of-the-art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve applied mechanistic interpretability techniques, originally developed for large language models, to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and levels of abstraction that mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins, and especially by the new avenues it opens up for therapeutic design and medicine.
Scaling laws are powering AI. It’s time to scale biology.
Today we’re launching the Virtual Biology Initiative to generate the data to unlock scaling laws in biology and build accurate predictive models of the cell.
Digital representations of proteins are already expanding our understanding of life at the molecular level, and accelerating the design of molecules and medicines. Accurate digital representations of the cell could reveal the mechanisms that are responsible for disease, and show how to reverse them.
The Protein Data Bank and worldwide repositories of protein sequence biodiversity were created through decades of work by the scientific community. The advances in artificial intelligence for proteins would not have been possible without them.
The cell is orders of magnitude more complex, and we will need to create the data in just a few years rather than decades.
This will require a coordinated global effort. We're partnering with Broad, Wellcome Sanger, Arc, Allen, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy.
Biohub is contributing to this effort as both a funder and a builder. We are developing microscopy to observe millions of cells in living organisms, and cryo-ET to resolve the cell in atomic detail. We're building instruments that expand the range of modalities and parameters that can be simultaneously measured. We’re developing molecular, cellular, and tissue engineering to create models of disease and design interventions.
The data we generate will be available to the worldwide scientific community.
We’re also committing $100M over the next five years to support work beyond Biohub.
We invite other scientific teams and funders to join.
Link: https://t.co/93Nw1QT5iZ
We used AI to predict the failure of a Phase 3 trial before the results were announced. Today, we're publishing 10 more predictions for the future.
Thread 🧵
It's been a wild ride building ESM3, a new frontier AI Model for Biology 🚀🧬 @EvoscaleAI
I'm especially excited about our post-training results, which indicate that models develop a deep understanding of biology at scale, enabling them to solve harder problems with less data
We have trained ESM3 and we're excited to introduce EvolutionaryScale.
ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.
Read more: https://t.co/iAC3lkj0iV