Proteins are the machinery of life. Scientists have cataloged billions of protein sequences—but their biology is still mostly unknown.
Today we're releasing a world model of protein biology: a scientific engine for prediction, design, and discovery that consists of ESMFold2, ESMC, and ESM Atlas. Together, they're helping to open up a new way for researchers to design proteins and speed up scientific discovery.
Our mission is to cure or prevent disease. To do that, we need to accelerate science. That's why we're releasing all three openly. https://t.co/TQuxOKx0e4
Our paper on ESMC, ESMFold2, and mechanistic interpretability for proteins is up on @biorxivpreprint!
We've made a few changes since the initial version went online last week.
1. We found an issue in the way we provided MSAs to OpenFold3. This led us to report lower performance of OpenFold3 on some benchmarks. This issue does not affect any of the other models evaluated.
2. We updated how we report results on Runs N' Poses to more closely match the original paper (counting only ligands with a valid SuCOS similarity score). We also added a bar plot to the supplement that stratifies performance by similarity. This mostly changes the absolute values of the pass rate, not the relative performance of models.
3. Added some more BLI data to the supplement.
4. Added some missing citations, fixed typos, etc.
Check out the preprint here: https://t.co/wGoYhDz3gU
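For readers who want to see what the reporting change in item 2 amounts to, here is a minimal sketch, not the paper's evaluation code. It assumes each ligand is a hypothetical `(sucos_similarity, passed)` record, with `None` marking a ligand that has no valid SuCOS similarity score. The bin edges are made up for illustration, and the pass criterion itself is taken as given.

```python
from typing import Optional

# Hypothetical record: (SuCOS similarity or None if no valid score, passed?)
Record = tuple[Optional[float], bool]


def overall_pass_rate(records: list[Record]) -> Optional[float]:
    """Pass rate over ligands that have a valid similarity score."""
    valid = [passed for sim, passed in records if sim is not None]
    return sum(valid) / len(valid) if valid else None


def stratified_pass_rate(
    records: list[Record],
    edges: tuple[float, ...] = (0.0, 0.5, 1.0),  # illustrative bins only
) -> dict[tuple[float, float], Optional[float]]:
    """Pass rate per similarity bin; the last bin includes its upper edge."""
    rates: dict[tuple[float, float], Optional[float]] = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last = hi == edges[-1]
        hits = [
            passed
            for sim, passed in records
            if sim is not None and (lo <= sim < hi or (is_last and sim == hi))
        ]
        rates[(lo, hi)] = sum(hits) / len(hits) if hits else None
    return rates


# Toy data, not results from the paper.
toy = [(0.1, True), (0.4, True), (0.9, True), (None, False), (1.0, False)]
overall = overall_pass_rate(toy)
by_bin = stratified_pass_rate(toy)
```

Dropping the ligand without a valid score changes the denominator, which is why the absolute pass rate moves while the ordering of models can stay the same.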
Excellent and timely article, we got into a lot of this at the recent @biohub symposium.
What excites me about new AI capabilities paired with scalable, lower-cost measurement modalities is how much they lower the barrier to capturing context, from tracking more dimensions of biology to boring-but-critical details like provenance and metadata. What will we uncover when context capture and sense-making are cheap?
ESMC didn't learn protein biology from a textbook. It learned from 2.8 billion sequences—the full evolutionary record of what works in nature. That's what a world model of protein biology looks like.
Download the model and start building: https://t.co/FQ9JObZv6F
Better vitrification = better cryo-ET.
We've opened an RFA for two-year grants to advance vitrification techniques for biological samples—welcoming applications from researchers in heat transfer, cryogenics, and materials science.
Apply here: https://t.co/mwxtGgGlFD
📣 New preprint: a multimodal atlas. Imaging + scRNA-seq, 57M cells. 🧬🔬
Cells are complex dynamical systems, but most ways we measure them destroy them. We asked: how does live imaging compare to scRNA-seq, the field’s gold standard?
The answer surprised us 🧵
https://t.co/RG0PZ1KTHW
Designing a protein binder used to mean years of lab experiments. ESMFold2 lets researchers run hundreds of thousands of designs computationally—then take only the most promising into the lab. We tested it across 5 targets in oncology and immunology. It worked.
Download the model and start building: https://t.co/odrOR3U1hj
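The "design many, test few" idea can be sketched in a few lines. This is an illustration only, not the released protocol: `toy_score` is a hypothetical placeholder standing in for whatever computational score a real pipeline would assign, and `select_top_designs` is a made-up helper name.

```python
import heapq
from typing import Callable, Iterable


def select_top_designs(
    designs: Iterable[str],
    score: Callable[[str], float],
    k: int,
) -> list[str]:
    """Keep the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, designs, key=score)


def toy_score(sequence: str) -> float:
    """Placeholder score (fraction of tryptophan); NOT a real design metric."""
    return sequence.count("W") / len(sequence)


# Toy candidates; a real run would stream many thousands of designs.
candidates = ["AAAA", "AWAA", "WWAA", "WWWA"]
shortlist = select_top_designs(candidates, toy_score, k=2)
```

Because `heapq.nlargest` accepts any iterable, the candidates can be generated lazily rather than held in memory all at once.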
Proteins are the machinery of life. Scientists have cataloged billions of protein sequences—but their biology is still mostly unknown.
ESM Atlas is a new way in.
6.8 billion proteins. 1.1 billion predicted structures—the largest application of AI to protein biology to date. ESM Atlas makes the uncharacterized parts of protein space searchable for the first time. And it's fully open.
Start exploring: https://t.co/n6OWfcWdHe
I love to see @modal being used for biology and cutting-edge research like this. Very cool work from the team at @biohub to push open models forward in protein design and comp bio.
Here's how to run it on Modal: https://t.co/joiLmRw4pa
Why is this important? I'm personally really excited for the effect of this on personalized medicine. Sure, there are additional problems to solve in the space, but being able to generate in-silico binders that work in the lab is a first step toward generating in-silico drugs that work in a person.
By releasing the protocol, we're inviting scientists everywhere to try to use and adapt this to their own problems.
Run it here: https://t.co/E73A7s8v19
We're actively working on making this easier to use and more efficient. Let us know what you create with this!
We have fully open-sourced our binder design protocol, which generates nanomolar-affinity scFvs.
The code here implements a faithful reproduction of the pipeline described in the paper, which is exactly what was used to produce our designs.
Check it out here: https://t.co/CDH6SPo7d3
I’m so excited about the launch of ESMFold2, ESMC, and the new ESM Atlas. This was a massive team effort, and I’m grateful to have worked with such an incredible group @biohub.
A headline result I’m especially excited about: ESMFold2 can design minibinders and antibodies with nanomolar affinity, target selectivity, and functional activity against therapeutically relevant targets.
Today, we’re sharing the full binder design protocol.
ESMC understands something interesting about the chemistry. We looked at SAE features that activate strongly on the RRGAIL motif. We found features that typically activate on PH domains, in regions associated with kinase active-site recognition.
Conversation: https://t.co/wBYG3IL6Bn
One feature of the @biohub ESM C release that I think deserves more attention is the interpretability of its latent space.
There has been a lot of discussion about whether interpretability is useful for scientific ML models. I think it can become very useful, especially when AI agents can use a model’s internal representations to reason about biology.
Here is one example in which an AI agent with access to ESM C SAE features correctly interprets the loss-of-function mechanism behind a variant.
There is still a lot to improve in how AI agents use model interpretability, but this is an exciting direction for AI agents that don’t just make predictions, but inspect learned representations to generate mechanistic hypotheses.
Read more in our blog: https://t.co/QmJlCzJVe4
We've also released the SAE-enabled skills for variant interpretation, loss-of-function analysis, structural annotation, functional mechanism interpretation, and evaluation against experimental datasets via ToolUniverse @ScientistTools
Thanks to the team behind this! @GaoShanghua @_yepeng @marinkazitnik @countablyfinite @HarvardDBMI @harvardmed @Harvard @KempnerInst
"genes are code" is always vague
I like:
cell nucleus → storage device / storage controller
ribosome → JIT-compiler and runtime
features from a world model (use an SAE) → functions
Proteins → processes
signaling pathways → workflows
Phenotypes → behaviors / outputs
@biohub
Biohub Releases Protein Biology World Model to Address Disease
The latest update to the ESM protein language model series supports binder design and protein function mapping for therapeutic discovery
@alexrives @biohub #AI #proteins #cellsignaling
https://t.co/mOJb3Sp9Nq
Congrats @alexrives, @proteinrosh, and @biohub on this release and on making the models fully open for broad scientific use
ESMFold2 is available on multiple platforms, including @phylo_bio!
Good to see @biohub openly releasing ESMFold2, ESMC, and ESM Atlas. This open science move should help accelerate protein design and biomedical discovery.
Very impressive work by @alexrives and the CZI team on building a world model of protein biology. I’m especially thrilled to see the models and data are fully open-sourced. These contributions pave the way towards a better understanding of human physiology, and plenty of new health-care discoveries. Exciting times!