Plenty to be excited about with the new ESM models: blazing fast, SOTA, great wet-lab hit rates, beautiful scaling laws, open source & science. Real acceleration in the field. There's one more thing that particularly excites me, and is easy to miss in such a packed release.
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction that mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
Together with UC Berkeley we are announcing the laser phase plate - a breakthrough in atomic resolution imaging. This is the brightest continuous wave laser in the world, 100 million times the intensity of the surface of the sun.
Phase contrast plays an important role in microscopy, but it was thought close to impossible for electron microscopy, where it would require interfering with an electron beam. Holger Mueller and Robert Glaeser proposed exactly this using a standing wave laser. It has taken over 15 years to make this a reality. Biohub partnered with UC Berkeley and Mueller to support this work and to engineer and build the technology.
Contrast has been the critical barrier to achieving atomic resolution imaging of the cell. In cryo-electron tomography, a cellular imaging technology that uses electron microscopy, the low contrast makes it impossible to resolve anything but the largest proteins within their cellular context. The laser phase plate removes that barrier.
With advances in AI, this breakthrough in contrast will start to open up a new frontier in structural biology, one that will allow us to see the molecular machines of the cell, how they assemble into far more complex and dynamic systems, and how they work.
new @NoPriorsPod with Priscilla Chan, Mark Zuckerberg and Alex Rives:
- taking seriously the @biohub mission to cure and prevent all disease (soon!)
- Model release of ESMFold2 and ESM Atlas (beating AlphaFold)
- new biological knowledge from the models
- ecosystem strategy
Brilliant idea! Next up: Apple randomly reboots your Mac if you're building competing tech, Gmail silently edits your email if you mention rival platforms, and Tesla Autopilot swerves if it detects you're working on self-driving cars.
All in the name of safety, of course. Because malicious actors controlling the world’s operating systems, inboxes and cars would be extremely dangerous!
in our 2026 @newlimit progress update, we announced our first candidate medicine.
it has one of the most striking effects i’ve ever seen.
a single treatment accelerates recovery from alcohol in old animals. it’s so dramatic you can see it with the naked eye!
🧵 on the interpretability work that helps connect ESMC embeddings to natural language - protein function at the micro level is about residue-level mutations, but at the macro level it is about how proteins behave in the real world.
📣 new preprint multimodal atlas. Imaging + scRNA, 57M cells. 🧬🔬
Cells are complex dynamical systems — but most ways we measure them destroy them. We asked: how does live imaging compare to scRNA-seq, the field’s gold std?
The answer surprised us 🧵
https://t.co/RG0PZ1KTHW
Here's an ESMFold2 demo run by AI agents on Modal for designing potential GLP-1 receptor binders.
This test focused on the GLP-1R extracellular domain where semaglutide/Ozempic binds. It measured how well each design recovered the residues that semaglutide contacts.
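A recovery metric like the one described above can be sketched as a set overlap. This is only an illustration of the general idea, not the demo's actual scoring code: the function name `contact_recovery` and the residue numbers are hypothetical, and how contacts are defined (for example, a distance cutoff) is left as an assumption.

```python
def contact_recovery(reference_contacts, design_contacts):
    """Fraction of the reference contact residues that a design also contacts.

    Both arguments are iterables of receptor residue numbers. How a
    "contact" is defined is an assumption left to the caller.
    """
    reference = set(reference_contacts)
    if not reference:
        raise ValueError("reference contact set is empty")
    return len(reference & set(design_contacts)) / len(reference)


# Hypothetical residue numbers, for illustration only (not real GLP-1R data).
reference = {29, 32, 33, 36, 39, 68, 69, 88}
design = {29, 32, 36, 40, 68, 88, 91}

recovery = contact_recovery(reference, design)
print(f"contact recovery: {recovery:.3f}")
```

A design that touches extra residues outside the reference set is not penalized by this metric; it only measures how much of the reference footprint is covered.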
Codex made this neat presentation of the demo with autonomous use of ChimeraX and ElevenLabs.
@nappenstance My notes were wrong. The correct paper name is Which transformer architecture fits my data? A vocabulary bottleneck in self-attention [[Noam Wies]] #2021
Proteins are the machinery of life. Scientists have cataloged billions of protein sequences—but their biology is still mostly unknown.
ESM Atlas is a new way in.
6.8 billion proteins. 1.1 billion predicted structures—the largest application of AI to protein biology to date. ESM Atlas makes the uncharacterized parts of protein space searchable for the first time. And it's fully open.
Start exploring: https://t.co/n6OWfcWdHe
on day 1 @newlimit, we imagined it would take 10+ years to invent real medicines.
our recent results have accelerated the timeline to next year. we've raised a Series C led by @foundersfund alongside @ThriveCapital, @Greenoaks, and many others to bring therapies to the clinic.
medicines for aging are among the most valuable possible technologies. we are grateful to our partners for the opportunity to pursue this mission.
I’m so excited about the launch of ESMFold2, ESMC, and the new ESM Atlas. This was a massive team effort, and I’m grateful to have worked with such an incredible group @biohub.
A headline result I’m especially excited about: ESMFold2 can design minibinders and antibodies with nanomolar affinity, target selectivity, and functional activity against therapeutically relevant targets.
Today, we’re sharing the full binder design protocol.
spent the morning poking @biohub's ESMC SAE atlas, really slick UI! ran ADAR1/2/3 and the deaminase feature fires identically on all three. surprised that deaminase feature even exists! super cool
Took me a while to figure out what all the ESMFold2 rage was about. At first, the benchmarking data didn't look super remarkable to me, but it turns out there are many impressive aspects:
- Fully open source, open weights + massive ESM Atlas (1.1B structures vs 0.2B for AF3).
- SOTA performance despite no MSA use. MSA search and triangular attention were simply taken out of the base model.
- Direct consequence, super low latency inference: 1024-residue protein structure prediction in 9 secs, still outperforming prior models on antibody-antigen tasks.
- Best in class PPI and antibody-antigen results. 65% pass rate on antibody-antigen benchmarks after inference-time scaling, significant improvement over AF3.
- Tons of experimental data, in particular with lab-validated miniprotein binders plus single-chain antibodies across 5 targets in cancer and immunology. Binding affinities consistent with therapeutic activity.
- Inference-time scaling benefits PPI: Multiple seeds + selection by confidence show real gains on challenging antibody-antigen predictions, leading to comments/hypotheses that it has learned an energy-function-like behavior via the folding module.
- Base model works without MSAs, but providing them further boosts prediction quality on difficult protein-protein interaction cases.
One caveat: No true scoring for protein-protein interactions, making it harder to assess which specific residues or domains are reliably involved in binding.
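The inference-time scaling point in the list above (multiple seeds plus selection by confidence) amounts to a best-of-N loop. The sketch below assumes a stand-in predictor: `predict_once` is hypothetical and just returns a seeded pseudo-random confidence, so this shows the selection pattern only, not the ESMFold2 API.

```python
import random


def predict_once(sequence, seed):
    """Hypothetical stand-in for one stochastic structure prediction.

    A real predictor would return a structure and a model confidence;
    here the confidence is a seeded pseudo-random number in [0, 1).
    """
    rng = random.Random(seed * 1000003 + len(sequence))
    return {"seed": seed, "confidence": rng.random()}


def best_of_n(sequence, seeds, predict=predict_once):
    """Run the predictor once per seed and keep the most confident result."""
    predictions = [predict(sequence, seed) for seed in seeds]
    if not predictions:
        raise ValueError("at least one seed is required")
    return max(predictions, key=lambda p: p["confidence"])


best = best_of_n("MKTAYIAKQR", seeds=range(8))
print(best["seed"], round(best["confidence"], 3))
```

Because the seeds for a larger run include those of a smaller one, the selected confidence can only stay the same or go up as more seeds are added. Whether that translates into better accuracy depends on how well confidence tracks correctness.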
Today @biohub released the next generation of open models in the ESM family.
These models achieve SOTA results in understanding how proteins interact with other molecules, essential for designing new drugs and antibodies.
Try ESMFold2 on Modal: https://t.co/sbw830TNHw
🎙️@alexrives on 'AI for Science' with @latentspacepod breaking down our world model of protein biology: ESMFold2, ESMC, and ESM Atlas. https://t.co/0UwQAeaHUs
Happy to share our most recent work on ESMC and ESMFold2 as world models of Biology with the world!
All-open source & full paper
https://t.co/YjQqcpwoia
Proud to have contributed to our SOTA binder/antibody design results and to showing how SAEs capture fitness in an interpretable way!
A newly released AI tool has generated an atlas of more than one billion predicted protein structures and billions more protein sequences.
https://t.co/nThx75YHL2