Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single-chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, with affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state-of-the-art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve applied mechanistic interpretability techniques, originally developed for large language models, to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction that mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.
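To make "gradient-based search" concrete, here is a minimal toy sketch of the general idea, not the ESMFold2 procedure. It assumes a hypothetical scorer (a random per-position preference table from `make_toy_scorer`) in place of a real model, relaxes each sequence position to a softmax over amino acids, and follows the gradient of the expected score before decoding the most likely residue at each position. All function names here are illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_scorer(length, seed=0):
    """Hypothetical stand-in for a learned model: a random preference
    weight for every (position, amino acid) pair."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS] for _ in range(length)]


def gradient_search(weights, steps=200, lr=0.5):
    """Gradient ascent on the expected score sum_i sum_a p[i][a] * w[i][a],
    where p[i] = softmax(logits[i]) is a relaxed (continuous) sequence."""
    logits = [[0.0] * len(AMINO_ACIDS) for _ in weights]
    for _ in range(steps):
        for z, w in zip(logits, weights):
            p = softmax(z)
            expected = sum(pa * wa for pa, wa in zip(p, w))
            # d(expected score)/d z[a] = p[a] * (w[a] - expected)
            for a in range(len(z)):
                z[a] += lr * p[a] * (w[a] - expected)
    # Decode the discrete sequence by taking the top residue per position.
    return "".join(
        AMINO_ACIDS[max(range(len(z)), key=z.__getitem__)] for z in logits
    )


if __name__ == "__main__":
    toy_weights = make_toy_scorer(length=12)
    print(gradient_search(toy_weights))
```

In a real setting the preference table would be replaced by a differentiable score from a trained model, and the gradient would come from automatic differentiation instead of the closed form used here.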
I'm excited by the potential this has to accelerate basic science and the understanding of proteins, and especially by the new avenues it opens up for therapeutic design and medicine.
Today in @NatureBiotech we report a new suite of PE8 prime editor proteins. PE8 variants were developed from laboratory-evolved PE6 proteins using AI-guided protein redesign. This approach combines recent advances in computational protein design and directed evolution to increase prime editing efficiency, especially in transient, therapeutically relevant delivery settings such as mRNA+pegRNA electroporation into primary cells, eVLP delivery of prime editing RNPs, and LNP-mediated mRNA+pegRNA delivery in mice.
https://t.co/bz6PalFvc4
1/11
1/8 🚨 New preprint from the @SternbergLab & Jinek labs! CRISPR-associated transposases (CASTs) insert large DNA cargoes at precise genomic locations — no double-strand breaks needed.
Interested in genetically encodable inhibitors of your favorite biomolecular condensate? Excited to announce our latest work, w/ @jibin_sadasivan, @GeneWeiLiLab, & @LindsayCase19, on protein fragments as generalizable regulators of phase separation. (1/n)
https://t.co/6IMkrTP3ZQ
1/9 New preprint from the Sternberg Lab in collaboration with the Nishimasu Lab! We uncover how the DRT3 antiphage immune system pairs two reverse transcriptases, one RNA-templated and one protein-templated, to build a double-stranded DNA effector. https://t.co/KvouqrCWDW
The number one dream killer may well be structural poverty.
- Maybe there's a child genius like Einstein in Eastern Indonesia who can't continue school because it is so far away.
- Maybe there's a child in Depok with a musical gift like Bach's who, abandoned by their parents and left to support a younger sibling, ends up working as a waste picker.
- Maybe there's a child in Sumatra who loves football and could become like Messi if they kept at it, but who is malnourished, constantly ill, and can't afford treatment.
That's why public services such as healthcare, education, and transportation matter: so that we don't lose hidden talents like these.
Why isn't there a 3Blue1Brown for biology?
For context, 3Blue1Brown is a YouTube channel, founded by Grant Sanderson, that publishes videos about math. Sanderson built an "animation engine," called manim, to help create these videos; it's a Python library that uses code to render smooth animations.
So why is nobody making highly visual, explanatory videos for biology in the same way that 3Blue1Brown does for mathematics, where each video explains a concept using a consistent visual aesthetic? I think there are at least three plausible explanations:
1. Biology demands a larger visual palette than math. Whereas many different ideas in math can be explained using a small number of symbols (charts, equations, shapes), maybe biology just requires a larger array of symbols. Showing a kinesin protein walk on a microtubule demands a completely different set of elements compared to, say, the evolution of a species. Perhaps this makes it harder to create visuals for biology.
I'm not sure this holds up to scrutiny. Math is arguably as broad as biology. 3Blue1Brown has made videos on everything from Bayes' theorem to the Hilbert curve and how Bitcoin works, and all of them have the same visual aesthetic.
2. Biology doesn't have a rich history of visual ideas, so maybe it's harder to align on an aesthetic. Graphs and geometric shapes are many centuries old, and mathematicians consciously draw on these historical norms and conventions. A line chart looks like a line chart regardless of how it's styled. Biology, though, has no such "fixed" visual language, so it takes more effort to create each new visual.
Maybe there's merit to this idea? Everyone draws a chromosome differently, for example; some people might show all 23 pairs at once, or zoom into a single locus, or abstract the entire chromosome down to a few letters. Biology operates across so many orders of magnitude that choosing the scale at which to convey an idea is itself part of the creative act, and there's no inherited convention telling anybody which scale to pick.
3. Maybe it takes too long to build visuals in biology, or the technical bar is too high? If you want to show how an enzyme works at the molecular level, for example, you'd need to understand PyMOL, Blender, etc. Iteration is slow, and the skill set needed to build one type of visual (like how molecules bind) won't necessarily apply to higher-order ideas, like evolution.
This bottleneck is collapsing with AI tools, though. Claude now works directly in Blender and Adobe products, for example, so iterations will be much faster. Maybe we'll see a 3Blue1Brown-esque creator emerge for biology? I'm not sure.
I'm hoping to write about these ideas, so if you have feedback (or reject my claims entirely) please let me know! I'd be keen to hear from you.
> Painting by David Goodsell, whose visual aesthetic has transformed how people think about molecular biology.
We propose a 3D maize canopy model using swarm intelligence to maximize light interception, featuring t-distribution initialization, phytomer agent optimization, collision detection, and validation across densities and varieties.
Details: https://t.co/3e8GSxNKIj
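For readers unfamiliar with swarm optimization, the sketch below shows a generic particle swarm with heavy-tailed Student-t initialization. It is not the paper's method: the objective `toy_light_interception` is a hypothetical stand-in that simply peaks when every leaf angle equals 0.6 rad, the phytomer agents and collision detection are omitted, and all names and parameter values are assumptions for illustration.

```python
import math
import random


def student_t(df, rng):
    """Draw one Student-t sample as normal / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def toy_light_interception(angles):
    """Hypothetical objective: maximal (zero) when every angle is 0.6 rad."""
    return -sum((a - 0.6) ** 2 for a in angles)


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def swarm_maximize(objective, dim, n_particles=30, n_iters=200,
                   lo=0.0, hi=math.pi / 2, seed=0):
    """Generic particle swarm maximization over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    center, scale = (lo + hi) / 2.0, (hi - lo) / 4.0
    # Heavy-tailed t-distributed start positions, clamped to the box.
    pos = [[clamp(center + scale * student_t(3, rng), lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, personal pull, global pull
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = clamp(pos[i][d] + vel[i][d], lo, hi)
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


if __name__ == "__main__":
    angles, value = swarm_maximize(toy_light_interception, dim=4)
    print(angles, value)
```

A real canopy model would replace the toy objective with a light-interception simulation and reject candidate configurations in which leaves collide.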