Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
[SAVE THE DATE] MLCB 2026 will be held Nov 16–17 at the New York Genome Center in NYC!
Submission deadline is July 1 AOE
• 8-page full papers: oral or poster, with optional proceedings
• 2-page work-in-progress papers: posters
Spread the word! RT!
https://t.co/BTmQMzLN4d
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Scaling laws are powering AI. It’s time to scale biology.
Today we’re launching the Virtual Biology Initiative to generate the data to unlock scaling laws in biology and build accurate predictive models of the cell.
Digital representations of proteins are already expanding our understanding of life at the molecular level, and accelerating the design of molecules and medicines. Accurate digital representations of the cell could reveal the mechanisms that are responsible for disease, and show how to reverse them.
The protein data bank, and worldwide repositories of protein sequence biodiversity were created through decades of work by the scientific community. The advances in artificial intelligence for proteins would not have been possible without them.
The cell is orders of magnitude more complex, and we will need to create the data in just a few years rather than decades.
This will require a coordinated global effort. We're partnering with Broad, Wellcome Sanger, Arc, Allen, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy.
Biohub is contributing to this effort as both a funder and a builder. We are developing microscopy to observe millions of cells in living organisms, and cryo-ET to resolve the cell in atomic detail. We're building instruments that expand the range of modalities and parameters that can be simultaneously measured. We’re developing molecular, cellular, and tissue engineering to create models of disease and design interventions.
The data we generate will be available to the worldwide scientific community.
We’re also committing $100M over the next five years to support work beyond Biohub.
We invite other scientific teams and funders to join.
Link: https://t.co/93Nw1QT5iZ
@spandan_madan maybe because ML researchers found a pair of words that makes "we could not fix everything" sound like a deliberate mathematical stance.
New tutorial paper on the “Foundations of Schrödinger Bridges for Generative Modeling” is out on arXiv! 🧩
📖 arXiv: https://t.co/ce4feGdXZT
🔮 Project Website: https://t.co/dyNr5TRijq
With 220 pages and 24 figures, this guide builds the theoretical foundations of Schrödinger bridges from the ground up, unifying the broad field of generative modeling with a single guiding principle: construct an optimal stochastic bridge between distributions while minimizing deviation from a reference process.
The rapid progress in generative modeling has made the field increasingly difficult to navigate from a foundational perspective, which motivated me to develop a resource that builds the core concepts needed to understand and contribute to new advances.
This guide contains intuitive explanations and step-by-step proofs covering:
🧩 The dynamic Schrödinger bridge formulation, lifting optimal transport to continuous-time stochastic processes between distributions, with direct connections to diffusion models, score-based methods, and flow matching.
🧩 A comprehensive toolkit for constructing Schrödinger bridges from first principles, describing stochastic optimal control, forward–backward SDEs, Doob’s h-transform, and Markov and reciprocal projections.
🧩 Extensions to complex and real-world problem settings, including the multi-marginal, unbalanced, discrete SB problems, highlighting the flexibility of the Schrödinger bridge framework in describing complex dynamical systems.
🧩 Practical, scalable algorithms for training and inference of dynamic Schrödinger bridges across modern generative modeling tasks.
More details in the thread 👇🏻
The entire NSF research budget is ~$9B/year. This is literally funding every awarded PI at every field and every institution.
But we've decided that all of basic science is a rounding error in comparison to venture bets.
Please consider funding basic science more.
🚀MIT Flow Matching and Diffusion Lecture 2026 Released (https://t.co/bKgs2wghvY)!
We just released our new MIT 2026 course on flow matching and diffusion models! We teach the full stack of modern AI image, video, protein generators - theory and practice. We include:
📺 Videos: Step-by-step derivations.
📝 Notes: Mathematically self-contained lecture notes
💻 Coding: Hands-on exercises for every component
We fully improved last years’ iteration and added new topics: latent spaces, diffusion transformers, building language models with discrete diffusion models.
Everything is available here: https://t.co/bKgs2wghvY
A huge thanks to Tommi Jaakkola for his support in making this class possible and Ashay Athalye (MIT SOUL) for the incredible production! Was fun to do this with @RShprints!
#MachineLearning #GenerativeAI #MIT #DiffusionModels #AI
My dear young person,
Don’t succumb to mediocrity. There’s enough of it going around. Aspire for craftsmanship, as that is what leads to joy and beauty.
The world needs more people who’re proud of what they make, and less of those who couldn’t care less.
📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages.
Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - https://t.co/DcCG3zlN8p