We’ve been making phylogenetic trees differentiable :)
Check out our work at #ICML2023 workshops - Sampling and Optimization in Discrete Space (SODS) ፨ and Differentiable Almost Everything (DiffAE) 〆
Looking forward to discussing and learning more! 🙌
❤️ work with Avi & @sokrypton
Most docking and cofolding methods assume the protein pocket is roughly fixed: place the ligand into a shape that's already there. That assumption breaks on a lot of real targets, and EV-A71 2A protease is a clear example. When a ligand binds, a loop next to the site moves about 4 Å. Every one of the 802 structures in OpenBind's benchmark needs that rearrangement, which is why classical docking into the unbound structure has only a 5% success rate.
Turns out, the real problem isn't "where does the ligand need to go"; it's "what shape does the protein become when this specific ligand shows up." Ligand and protein are coupled, and you have to solve them together.
Pearl predicts that motion from sequence and the ligand alone. On one compound that no other zero-shot method in the benchmark solves, it placed the ligand within 0.28 Å of the crystal structure and got the loop rearrangement right. Modeling induced fit instead of assuming a rigid pocket is a big part of why this holds up on actual programs.
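A quick sketch of how a number like "within 0.28 Å of the crystal structure" is typically computed, assuming the metric is ligand heavy-atom RMSD over matched atoms (the post does not name the metric). The function names, the toy coordinates, and the 2.0 Å success cutoff are illustrative assumptions, not details from the benchmark.

```python
import math

def ligand_rmsd(pred, ref):
    """RMSD (in Å) between two lists of matched (x, y, z) atom coordinates."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared per-coordinate differences over all matched atoms.
    sq = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq / len(pred))

def is_success(pred, ref, cutoff=2.0):
    """Assumed success criterion: RMSD at or below a cutoff (2.0 Å is hypothetical here)."""
    return ligand_rmsd(pred, ref) <= cutoff

# Toy 3-atom "ligand": the predicted pose is shifted 0.3 Å along x.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pred = [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0), (1.8, 1.5, 0.0)]
print(round(ligand_rmsd(pred, ref), 2), is_success(pred, ref))
```

This assumes both poses are already in the same reference frame and the atoms are listed in the same order; real evaluations usually also handle ligand symmetry.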
Celebrating the milestone of a massive 150+ million downloads of Gemma 4 with the release of the new Gemma 4 12B model! It's incredibly powerful for such a small model and it’s tiny enough to run locally on a laptop with just 16GB VRAM. Apache 2.0 license - happy building!
1/5 New paper 🎉Strong Stochastic Flow Maps!
We learn the stochastic integral, not just the transition kernel of an SDE
Huge shout out to my fellow first author Sam McCallum and the rest of the amazing team: Timothy Herschell, @Niklas_TR, @AlexanderTong7, and @JamesFosterBath
ESMFold2 uses a recurrent architecture, where representations from later states are looped into the earlier states. We apply a constraint to the recurrent update to prevent activations from growing unbounded, and backpropagate through multiple loops.
Some architectural details:
- Triangle Attention is completely unnecessary
- Looped transformers work really well
- The diffusion module needs only 20% of the steps
- We can improve on cuequivariance kernels for even faster folding.
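A minimal sketch of the looped idea described above: one shared block is applied repeatedly, and the state is constrained after every loop so activations cannot blow up. The post does not say which constraint ESMFold2 uses; the RMS renormalization, the tanh stand-in block, and all names below are assumptions for illustration only. This shows the forward pass; in an autodiff framework the gradient would flow back through every loop.

```python
import math

def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))

def shared_block(h, w):
    """Hypothetical stand-in for the shared trunk: a single tanh layer."""
    n = len(h)
    return [math.tanh(sum(w[i][j] * h[j] for j in range(n))) for i in range(n)]

def looped_forward(h0, w, n_loops, eps=1e-12):
    """Apply the same block n_loops times, feeding each output back as input."""
    h = list(h0)
    for _ in range(n_loops):
        update = shared_block(h, w)             # same weights on every loop
        h = [a + b for a, b in zip(h, update)]  # recurrent (residual) update
        scale = 1.0 / (rms(h) + eps)            # assumed constraint: rescale to unit RMS
        h = [x * scale for x in h]
    return h

w = [[0.2, -0.1, 0.0], [0.0, 0.3, 0.1], [-0.2, 0.0, 0.1]]
h = looped_forward([0.5, -1.0, 2.0], w, n_loops=50)
print(round(rms(h), 6))
```

Because the state is rescaled after each loop, its magnitude stays the same whether the block runs once or fifty times, which is the property the constraint is meant to provide.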
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state-of-the-art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve applied the mechanistic interpretability techniques developed for large language models to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction that mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins, and especially by the new avenues it opens up for therapeutic design and medicine.
Today we're announcing our Series C funding: $355M at a $4.65B valuation, led by some great investors @generalcatalyst and @Redpoint.
We've had insane growth in the last year, but we're still very early. So proud of the team and what we have built so far!
🚨 New paper: Introducing MIND (Monge Inception Distance)
Everyone agrees that FID is broken: it requires too many samples, which slows down evals.
MIND requires 10x fewer samples, is more robust, and is faster to compute.
Our new drop-in replacement for evaluating generative models. 🧵👇
Jensen Huang, Founder and CEO of @nvidia, will serve as Carnegie Mellon’s 2026 Commencement keynote speaker and will receive an Honorary Doctor of Science and Technology.
Yours truly is a proper scientist now!
TL;DR: we used AI to redesign parts of essential cell machinery with only 19 canonical amino acids instead of 20.
Why? Great thread by @harriswangnyc provides more context and details. Let me talk a bit about the AI design part of this. 1/
Excited to report the first de novo enzyme catalyzing two of the most energetically demanding reactions in biology—phosphomonoester and phosphodiester hydrolysis—with catalytic efficiencies comparable to natural enzymes! 🚀
desB was designed zero-shot with dEVA. No structure prediction, no pre-defined motif, no reaction-intermediates. 🧵
@StanfordBiosci @bioe_stanford @SLAClab @EPFL @hes_so @simonduerr
The best two academic papers will be awarded one DGX Spark each -- We thank NVIDIA for their generous support!
Paper Submission Instructions: https://t.co/nVeip9NbrQ