Excited to be speaking with at the @ProteinSociety’s 42nd webinar hosted by Protein Design for Africa @PDForAfrica. Fam, from #STEAMulaterGames to speaking alongside global protein science leaders… grateful for these opportunities 🚀🧬 please register to watch this on June 4th 🌍. <link to register: https://t.co/gdN7YiNec8>
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
You can see the patent cliff here in action
Revenues in the white bars for most major companies imply a 50-80%+ drop in the next few years
Someone is going to have to fund the next generation of biotech startups for drug discovery so these mature companies can consume and commercialize them
Should be an interesting few years imo
This is why some of these names are trying to diversify their portfolios into Chinese drugs (particularly bsmAbs or other modalities)
There aren’t enough competitive drugs coming out of the west.
Anyone can make a pretty good bispec or ADC now it seems.
China can make hundreds of candidates allowing big Pharma to have their pick of the litter.
The west should focus on what we have always been good at - innovation and new platforms
We may need to let the traditional paths to drugs get consumed by less innovative rival nations.
The innovation gap in biotech is closing quickly and could look like the AI gap (K2 vs GPT) if we don’t do something about it right now.
@ATinyGreenCell I’ve literally just held the electrodes against the cuvette with my gloved hand to get around this problem. Biorad device not this particular one.
@design_proteins@OmicsOmicsBlog The Bagel paper does this but there’s no lab validation on it yet. Was thinking of trying to clone his seqs and give them a try
Dude I’m so >fucking< psyched about these new models I have built and am testing in silico.
After >>100,000 designed proteins using OS stacks for clients and myself
I have gone my own way and built new combinations of architectures to produce something totally different.
Something like 13 different pivots (🫠) - have the logs to prove it
>50 cloud training runs to get here (I learned so much along the way)
I built:
Something like what I wished existed already
Something like a hybridoma on a chip
Something Pharma grade imo
First class features I hand built into the stack and curated from my real drug discovery experiences:
1. Antigen awareness across the model stack - multiple structure and sequence models wired together intelligently
2. Real developability awareness - no crazy WFY fractions or poly A mess
3. Real frameworks - over 10k real frameworks from real biologically derived proteins are used in training and inference
4. No hallucinated spaghetti sequences - they fold because they’re based on natural proteins
5. Canonical residue preservation across design specs and antigen selection - no missing cys no abhorrent P or W residues
6. Actual VHH and antibodies (fragment designs and intact) which look super interesting (read natural)
7. Real levers to pull and modify the inference not just temperature and steps (though we have that too 😘). More on this later.
Working on a test set to send to a CRO then publish the preprint and code to get feedback.
I think people will like this a lot.
I love where it’s going.
You can do this easily without FACS, just pick single colonies, do colony pcr with per well barcoded primers, deep sequence, then sort. Get fancy and and forward primers have row and reverse primers have column and you have an even lower cost method for screening multiplex libraries.