Seems like a big part of succeeding as a really effective, popular, visible mayor of one of the largest cities in the world is:
1) living and growing up in the city and believing it can be improved
2) not being an asshole
Today, I signed an Executive Order temporarily repealing bedtimes in the City of New York so that kids of all ages can watch our team in the NBA Finals.
As Mayor, you’re forced to make many difficult decisions. This was not one of them.
Go Knicks.
I’m so excited about the launch of ESMFold2, ESMC, and the new ESM Atlas. This was a massive team effort, and I’m grateful to have worked with such an incredible group @biohub.
A headline result I’m especially excited about: ESMFold2 can design minibinders and antibodies with nanomolar affinity, target selectivity, and functional activity against therapeutically relevant targets.
Today, we’re sharing the full binder design protocol.
"He's one of the most cerebral founders I've met - thoughtful and philosophical with an opinionated point of view on where the physical sciences are going."
Astute insight, rooting for my old colleagues from the sidelines - nice work!
1/ I'm really excited to share that Plural is leading @Orbital_Ind's $50m Series B round
Orbital is using AI to discover and design novel compounds from the atoms up, then engineering the products that exploit them directly.
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
I find the composer 2.5 story to be exceedingly inspiring. So many people wrote cursor off as a wrapper claiming they just forked an open source IDE and rode sonnet 3.5 coat tails.
but to see them capture mind share, get enough useful data, and build the systems to train their own (very competitive) model is stellar.
I'm not sure where it goes or how they keep up, but it's always cool to see the haters proven wrong.
Today I was supposed to be on my way to Türkiye for my wedding, to meet up with my family and have them finally meet my partner and husband. We had everything planned. We chose Türkiye since it's close to Iran and my partner and I could both go there and have our families meet each other. We were supposed to get married with our close family and a small group of friends on a boat on the Mediterranean Sea at sunset. Because of the war, all flights to and from Iran are cancelled and my family can’t leave Iran, so we had to call off the wedding.
Instead, this is what my day looked like.
I woke up to a reminder to call my grandma (I used to call her every Friday morning). I snoozed the reminder until next Friday, just like I have done for the past many years. I can’t keep our tradition of calling her these days because there is no way to call home. All international calls to Iran are blocked, and the internet is fully shut down by the regime.
I got to work and right as I opened my computer I received an email I had scheduled to send to myself 5 years ago: “Apply for citizenship.” This summer marks 11 years of being in the US and 5 years of being a green card holder. I am now eligible to file for citizenship, but it doesn’t matter because an executive order was signed a few months ago that banned all Iranians from applying for any visa or citizenship.
At lunch I opened Twitter just to see what’s up in the world and saw the news that those who don’t have a green card now need to leave the US before they can get one. This means every one of my Iranian friends who are here on a visa now has to go back home (on which flight?) to get a green card??? As if it’s that easy? We all know getting back to the US for Iranians is a huge challenge (months and months of waiting for a visa, with a chance of never being able to come back).
And this is just a normal Friday for an Iranian. These days, when people ask how I’m doing and how I’m handling everything, I just say:
It’s okay, it’s okay. It will be okay some day. But the reality is: nothing is okay. I’m in constant pain. I haven’t seen my family and loved ones in years, I barely hear about their wellbeing, and I’m constantly worried about them. I’m just burying myself in work because that’s the only distraction that can save me from losing my mind.
I’m not okay. None of us are okay. We are just barely holding it together…
Are they trying to destroy our innovation economy?
We have employees in the US legally applying for permanent residency.
Now they can't work with us while they wait for an answer?
Immigrant founders building companies here applying for green cards have to leave?
Insanity.
Scaling laws are powering AI. It’s time to scale biology.
Today we’re launching the Virtual Biology Initiative to generate the data to unlock scaling laws in biology and build accurate predictive models of the cell.
Digital representations of proteins are already expanding our understanding of life at the molecular level, and accelerating the design of molecules and medicines. Accurate digital representations of the cell could reveal the mechanisms that are responsible for disease, and show how to reverse them.
The protein data bank, and worldwide repositories of protein sequence biodiversity were created through decades of work by the scientific community. The advances in artificial intelligence for proteins would not have been possible without them.
The cell is orders of magnitude more complex, and we will need to create the data in just a few years rather than decades.
This will require a coordinated global effort. We're partnering with Broad, Wellcome Sanger, Arc, Allen, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy.
Biohub is contributing to this effort as both a funder and a builder. We are developing microscopy to observe millions of cells in living organisms, and cryo-ET to resolve the cell in atomic detail. We're building instruments that expand the range of modalities and parameters that can be simultaneously measured. We’re developing molecular, cellular, and tissue engineering to create models of disease and design interventions.
The data we generate will be available to the worldwide scientific community.
We’re also committing $100M over the next five years to support work beyond Biohub.
We invite other scientific teams and funders to join.
Link: https://t.co/93Nw1QT5iZ
This is awesome to see! TBH I lost count of the number of papers showing equivariance is not required in generative models in bio/chem since "Swallowing the bitter pill" (https://t.co/lIQtbX68Pf) came out. My mental model is that if what you care about is the marginal distribution and not the "path" that takes you there, you are better off by learning symmetries through augmentation. The really interesting part is that this is true for a large space of problems.
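To make "learning symmetries through augmentation" concrete, here is a minimal sketch, assuming 2D point coordinates and a rotation symmetry. The helper names (`random_rotation_2d`, `pairwise_distances`) are hypothetical and not taken from any of the papers mentioned; the idea is just that each training example gets a fresh random rotation, so a non-equivariant model sees the symmetry in the data instead of having it built into the architecture.

```python
import math
import random

def random_rotation_2d(points, rng):
    """Rotate all points by one random angle (hypothetical augmentation helper)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Rotation-invariant summary of a point set, used here only as a sanity check."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

rng = random.Random(0)
molecule = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # toy coordinates, made up
augmented = random_rotation_2d(molecule, rng)    # what the model would be trained on
```

The coordinates change under the augmentation while the pairwise distances do not, which is the invariance the model is expected to pick up from data.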
New AI paper from us this week. When my student first showed me his initial findings, I really didn’t know what to make of them. I felt that this was an interesting but curious loophole phenomenon that would shortly be closed. I was very wrong.
https://t.co/H3YIyl01FR
One of the biggest promises of Diffusion LLMs is parallel generation: predicting multiple tokens at once to bypass the sequential bottleneck of autoregressive models.
However, parallel generation comes with a price. For example:
Should the sentence “He is from [MASK] [MASK]” be filled with [New] [York] or [San] [Diego]?
If a diffusion model predicts both at the exact same time, it assumes independence and may produce... [San] [York]. 🤦‍♂️
We argue this arises from a structural misspecification: models are restricted to fully factorized outputs because parameterizing the full joint distribution would require a prohibitively massive output head.
This is the Factorization Barrier crippling parallel generation. Here is how we broke it with CoDD.
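The [San] [York] failure can be reproduced with a toy example. This is only a sketch of the problem, not of CoDD: the joint distribution, its probabilities, and the function names are made up for illustration. Each position is sampled independently from its own marginal, which is what a fully factorized output head amounts to.

```python
import random
from collections import Counter

# Toy joint over the two masked tokens (illustrative numbers, not from the paper).
JOINT = {("New", "York"): 0.5, ("San", "Diego"): 0.5}

def marginals(joint):
    """Per-position marginal distributions implied by the joint."""
    first, second = Counter(), Counter()
    for (a, b), p in joint.items():
        first[a] += p
        second[b] += p
    return dict(first), dict(second)

def sample_factorized(joint, rng):
    """Sample each position independently from its marginal."""
    first, second = marginals(joint)
    a = rng.choices(list(first), weights=list(first.values()))[0]
    b = rng.choices(list(second), weights=list(second.values()))[0]
    return a, b

def invalid_rate(joint, n_samples=10_000, seed=0):
    """Fraction of factorized samples that have zero probability under the joint."""
    rng = random.Random(seed)
    bad = sum(sample_factorized(joint, rng) not in joint for _ in range(n_samples))
    return bad / n_samples
```

In this toy setup the marginals are perfectly correct, yet about half of the independent samples are pairs like (San, York) or (New, Diego) that the joint never produces.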
@sedielem Valid point! I guess decoding order invariance seems particularly hard because 1) it scales with sequence length and 2) I can imagine sequences where different decoding orders have v different likelihoods.
r.e equivariance, agree - interesting analogy.
https://t.co/BmGedGCwia
@sedielem Particularly that we really want to optimize the max marginal likelihood over possible orderings (e.g *one* decoding order should explain the data well), but current objectives require all possible orderings to explain the data *equally well*.
@sedielem I also enjoyed this writeup! Given the depth in your writing on diffusion, I'd be very interested in your opinion on some of the arguments against discrete diffusion in this blog post: https://t.co/5kM9BNmnHY
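A small numeric sketch of the gap described above, under my own assumptions: "current objectives" is read as the average log-likelihood over orderings, the prior over orderings is uniform, and the per-ordering log-likelihoods are made-up numbers. The function name `ordering_objectives` is hypothetical.

```python
import math

def ordering_objectives(log_liks):
    """Compare three objectives given log p(x | ordering) for each ordering.

    Assumes a uniform prior over orderings (an assumption of this sketch).
    """
    n = len(log_liks)
    best = max(log_liks)
    # Average log-likelihood: every ordering must explain the data.
    mean_log = sum(log_liks) / n
    # Log marginal likelihood, computed with a stable log-sum-exp.
    log_marginal = best + math.log(sum(math.exp(l - best) for l in log_liks)) - math.log(n)
    return {"mean_log": mean_log, "log_marginal": log_marginal, "best_order": best}

# One ordering explains the data well, the other two do not (made-up values).
scores = ordering_objectives([-2.0, -20.0, -25.0])
```

By Jensen's inequality the averaged objective is never above the log marginal, which in turn is never above the best single ordering, so the averaged objective can look poor even when one decoding order explains the data well.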