🧬 😴 TIRED: Scaling protein models to billions of parameters hoping they'll memorize all of evolution and generalize beyond
🔥 WIRED: Smart retrieval-augmented models that dynamically access what they need from sequence databases
🚨ICML Paper Alert🚨
What if finding the right protein homologs wasn't a slow search, but a learned part of the model itself?
We introduce 𝐏𝐫𝐨𝐭𝐫𝐢𝐞𝐯𝐞𝐫, an end-to-end framework that learns to retrieve the most useful homologs for self-supervised reconstruction! (1/12)
Quite pleased to hear that one of my submitted proteins placed #5 out of 1200 that were tested in this lovely competition. The approach: pure rational design. Nice to see that human + Microsoft Word is still competitive with state-of-the-art AI methods 🙂
Super excited to share that I will join @Princeton @omenndarlingbio this Fall as an assistant professor. We combine syn bio and AI to program macromolecules and cells for therapeutics engineering and drug target discovery. https://t.co/2afvmpEJ3A
New paper “Proteome-wide model for human disease genetics” is now live at Nature Genetics: https://t.co/3UKcPlepDV
popEVE (https://t.co/HuxeGfe0g0) finds the needles in the haystacks of human genetic variation:
@youngsuko9 @AllThingsApx You both may find this work interesting -- better performance than all the aforementioned models on ProteinGym, and retrieval two orders of magnitude faster than MMseqs2-GPU: https://t.co/fkbBIndnoU
Reminder - PhD applications for OATML are now open
The first funding deadline is December 2 - candidates interested in developing Bayesian deep learning methodology, applications of ML, AI security, and understanding ML methodology are encouraged to apply
More info: https://t.co/g99j1So31R
Thrilled to announce I'm starting as a Principal Investigator at #Aithyra in Vienna! We'll be developing generative models to understand cell biology and design proteins.
I'm hiring PhDs, Postdocs, & Visiting Researchers! PhD applications by Sept 10: https://t.co/jYG60pfPX6
@houchao1 Congrats @houchao1! Wondering if NLL -- as a measure of how constrained a given protein is -- could be used as a quick diagnostic to assess which proteins are "easier to evolve" in protein engineering campaigns.
@BorisMPower @OpenAI Congrats @BorisMPower and team! Would love to test out the approach on DMS assays in ProteinGym (https://t.co/tEfXut1tR6). Do reach out if interested!
I’ve thoroughly enjoyed reading two (VERY!) recent papers that model protein sequences by retrieving evolutionary information (dynamically) at inference time, and there's a lot to unpack!
[1] https://t.co/NWzDzvYALu
[2] https://t.co/H4tWxZwScl
(1/n)
1/5 Biological data is noisy, redundant, and ever-growing. 🗣️
In our new paper (first paper of my postdoc!! ⚡️), we track model performance across 14 years of UniRef100 snapshots to ask: how does pLM performance scale with training data?
Congratulations to the entire @ProfluentAI team on this incredible milestone! OpenCRISPR-1 represents a paradigm shift - the first AI-designed CRISPR protein to successfully edit human DNA with fewer off-target effects. We're moving from discovery-based to engineered biology. 🧬
Excited to have our AI research published in @Nature today. Proud of the @ProfluentBio team and the extensive final version available under open-access.
OpenCRISPR is a milestone. It's the first successful demonstration of editing the human genome with a molecule fully designed by AI (bonus: we open-sourced it). It's also an ML case study in engineering functional biological systems that extend beyond nature for real societal needs.
The broad adoption of OpenCRISPR has fueled us to build an incredible platform. We are the one-stop shop to enable any type of precise genome engineering. Through partnership, our collaborators leverage frontier AI to cure disease, build personalized medicines, and solve fundamental societal challenges.
Some selected highlights on new scientific results below 👇
My @Nature News & Views on this breakthrough: https://t.co/QghloVQCyK
Special thanks to @AvivSpinner for their valuable feedback, and to @nature and @NatureNV for the support in writing this piece! 🙏
AI expands the repertoire of CRISPR-associated proteins for genome editing
@NatureNV preview by @NotinPascal
https://t.co/FgBiemIy8n
@thisismadani https://t.co/79ihnoXQfU @jeffruffolo @AadyotB et al
https://t.co/Bq2nmEP35V
1/4
🚀 Announcing the 2025 Protein Engineering Tournament. This year’s challenge: design PETase enzymes, which degrade the type of plastic in bottles. Can AI-guided protein design help solve the climate crisis? Let’s find out! ⬇️
#AIforBiology #ClimateTech #ProteinEngineering #OpenScience
Save the date! Machine Learning for Drug Discovery (MLDD) is happening soon, on Monday, 30 June 2025.
MLDD aims to bring together ML for drug discovery experts, innovators, and enthusiasts from the machine learning, biotechnology and drug discovery domains in London, UK to converge, exchange ideas, and forge new paths in revolutionizing therapeutic development.
This year, we are very fortunate to feature keynotes by:
- Charlotte Bunne (EPFL)
- John Chodera (Achira / MSKCC)
- Kexin Huang (Stanford University)
- Jacob Kimmel (NewLimit)
- Marinka Zitnik (Harvard University)
To accommodate the worldwide community, the event will be held fully remotely this year - a link to join and further information are provided at the MLDD website (link in thread below!)
Looking forward to seeing you there!