It was super fun working on Evo 2, a DNA language model trained on genomes across the tree of life!
Check out the preprint: https://t.co/1bDjCsg4Sr
A small 🧵 highlighting some mechanistic interpretability work on Evo 2 (Fig. 4) we did in collaboration with @GoodfireAI 🔥🔥🔥
Our lab is proud to present our latest work harnessing Bridge Recombinase for genome-scale editing in diverse bacteria, microbiome editing, and programmable horizontal gene transfer.
Excited to share our discovery of a new programmable RNA-guided DNA-targeting system hiding inside bacteriophages that predates CRISPR.
We call it VIPR (Viral Interference Programmable Repeat), and it uses an entirely new logic to find its targets.
Thread + link below.
🧬Ultra fast petabase-scale virus discovery is here! 🚀
In our lab’s newest preprint, Jess explores the biodiversity of Papillomaviruses (PVs) in record time
🧵1/5 👇
🔗: https://t.co/newJ2GRsQS
Myloasm, our long-read metagenome assembler, is now published! w/ Max Marin & @lh3lh3
Very rewarding after more than a year of development and countless hours thinking about assembly. Thanks to beta testers, Li lab, and reviewers for helpful feedback.
Link: https://t.co/Hjr8hHiupP
Pleased to announce that CellVoyager is published @naturemethods!
CellVoyager is a scRNA-seq AI agent that autonomously generates hypotheses and tests them in a live analysis notebook, where users can guide the discovery process.
Demo: https://t.co/J7M7j15xih
What's new 🧵⤵️
Thrilled to announce alphagenome-pytorch, an accurate, readable, and careful port of AlphaGenome's architecture and weights to PyTorch. Work with @gtcaa @m_kjellberg @chriswzou @tuxinming as part of the GenomicsxAI initiative between @anshulkundaje and @pkoo562 labs.
Our 2 papers on RNA-guided transcription are now out @Nature. This mechanism rewrites the traditional concept of bacterial transcription and allows RNA transcripts to be generated de novo from potentially any cellular DNA sequence. 🧬 See below for links and thread 🧵
To make Evo 2 more accessible, we're releasing Evo 2 20B, a checkpoint that achieves 40B-level performance on a single H100, as a drop-in replacement. This came out of model surgery with @danielchang2002, and we are excited to see people build on it!
https://t.co/YVPEFIBAUA
Evo 2 is out in Nature today, showing that genome language models can predict and design across the full complexity of life, from phages to eukaryotes.
A few surprises from the project, including how ignoring trillions of nucleotides was key to getting a good model. 🧵
The hardest part of protein engineering isn't just finding good mutations – it’s deciphering which ones combine synergistically.
Today in @ScienceMagazine, we present MULTI-evolve, a framework for rapid multi-mutant protein engineering, validated across three diverse proteins.
SAEs fail even when the Linear Representation Hypothesis holds perfectly.
We built SynthSAEBench: large-scale synthetic data with 16k ground-truth features, correlation, hierarchy, and superposition. We trained 5 SAE architectures on it.
None achieve perfect feature recovery.
New work out from Bhatt lab led by Jakob Wirbel and @Angela_Hickey98 on uncovering gut prophage biology via long-read metagenomics!
https://t.co/Z1L2EuKAFD
A small 🧵 highlighting the tale of IScream🍦phage (Fig. 4) which I was lucky to work on while rotating in the Bhatt lab!
What if we could autocomplete DNA based on function?
Today in @Nature, we share semantic design—a strategy for function-guided design with genomic language models that leverages genomic context to create de novo genes with desired functions.🧵
https://t.co/P5qVJB3qIY
I am so excited to share our project with you! We find prokaryotic proteases activate toxic enzymes and pores as a modular strategy in phage defense. We studied four fascinating protease-toxin pairs that are abundant across bacterial genomes:
Many thanks to our wonderful collaborators and to the Gao lab for making this work possible!
https://t.co/9BgGXflsfi
We are actively recruiting for two positions at the interface between biology and generative design. Backgrounds of particular interest are in protein biochemistry/evolution and synthetic genomics/biology.
Please consider joining us! 1/n
Closing the AI-to-lab loop is hard, especially if you want to test your WHOLE GENOME generator.
Viruses are the only genomes cheap enough to print en masse, but they raise biosafety flags.
So @ArcInstitute chose phages!
We went deep w/ @samuelhking & @driscoll_cl on how they did it:
We're thrilled to announce SeqHub, an AI-enabled platform for biological sequence analysis. SeqHub brings together sequence search, genome annotation, and data sharing in one place.
I dreamed of a single place where I could learn everything about my sequences. Today, a much more refined version of this dream takes form with https://t.co/1l5O5cP3tE, built by an incredible team at @tatta_bio. Our goal is to make sequence interpretation more intuitive and collaborative for everyone working with biological sequences.
Currently, SeqHub is optimized for microbial protein and genome analysis. As we expand beyond microbial data, we'd love your feedback to help shape what comes next. I'm deeply grateful to our team at Tatta Bio, and to our collaborators and funders, for making this vision a reality.
Check it out at https://t.co/vXbVqe507X!