Just want to give a shout-out to David Kelley @drklly who I think often does not get the credit he deserves (outside our core community).
I want to highlight why I think he is such a fantastic scientist and leader in regulatory genomics. 1/
AlphaGenome is out in @nature today along with model weights! 🧬
📄 Paper: https://t.co/1fHzSPiY1x
💻 Weights: https://t.co/z6JWLT4Mpv
Getting here wasn’t a straight path. We sat down @googledeepmind to discuss the story behind the model, paper & API: https://t.co/cT8CiXfnxQ
Excited to share Nona: a unifying multimodal masking framework for functional genomics.
Models for DNA have evolved along separate paths: sequence-to-function (AlphaGenome), language models (Evo2), and generative models (DDSM).
Can these be unified under a single paradigm? 1/15
I'm excited that our paper, "scooby: modeling multimodal genomic profiles from DNA sequence at single-cell resolution," has been published in Nature Methods! scooby is a new deep-learning framework to understand how DNA sequence shapes gene expression in individual, single cells.
Big thanks to co-authors @AlexKarollus & @gagneurlab, and to Borzoi authors @jjohlin @drklly et al!
Flashzoi is available on github and on pip: https://t.co/Yd033TXBVB.
Happy to share that Flashzoi is now published!
We enhanced Borzoi with RoPE & FlashAttention for >3x faster training/inference & 2.4x reduction in memory usage.
This brings large-scale genomic analysis and fine-tuning within reach of academic budgets.
📄: https://t.co/Llyt6k0YLK
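For readers unfamiliar with rotary position encodings (RoPE), here is a minimal sketch of the generic formulation. It is not Flashzoi's code: the function names are made up for illustration, and `base=10000.0` is the conventional default rather than a value taken from the paper. Each consecutive pair of embedding dimensions is rotated by an angle proportional to the token position, so the query-key dot product ends up depending only on the relative offset between positions.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position encoding to one embedding vector.

    Consecutive pairs (vec[i], vec[i + 1]) are rotated by the angle
    position * base ** (-i / d), where d is the embedding dimension.
    `base` is the conventional default, assumed here for illustration.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention logit."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (arbitrary values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 2 apart, so the two logits agree.
near = dot(rope(q, 3), rope(k, 1))
far = dot(rope(q, 12), rope(k, 10))
```

Because the logit depends only on the offset, no learned or absolute position table is needed inside the attention layer.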
The speedup does not come at the cost of accuracy:
Flashzoi matches or improves upon Borzoi’s performance across benchmarks, including RNA-seq coverage prediction, variant effect prediction (GTEx eQTLs), and enhancer-promoter linking.
Recent performance prediction models have made great advances in accurately predicting the execution time of database queries. See how we achieved the same accuracy with a 10,000x faster model, T3.
We are happy to present our work on T3 at SIGMOD'25.
https://t.co/sxAWQzvs2F
@i000 @missing_a_rib @anshulkundaje @jmschreiber91 We only load the weights for the convs; we completely re-initialize all transformer weights (see Fig 1b). While we managed to train a Borzoi from scratch w similar performance on held out seqs/VEP, a Flashzoi from scratch w same training hparams was subpar. Rotary is the diff
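A rough sketch of that initialization scheme, for illustration only: keep pretrained values for the convolutional layers and leave everything else at its fresh initialization. The parameter names (`conv.`, `transformer.`, `head.`) and the helper `init_from_pretrained` are hypothetical, and plain dicts stand in for real checkpoints.

```python
def init_from_pretrained(pretrained, fresh, keep_prefixes=("conv.",)):
    """Merge two parameter dicts.

    Parameters whose name starts with one of `keep_prefixes` are copied
    from `pretrained`; all others keep their freshly initialized value.
    The prefix naming is an assumption made for this sketch.
    """
    merged = dict(fresh)  # copy, so `fresh` is left untouched
    prefixes = tuple(keep_prefixes)
    for name, value in pretrained.items():
        if name.startswith(prefixes) and name in merged:
            merged[name] = value
    return merged


# Toy "checkpoints": scalars stand in for weight tensors.
pretrained = {"conv.0.weight": 1.0, "transformer.0.attn.weight": 2.0}
fresh = {"conv.0.weight": 0.0, "transformer.0.attn.weight": 0.5, "head.weight": 0.1}

merged = init_from_pretrained(pretrained, fresh)
# Conv weights come from the pretrained model; transformer and head
# weights stay at their fresh initialization.
```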
1/ DNA sequence models like Borzoi predict gene expression and variant effects across 1000s of tissues, but what if your data comes from a custom experiment? @drklly @jjohlin and I propose a lightweight solution: parameter-efficient fine-tuning (PEFT). https://t.co/h4WuoIofWL
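To give a flavor of what PEFT means in practice, here is a minimal sketch of one common variant, a low-rank adapter in the style of LoRA. The post does not say which PEFT methods are used, so treat this as a generic illustration: the class name, the 0.01 initialization scale, and the toy sizes are all assumptions. The pretrained weight matrix stays frozen, and only two small matrices `a` and `b` would be trained.

```python
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LowRankAdapter:
    """Frozen weight W plus a trainable low-rank update b @ a."""

    def __init__(self, weight, rank, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights
        # `a` starts small and random, `b` starts at zero, so the
        # adapted layer initially matches the pretrained layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [u + v for u, v in zip(base, delta)]

    def num_trainable(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


# Toy 8x8 layer with a rank-1 adapter: 16 trainable values instead of 64.
weight = [[float(i == j) for j in range(8)] for i in range(8)]
layer = LowRankAdapter(weight, rank=1)
```

The appeal is that the number of trainable parameters grows with the rank rather than with the full layer size, which is what keeps fine-tuning on a custom dataset cheap.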
Can DNA sequence models predict mutations affecting human traits?
We introduce TraitGym, a curated benchmark of causal regulatory variants for 113 Mendelian & 83 complex traits, and evaluate functional genomics and DNA language models. Joint work w/ @gokcen and @yun_s_song 🧵👇
The Borzoi manuscript is now out in Nature Genetics: https://t.co/e6WcfztXx3
Borzoi predicts RNA-seq profiles in many tissues & cell types from DNA sequence as its only input. With it, we can score the impact of genetic variants on a number of gene-regulatory functions. 1/
On GTEx eQTLs, Flashzoi showed significantly improved correlation with observed variant effects and maintained Borzoi's performance in eQTL prioritization. Explore more results in the paper, find the code at https://t.co/uqE8TP5OYC, and models at https://t.co/WyEahDsfX9.
Flashzoi offers a slight but consistent improvement in the accuracy of genomic profile predictions compared to Borzoi, while delivering speed gains of up to 3x for both prediction (inference) and training or fine-tuning.
Introducing Flashzoi⚡! We’ve upgraded the Borzoi model with rotary pos. encodings and FlashAttention, resulting in a 3x speedup with similar or better accuracy for faster variant effect prediction or model development, and more efficient genomic analysis https://t.co/yIGRWHUS3b