Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
https://t.co/DIwQ40C478
We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
https://t.co/FTm3byYp67
(1/n)
Happy New Year! Our GPN-MSA paper is finally published, under a slightly different title from the preprint. Please check it out and share it with your colleagues.
https://t.co/CKvTG2EZS2
1/4
It was recently reported that Africans experienced a sharp, severe bottleneck around 900kya and that this signal is absent in non-Africans. Here, we present evidence to show why this is likely a statistical artifact:
@ras_nielsen @yun_s_song
https://t.co/wH6p5VJrBl 1/4
We were invited to write a review article on DNA/genomic language models (gLMs). We took this occasion to gather our thoughts on promising applications and major considerations for developing and evaluating gLMs. Please share with your colleagues. Preprint: https://t.co/iUjl874VoG
We recently posted a preprint describing GPN-MSA, a DNA language model that leverages whole-genome alignments across multiple species while taking only a few hours to train. This thread summarizes its performance on the human genome.
https://t.co/QyRBRqOXAX
1/12
After lengthy anticipation, we finally got to compare PrimateAI-3D with our Cross-Protein Transfer (CPT) model https://t.co/Sop1kUF8u6
Below is a summary of our findings, including a comparison of ESM-1b and ESM-1v protein language models.
1/8
everyone is trying to come up with a copy of twitter when we actually want a copy of 2007-2012 niche hobby blogs with a Google RSS reader that aggregated them into one feed
Very glad to see our CherryML method published online in @naturemethods. The published version has updated and expanded results, so please check it out. https://t.co/a8nebdrspa
(1/3)
If you are interested in DNA language models and their potential applications in genome-wide variant effect prediction, please check out our updated GPN paper (with @gsbenegas and @sanjitsbatra):
https://t.co/0ZkhBvMhUa
A summary of updates follows.
1/5
Interested in a fast, accurate method for estimating the parameters of a complex phylogenetic model of molecular evolution (e.g. a general rate matrix describing the co-evolution of protein contact sites in 3D)? If so, please check out our preprint: https://t.co/VnuGnBjByS
(1/10)
Looking to benchmark a model-guided protein design method? Running into the limitations of existing datasets? Excited to present SLIP (Synthetic Landscape Inference for Proteins), a suite of tuned fitness landscapes.
Preprint: https://t.co/6W1bCDCcnT
Predicting the effects of missense variants is a central problem in human genome interpretation. We are thrilled to share our preprint on using cross-protein transfer (CPT) learning to improve zero-shot prediction of disease variant effects:
https://t.co/Q4JLkLwNh8
(1/8)