We start the year with a new preprint examining DNA vertical ionization potentials, oxidative DNA damage, and their relationship to somatic mutations in cancer and normal tissues. Have a look.
https://t.co/OjL8kmf3wE
Our AbAgym dataset, a collection of DMS data on antibody-antigen complexes, is finally out in mAbs: https://t.co/rxgLb1080j Please also check the associated GitHub repository at https://t.co/cEdOLXCz97
The prediction season for the 7th “Critical Assessment of Genome Interpretation” challenges is ongoing. Have a look and submit your prediction models! https://t.co/2J1oub2hwc @pedjagogue @Steven_Brenner @CAGInews
@themax82 @CardamoneIsabel @AurelianoStingi It is a vaccine because it works the same way: a fragment of mutated KRAS (a piece of the protein) is administered to induce an immune response. @AurelianoStingi It is not an mRNA vaccine but a peptide vaccine. There is an mRNA vaccine for pancreatic cancer, and it is in phase 2, made by the Balachandra Lab at MSKCC
SOuLMuSiC, our new tool to predict how mutations affect protein solubility, is finally out! @SimoAttanasio @RoomanMarianne @BrusselsBioInfo
https://t.co/HY06Tvx2mK
You can run it at https://t.co/0BZaqgNRFs
Here is the AbAgym dataset, which we recently curated! It includes over 330,000 mutations affecting antibody–antigen complex formation.
https://t.co/kLtpy8RG7n
AbAgym: A Well-Curated Dataset For The Mutational Analysis Of Antibody-Antigen Complexes
AbAgym is a new dataset designed specifically for the mutational analysis of antibody-antigen complexes. It is manually curated and contains approximately 335,000 mutations, including 37,361 interface mutations, with experimentally quantified effects on antibody-antigen binding.
AbAgym addresses the lack of comprehensive, well-curated datasets for training computational models in antibody design, despite the abundance of mutagenesis data on antibody-antigen interactions.
The dataset was constructed by collecting and curating 67 deep mutational scanning (DMS) datasets from the scientific literature, along with the 3D structure of each antibody-antigen complex. When exact sequences were not available in the Protein Data Bank (PDB), homology models were used.
The data processing pipeline involved collecting and formatting DMS data, processing PDB structures (including remodeling with Swiss-Model and energy minimization with GROMACS), and identifying binding interfaces using a distance-based criterion (atoms within 6 Å of each other).
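As a rough illustration of the distance-based interface criterion, here is a minimal sketch. It is not the AbAgym pipeline: the function name, the residue labels, and the toy coordinates are hypothetical, and it assumes that a residue counts as interface if any of its atoms lies within 6 Å of any atom of the binding partner (the post does not say which atoms are compared).

```python
from math import dist

CUTOFF = 6.0  # Å, the threshold quoted in the post

def interface_residues(antibody_atoms, antigen_atoms, cutoff=CUTOFF):
    """Return (antibody_interface, antigen_interface) as sets of residue ids.

    Each atom is a tuple (residue_id, (x, y, z)) with coordinates in Å.
    A residue is flagged when at least one of its atoms is within
    `cutoff` of at least one atom on the other side (an assumption).
    """
    ab_iface, ag_iface = set(), set()
    for ab_res, ab_xyz in antibody_atoms:
        for ag_res, ag_xyz in antigen_atoms:
            if dist(ab_xyz, ag_xyz) <= cutoff:
                ab_iface.add(ab_res)
                ag_iface.add(ag_res)
    return ab_iface, ag_iface

# Toy coordinates (hypothetical, not taken from any real structure)
antibody = [("H:52", (0.0, 0.0, 0.0)), ("H:100", (20.0, 0.0, 0.0))]
antigen = [("A:417", (4.0, 3.0, 0.0)), ("A:484", (0.0, 30.0, 0.0))]

ab_iface, ag_iface = interface_residues(antibody, antigen)
print(ab_iface, ag_iface)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a toy case; for full complexes a spatial index such as a k-d tree would be the usual choice.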
AbAgym includes a variety of antigen types, such as SARS-CoV-2 spike protein, HIV-1 envelope protein, lysozyme, nerve growth factors, and more.
The study benchmarked established force field methods and recent machine learning models for predicting changes in binding affinity upon mutation. Force field methods showed modest performance, while machine learning models performed only marginally better than random, suggesting overtraining and poor generalization in the latter.
Analysis of hotspot residues responsible for immune evasion highlighted the importance of accounting for biological complexities like conformational changes and oligomeric states, which are often overlooked but significantly influence antibody-antigen binding.
AbAgym is freely available for academic use on GitHub, including data and metadata tables, and all modeled PDB files.
💻Code: https://t.co/Vz0EJknl3w
📜Paper: https://t.co/9dpx1thu3O
#AbAgym #ComputationalBiology #AntibodyDesign #MutationalAnalysis #Bioinformatics
@J33P4 @Google @GoogleDeepMind Pharmaceutical corporations have never invested in the PDB, despite being asked and despite the fact that they profit off the data it contains and the huge amount of effort the PDB puts into curating that data.
Maybe @Google or @GoogleDeepMind should fund CASP, given their ample resources and how much they've benefited from the efforts of John Moult, Krzysztof Fidelis, Andriy Kryshtafovych, Nick Grishin, and many others.
https://t.co/beRhc7pkPt
Our RSALOR paper is out in Bioinformatics! Have a look at how a very simple model reaches the same accuracy as complex AI architectures in predicting protein mutational effects @mariannerooman @matsveitsishyn @ULBruxelles
https://t.co/Zu9qgPZaLX
https://t.co/cq3bFrMjiH
At 15, I left all my family and friends and came to the US, ALONE, seeking a country where MERIT and HARD WORK mattered, not politics. For 30 years, I worked tirelessly, even doing research at MIT the day my mother died, knowing she’d want me to keep pushing forward. Today, my grants and my students’ grants, earned through fair competition, have been terminated for political reasons. The America that once rewarded merit now feels no different from the country I fled.
Just got news that all of my NIH and NSF grants were officially terminated today. Have been expecting this for some time, so it's not surprising, but still jarring to see the news.
Have heard similar news from dozens of colleagues. Will be hard, but we'll find a way forward.
Have we hit a "scaling wall" for protein language models? 🤔 Our latest ProteinGym v1.3 release suggests that for zero-shot fitness prediction, simply making pLMs bigger isn't better beyond 1-4B parameters. The winning strategy? Combining MSAs & structure in multimodal models!
How to predict the impact of mutations on protein solubility and design more soluble proteins? Check out our computational tool SOuLMuSiC. Preprint available at https://t.co/mJO3lwfoaz
Freely available at https://t.co/4WLnTqUl2J
@SimoAttanasio @mariannerooman @BrusselsBioInfo