Congratulations to Professor Bin Zhang, who has been honored as “Committed to Caring” (C2C). The C2C program is a student-driven initiative that recognizes outstanding professors whose dedication to students extends beyond the classroom.
https://t.co/vfZLhH8P2X
Pretraining implicit-solvent, or coarse-grained, models is difficult, which limits the transferability of the resulting ML force fields. However, Justin just figured out a novel method for pretraining using protein language models. More can be found at: https://t.co/jhE1rWKcF6
Our collaborative paper with @XuhuiHuangChem is now published in Biophys J (https://t.co/B7mU291Dwh). I learned a lot about kinetic modeling through this collaboration, and kinetic modeling can be quite illuminating for understanding chromatin folding!
Thrilled to share our new JCP editorial (https://t.co/3z0nuAB1XQ), co-authored with @tamar_schlick, introducing the special collection “Chromatin Structure and Dynamics: Recent Advancements.”
Huge thanks to all the outstanding contributors!
I am excited to share our latest #preprint: ChromoGen: Diffusion model predicts single-cell chromatin conformations https://t.co/h5FORbb0io
As the title suggests, we used AI to achieve de novo prediction of 3D chromatin structures from DNA sequence and ATAC-seq data.
Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation
1. This study employs ESM2, a cutting-edge protein language model, to analyze intrinsically disordered regions (IDRs) in proteins, uncovering conserved motifs critical for phase separation and membraneless organelle (MLO) formation.
2. A major finding reveals that IDRs involved in phase separation contain conserved “sticker” residues (e.g., Y, W, F) and “spacer” residues (e.g., G, A, P), forming functional sequence motifs under evolutionary pressure.
3. By predicting mutational constraints, ESM2 accurately identifies conserved, functional residues and motifs without relying on sequence alignments, overcoming a key limitation in studying disordered proteins.
4. Experimental validation shows that many conserved motifs identified by ESM2 are essential for phase separation. Mutations within these motifs disrupt MLO formation, highlighting their biological significance.
5. The study introduces a motif-based framework for understanding the “grammar” of IDRs, emphasizing how conserved motifs integrate structural flexibility with functional specificity.
6. This work bridges computational predictions with experimental biology, advancing our understanding of IDRs in phase separation and offering insights into disease-linked protein dysfunction.
7. ESM2 emerges as a powerful tool for investigating the evolutionary and functional landscapes of disordered proteins, with broad implications for molecular biology and synthetic biology.
@binzmit @yumengzhang99
📜Paper: https://t.co/Y9rAubwjmq
#ProteinScience #Bioinformatics #PhaseSeparation #MachineLearning #IntrinsicallyDisorderedRegions
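To make the sticker/spacer idea in point 2 concrete, here is a minimal sketch that labels each residue in a sequence. The residue sets are just the examples quoted in the summary ("e.g., Y, W, F" and "e.g., G, A, P"), so treat them as an assumption rather than the paper's full definition. The function names and the toy sequence are hypothetical, and this is not the ESM2-based analysis itself.

```python
# Illustrative residue sets taken from the examples in the summary above
# ("e.g., Y, W, F" and "e.g., G, A, P"); they are assumptions, not the
# paper's full definition of stickers and spacers.
STICKERS = frozenset("YWF")
SPACERS = frozenset("GAP")


def label_residues(sequence: str) -> str:
    """Return one label per residue: 'S' sticker, 'p' spacer, '.' other."""
    labels = []
    for residue in sequence.upper():
        if residue in STICKERS:
            labels.append("S")
        elif residue in SPACERS:
            labels.append("p")
        else:
            labels.append(".")
    return "".join(labels)


def sticker_fraction(sequence: str) -> float:
    """Fraction of residues in the sequence that are in the sticker set."""
    if not sequence:
        return 0.0
    return label_residues(sequence).count("S") / len(sequence)


if __name__ == "__main__":
    toy_idr = "GYGSPAFQGW"  # hypothetical sequence, for illustration only
    print(label_residues(toy_idr))
    print(f"{sticker_fraction(toy_idr):.2f}")
```

A composition-only labeling like this ignores conservation entirely; it only shows what "mapping a sequence into stickers and spacers" looks like at the simplest level.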
Check out our recent manuscript that studies the interactions between small molecules and biomolecular condensates with all-atom simulations: https://t.co/G0mZtXXFLN
Our findings shed light on the significant regulatory roles of biomolecular condensates and DNA linker length, and help bridge the gap between in vivo and in vitro observations.
Nucleosome condensate and linker DNA alter chromatin folding pathways and rates [Qiu et al, 2024] https://t.co/2MYkfyk67M
▶️residue-level coarse-grained models; non-Markovian dynamics
▶️10n bp DNA linker lengths favor zigzag fibrils
▶️10n+5 bp chromatin loses unique conformations
Scaling Graph Neural Networks to Large Proteins
1. This paper introduces the DISPEF dataset, specifically designed for benchmarking graph neural networks (GNNs) on large, biologically relevant proteins. DISPEF contains over 200,000 proteins with implicit solvation free energies, a key challenge for molecular modeling.
2. A major innovation is the introduction of a multiscale architecture called “Schake,” which enhances GNN performance on large proteins by incorporating both short-range and long-range interactions, ensuring computational efficiency and transferability across protein sizes.
3. The Schake model leverages two types of message-passing layers: a more accurate SAKE layer for short-range atomic environments and a more efficient SchNet layer for long-range alpha carbon interactions. This mixed design significantly improves both accuracy and speed.
4. DISPEF provides a diverse chemical environment, including both folded and disordered protein regions, making it an ideal dataset for testing GNNs. The inclusion of many-body solvation free energies as targets pushes the limits of model accuracy and generalization.
5. Benchmark results show that Schake consistently outperforms existing GNNs on energy and force predictions for large proteins, while maintaining superior computational efficiency, reducing inference times by up to 2.7× compared to prior models.
6. A key insight from the study is that increasing the cutoff distance in GNNs improves model transferability to larger proteins, but this needs to be balanced against computational cost, which Schake addresses with its hybrid architecture.
7. This work highlights the importance of datasets like DISPEF for driving future GNN innovations and optimizing models for real-world applications like protein folding and drug discovery, where large proteins are common.
8. The paper provides valuable insights for advancing GNN architectures in computational biology, emphasizing the need for both accuracy and efficiency in handling the vast complexity of biomolecular systems.
@binzmit
💻Code: https://t.co/2OA3G9VyLj
📜Paper: https://t.co/Y3Pc0Pkovk
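The cutoff trade-off in points 3 and 6 can be illustrated with a rough pair count. This is only a sketch under stated assumptions: a toy geometry of atoms on a straight line, made-up cutoffs, and hypothetical function names. It is not the Schake code, and it only counts the pairs each scheme would have to process.

```python
from itertools import combinations
from math import dist


def count_pairs(points, cutoff):
    """Count unordered pairs of points separated by at most `cutoff`."""
    return sum(1 for a, b in combinations(points, 2) if dist(a, b) <= cutoff)


def hybrid_pair_count(atoms, ca_indices, short_cutoff, long_cutoff):
    """Pairs seen by a two-scale scheme: all atoms at short range, plus
    alpha carbons only at long range."""
    ca_atoms = [atoms[i] for i in ca_indices]
    return count_pairs(atoms, short_cutoff) + count_pairs(ca_atoms, long_cutoff)


if __name__ == "__main__":
    # Toy geometry (an assumption): 30 atoms on a line, 1.0 apart,
    # with every fifth atom treated as an alpha carbon.
    atoms = [(float(i), 0.0, 0.0) for i in range(30)]
    ca_indices = range(0, 30, 5)
    print("all atoms, long cutoff:", count_pairs(atoms, 10.0))
    print("hybrid:", hybrid_pair_count(atoms, ca_indices, 2.0, 10.0))
```

In this toy case the hybrid scheme keeps the long cutoff but handles far fewer pairs, which is the intuition behind applying the long-range layer to alpha carbons only.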
@Ella_Maru Great question! ChromoGen was trained with only GM12878 data, and we showed that the prediction results for IMR90 cells were equally accurate.
Our explicit-ion paper is now online at eLife: https://t.co/4AlISHvEIL. But who has time to read anyway? Luckily, @XingchengLin made an excellent video that you can watch at: https://t.co/PDwLh0N5JT
It's as if the very fabric of chromatin's existence is intricately woven through these inherent interactions, shaping its three-dimensional structure.
https://t.co/Wl7MQsOv52
Our review on chromatin organization is online at the Annual Review of Biophysics! Spoiler alert: we took more of a physical chemist's perspective on the problem.
https://t.co/5g7m9CoXOk
Interested in IDPs and biomolecular condensates? Do you ever wonder why IDPs adopt low-complexity domains? How do you map a given protein sequence into stickers and spacers? We attempt to address these questions in our latest preprint. https://t.co/eejltGAunx
A fruitful collaboration with Bin Zhang @binzmit reveals the nanometer-scale interfacial environment of phase-separated condensates. In @eLife: Frustrated Microphase Separation Produces Interfacial Environment within Biological Condensates https://t.co/WUXuf79Nkw