Chengfei Yan

@ChengfeiYan

Associate Professor at the School of Physics, Huazhong University of Science and Technology, working on bioinformatics

Wuhan

Joined July 2016

192 Following

44 Followers

103 Posts

Chengfei Yan @ChengfeiYan

2 months ago

Dissecting the Black Box of AlphaFold in Protein-Protein Complex Assembly — explained simply via @gistdotscience https://t.co/QVvOS4ceCO

130

ChengfeiYan retweeted

しんしあ@バイオテクコミュニティ「BioSpace」モデレーター

@BioSpace9

2 months ago

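Dissecting the Black Box of AlphaFold in Protein-Protein Complex Assembly https://t.co/ounVC6Gppt This study aims to understand how AlphaFold predicts the structures of protein complexes. In particular, the analysis takes the view that what matters is not the coevolutionary information traditionally considered important, but the shape of each monomer and the shape complementarity of the interface. By visualizing how information propagates inside the model, it shows a stepwise inference process in which the monomer structures are determined first and the interactions are inferred afterward. This understanding offers important guidance on which factors to prioritize when designing complexes in protein design.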

ChengfeiYan retweeted

Tess @Tess_xdxdxd

2 months ago

AlphaFold doesn't decode the evolutionary dialogue between proteins — it masters the geometry of each monomer and the precise pattern-matching at their interface. The 'black box' was never magic; it was always physics and sequence complementarity in disguise.

ChengfeiYan retweeted

Biology+AI Daily @BiologyAIDaily

2 months ago

Dissecting the Black Box of AlphaFold in Protein–Protein Complex Assembly

1. Li, Mu, and Yan present evidence that inter-protein coevolution (the usual explanation for AlphaFold complex success) is not the dominant driver of complex assembly in AlphaFold-Multimer or AlphaFold3; instead, assembly largely follows from monomer geometry plus interface-level matching.

2. A time-segregated benchmark (PDB 2022-01-01 to 2024-12-20; training cutoff 2021-09-30) is built to reduce leakage: 200 homodimers and 316 heterodimers, evaluated mainly with DockQ across AFM and AF3.

3. Controlled MSA experiments separate “pairing” from “having MSAs”: AFM Paired MSA vs Block MSA (no cross-chain pairing) vs Randomly Paired MSA. Mean DockQ changes are minimal across these conditions, implying explicit paired-MSA coevolution contributes little for most targets.

4. The paper further removes potential “latent” inter-protein coevolution in unpaired MSAs by regenerating UniRef100 MSAs with species annotations and enforcing zero species overlap between partner MSAs; AFM/AF3 performance remains essentially unchanged, arguing against hidden species-level coevolution being a key signal.

5. The proposed mechanism: AlphaFold first establishes strong intra-chain geometric constraints (monomer folding/geometry), then infers inter-chain constraints downstream via geometric compatibility and interface sequence pattern matching; cross-chain organization is progressively refined through layers and recycling.

6. Template-driven tests support the geometry-first view: supplying high-quality bound-state monomer templates enables complex prediction accuracy comparable to MSA-based runs, and experimentally determined bound monomer templates perform even better; adding MSAs on top of such templates yields little additional gain.

7. A key nuance is “bound vs unbound” monomer geometry: predicted unbound monomer templates degrade complex accuracy, and the difference is concentrated at interface regions. Interface TM-score correlates with complex DockQ (reported Pearson r ≈ 0.575), highlighting interface conformation as a main determinant.

8. Interface residue identity is essential, not just backbone shape: mutating up to 10% of residues to glycine shows that interface mutations nearly abolish prediction accuracy under both MSA-based and template-based settings, while non-interface mutations have only moderate effects, consistent with a backbone+sidechain “pattern matching” interface recognition.

9. The paper introduces AlphaFold-Constraint Propagation Mapping (AF-CPM), using OpenFold to extract Evoformer-layer pair representations and convert them (via the distogram head) into layer-wise contact probability maps (<12 Å). These visualizations show intra-chain constraints forming before inter-chain contacts, directly supporting hierarchical constraint formation.

10. For antigen–antibody complexes (154 nonredundant cases), paired MSAs still do not help; bound-state monomer templates help most. The limiting factor is attributed to immune-interface plasticity and atypical interface statistics (e.g., enrichment of Tyr/Trp on the antibody side), with CDR-H3 local accuracy strongly linked to docking success; AF-CPM suggests antigen–antibody assembly may require more recycling to converge as interface constraints emerge late.

💻Code: https://t.co/pBc61UGRk1
📜Paper: https://t.co/xJkOlXmdbD
#AlphaFold #AlphaFoldMultimer #AlphaFold3 #ProteinComplexes #ProteinStructure #MSA #Interpretability #AntibodyEngineering #ComputationalBiology #StructuralBiology
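Item 9 above describes AF-CPM as converting pair representations into contact probability maps (<12 Å) through the distogram head. The last step of that conversion can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes distogram logits are already available as a NumPy array, that the bins follow the commonly used AlphaFold2 layout (64 bins with edges evenly spaced from 2.3125 Å to 21.6875 Å, treated here as an assumption), and the function names `contact_probability` and `split_contacts` are hypothetical.

```python
import numpy as np


def contact_probability(logits, first_break=2.3125, last_break=21.6875, cutoff=12.0):
    """Turn distogram logits of shape (L, L, num_bins) into P(distance < cutoff).

    The bin edges are an assumed layout; adjust them to match the model in use.
    """
    num_bins = logits.shape[-1]
    # Upper edges of every bin except the last one, which is open-ended.
    breaks = np.linspace(first_break, last_break, num_bins - 1)

    # Numerically stable softmax over the distance bins.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Sum the probability mass of bins whose upper edge lies within the cutoff.
    n_contact_bins = int(np.sum(breaks <= cutoff))
    return probs[..., :n_contact_bins].sum(axis=-1)


def split_contacts(contact_map, len_a):
    """Split a dimer contact map into the chain-A intra block and the A-B inter block."""
    intra_a = contact_map[:len_a, :len_a]
    inter_ab = contact_map[:len_a, len_a:]
    return intra_a, inter_ab
```

Applying `contact_probability` to the logits obtained at each Evoformer layer and comparing the intra-chain and inter-chain blocks from `split_contacts` gives one way to see which set of constraints sharpens first.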



ChengfeiYan retweeted

@BioAI_Neuro

@BioAI_Pharma

2 months ago

Dissecting the Black Box of AlphaFold in Protein–Protein Complex Assembly https://t.co/VpyTeEXZip AlphaFold achieves high accuracy in predicting protein–protein complexes, yet the principles of their assembly remain unclear. Here, we present a unified interpretability framework for AlphaFold-Multimer and AlphaFold3 to dissect these mechanisms. We find that inter-protein coevolution is not a major driver; instead, complex formation is largely governed by monomer geometry and interface-level pattern matching, including backbone complementarity and residue interactions. Tracking the propagation of distance constraints during inference reveals a hierarchical process where monomer structures form first, followed by inter-chain interactions. This shows that cross-chain geometry is inferred from monomer features rather than coevolutionary signals. In antigen–antibody complexes, lower accuracy arises from flexible, noncanonical interfaces, highlighting conformational variability and atypical interactions as key challenges for improving immune complex prediction. #AlphaFold3 #BioAI #AlphaFoldMultimer #MSA #AntibodyEngineering #StructuralBiology


442

ChengfeiYan retweeted

Biology+AI Daily @BiologyAIDaily

2 months ago

Dissecting the Black Box of AlphaFold in Protein–Protein Complex Assembly

1. The study argues that AlphaFold-Multimer and AlphaFold3 usually do not assemble protein complexes primarily from inter-protein coevolution; instead, assembly is largely driven by monomer geometry plus interface-level “pattern matching” between backbone complementarity and residue identities.

2. A strictly time-segregated benchmark was built from PDB entries deposited Jan 2022–Dec 2024 (post-training cutoff Sep 30, 2021): 201 homodimers and 316 heterodimers, evaluated with DockQ to probe what information actually changes complex accuracy.

3. Controlled MSA experiments compared AFM Paired MSA vs Block MSA (no pairing) vs Randomly Paired MSA. Mean DockQ changed minimally across conditions for both AFM and AF3, suggesting explicit MSA pairing contributes little for most complexes.

4. To test “latent” inter-protein coevolution hiding in unpaired MSAs, the authors rebuilt monomer MSAs with explicit species annotations (UniRef100 via JACKHMMER) and enforced zero species overlap between the two partners’ MSAs. Complex accuracy still showed no significant difference, further weakening the coevolution-centric explanation.

5. Template-driven tests support a geometry-first mechanism: providing only sequences plus high-quality monomer templates can recover complex structures. Experimentally determined bound-state monomer templates performed best, and adding MSAs on top of such templates did not further improve performance.

6. The key determinant is interface-region monomer accuracy: bound vs unbound monomer templates differ mainly at interfaces (not non-interface regions), and interface TM-score correlates with complex DockQ (reported Pearson r = 0.576), linking docking success to getting interface conformations right.

7. Sequence identity at the interface is essential, not just backbone shape: random mutations (to glycine) at interface residues nearly abolish complex prediction accuracy under both MSA-based and template-based settings, while non-interface mutations have only moderate effects.

8. The paper introduces AlphaFold-Constraint Propagation Mapping (AF-CPM), an interpretability method that extracts pair representations across Evoformer layers (via OpenFold’s distogram head) to visualize contact probability formation. It shows intra-chain constraints appear first, and only then do inter-chain constraints emerge, consistent with inter-chain geometry being inferred from monomer geometry.

9. For antigen–antibody complexes (154 nonredundant SAbDab-derived cases), MSA pairing again does not help; bound-state monomer templates help most. The main bottleneck is interface plasticity and atypical immune-interface statistics: lower interface-region monomer accuracy on both antigen and antibody sides, with a strong role for CDR-H3, plus residue-contact biases (e.g., enriched Tyr/Trp on antibody interfaces) that may mismatch model priors.

💻Code: https://t.co/pBc61UGRk1
📜Paper: https://t.co/xJkOlXmdbD
#AlphaFold #ProteinComplexes #StructuralBiology #ComputationalBiology #DeepLearning #Interpretability #Antibody #ProteinProteinInteractions #Bioinformatics
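Item 3 above contrasts three ways of assembling a two-chain MSA: paired, block (no pairing), and randomly paired. The toy sketch below shows what each layout could look like on small aligned sequences. It is an illustration under stated assumptions rather than the authors' pipeline: each MSA is a list of hypothetical `(species, aligned_sequence)` tuples with the query in the first row, pairing uses the first hit per shared species, and the function names are made up for this example.

```python
import random

GAP = "-"


def paired_msa(msa_a, msa_b):
    """Concatenate rows of A and B that share a species label (first hit per species)."""
    first_b = {}
    for species, seq in msa_b:
        first_b.setdefault(species, seq)
    rows, used = [], set()
    for species, seq in msa_a:
        if species in first_b and species not in used:
            rows.append(seq + first_b[species])
            used.add(species)
    return rows


def block_msa(msa_a, msa_b):
    """Block layout: every row covers one chain and is gap-padded over the other."""
    len_a = len(msa_a[0][1])
    len_b = len(msa_b[0][1])
    rows = [seq + GAP * len_b for _, seq in msa_a]
    rows += [GAP * len_a + seq for _, seq in msa_b]
    return rows


def randomly_paired_msa(msa_a, msa_b, seed=0):
    """Keep the query row intact, then pair the remaining rows of A with shuffled rows of B."""
    rng = random.Random(seed)
    rest_b = [seq for _, seq in msa_b[1:]]
    rng.shuffle(rest_b)
    rows = [msa_a[0][1] + msa_b[0][1]]
    rows += [seq_a + seq_b for (_, seq_a), seq_b in zip(msa_a[1:], rest_b)]
    return rows
```

In the block layout no row spans both chains, so cross-chain column covariation cannot be read off any single row; in the random layout rows span both chains but the species correspondence is scrambled. Feeding the three variants to the same predictor and comparing DockQ is the kind of controlled comparison the summary describes.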


Chengfei Yan @ChengfeiYan

2 months ago

We explore AlphaFold's mechanism for multimer prediction in this preprint through hypothesis-driven testing and process visualization. Dissecting the Black Box of AlphaFold in Protein-Protein Complex Assembly https://t.co/IW4N7ybe8u

156

Chengfei Yan @ChengfeiYan

2 months ago

New Preprint from our lab: Dissecting the Black Box of AlphaFold in Protein-Protein Complex Assembly https://t.co/IW4N7ybe8u

156

ChengfeiYan retweeted

bioRxiv Bioinfo @biorxiv_bioinfo

2 months ago

Dissecting the Black Box of AlphaFold in Protein-Protein Complex Assembly https://t.co/4k0CDoj2x9 #biorxiv_bioinfo

610

ChengfeiYan retweeted

Biology+AI Daily @BiologyAIDaily

8 months ago

Mechanism-Aware Protein-Protein Interaction Prediction via Contact-Guided Dual Attention on Protein Language Models

1. A new deep learning framework called PLMDA-PPI is introduced for predicting protein-protein interactions (PPIs) with enhanced generalization ability. This model incorporates residue-level interaction modeling inspired by biophysical principles, which is crucial for accurately predicting interactions between novel proteins that have low sequence similarity to training data.

2. PLMDA-PPI integrates a dual-attention-based PPI prediction module with an inter-protein residue-residue contact predictor. The model is trained on structure-informed PPIs from the Protein Data Bank (PDB), enabling it to predict both interactions and the key residue pairs mediating these interactions.

3. The model demonstrates superior accuracy and robustness compared to existing methods like D-SCRIPT, Topsy-Turvy, TT3D, and TUnA, especially under strict sequence dissimilarity constraints. It also outperforms computationally intensive methods such as AF2Complex, RF2-Lite, and RF2-PPI in terms of predictive performance while requiring significantly fewer computational resources.

4. Fine-tuning on H. sapiens PPI data from the HINT dataset further improves the model's performance across multiple species, showing strong generalization capabilities. The model's ability to identify key residue pairs mediating interactions provides insights into the biophysical mechanisms underlying PPIs.

5. The study highlights the importance of incorporating residue-level interaction patterns to enhance the generalization of deep learning models for PPI prediction. This approach not only improves accuracy but also reduces the reliance on sequence similarity, making it more suitable for predicting interactions in diverse biological contexts.

📜Paper: https://t.co/y5GH4ReQC0
#ProteinInteraction #DeepLearning #Biophysics #ComputationalBiology


ChengfeiYan retweeted

Biology+AI Daily @BiologyAIDaily

12 months ago

Mechanism-Aware Protein-Protein Interaction Prediction via Contact-Guided Dual Attention on Protein Language Models

1. A new study introduces PLMDA-PPI, a novel deep learning framework designed to significantly enhance the generalization ability of protein-protein interaction (PPI) prediction. This framework directly addresses a critical limitation of existing models: their struggle to accurately predict interactions involving novel proteins with low sequence similarity to training data.

2. The core innovation of PLMDA-PPI lies in its "mechanism-aware" approach. It integrates a dual-attention-based PPI prediction module with an inter-protein contact predictor that utilizes protein language model-embedded geometric graphs. This allows the model to not only predict whether proteins interact but also identify the specific residue pairs mediating these interactions.

3. Through joint training on structure-informed PPIs from the Protein Data Bank (PDB), the model learns both interaction probabilities and the key residue contacts. This dual-task learning contributes significantly to its robust performance and interpretability.

4. Extensive evaluations demonstrate that PLMDA-PPI achieves superior accuracy and robustness compared to state-of-the-art models like D-SCRIPT, Topsy-Turvy, and TT3D, particularly under stringent sequence dissimilarity settings where other models typically fail.

5. The model shows strong generalization capabilities across various species in the high-quality interactome dataset (HINT). Fine-tuning on human PPI data from HINT further improves its predictive performance, showcasing its adaptability and potential for practical applications.

6. This work provides a generalizable framework for biomolecular interaction prediction by effectively integrating both structural and non-structural interaction data. Its impact extends beyond PPIs, offering an adaptable paradigm for predicting other types of biomolecular interactions.

💻Code: https://t.co/GEahfSHv4N
📜Paper: https://t.co/bzB50wxxcE
#ComputationalBiology #ProteinPrediction #DeepLearning #Bioinformatics #PPI #MachineLearning

Chengfei Yan @ChengfeiYan

12 months ago

New work from our group for PPI prediction: Mechanism-Aware Protein-Protein Interaction Prediction via Contact-Guided Dual Attention on Protein Language Models https://t.co/rHWN42knD8

100

Chengfei Yan @ChengfeiYan

about 2 years ago

In @eLife: Protein language model-embedded geometric graphs power inter-protein contact prediction https://t.co/ZePuEtLH7q The final version of our work!

519

Chengfei Yan @ChengfeiYan

over 2 years ago

Our work is now online in eLife (reviewed preprint)! In @eLife: Protein language model embedded geometric graphs power inter-protein contact prediction https://t.co/YOaWyCeRP6

265

ChengfeiYan retweeted

bioRxiv @biorxivpreprint

over 2 years ago

#PeerReview from @eLife of 👉🏿 Protein language model embedded geometric graphs power inter-protein contact prediction https://t.co/BknmCAi5OM #biorxiv #reviewedpreprint

Chengfei Yan @ChengfeiYan

over 3 years ago

DRN-1D2D_Inter: our work for inter-protein contact prediction from sequences of interacting monomers is online in Briefings in Bioinformatics. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models https://t.co/K8CKrtql7O

225

Chengfei Yan @ChengfeiYan

over 3 years ago

New manuscript from our group: PLMGraph-Inter: https://t.co/Qt6zqyd6t1 Protein language model embedded geometric graphs power inter-protein contact prediction https://t.co/NO3CLKjIoe

253

ChengfeiYan retweeted

bioRxiv Bioinfo @biorxiv_bioinfo

over 3 years ago

Protein language model embedded geometric graphs power inter-protein contact prediction https://t.co/0e2XVNftAc #biorxiv_bioinfo

Chengfei Yan @ChengfeiYan

almost 4 years ago

Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models https://t.co/t0HCGNDgYP New manuscript from our lab!

ChengfeiYan retweeted

bioRxiv @biorxivpreprint

almost 4 years ago

Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models https://t.co/Y92rrm7iLx #bioRxiv
