MedSAGE: Bridging Generative AI and Medicinal Chemistry for Structure-Based Design of Small Molecule Drugs
1. MedSAGE is a novel generative AI framework that addresses long-standing limitations in structure-based small-molecule drug design by operating directly on medicinal chemistry fragments instead of atoms or SMILES strings.
2. Unlike previous diffusion models, which often generate chemically unstable or synthetically infeasible molecules, MedSAGE embeds functional groups and ring systems into a smooth latent space, improving interpretability, synthesizability, and design relevance.
3. The model employs a two-phase design: fragment generation via a 3D diffusion model guided by the protein pocket, followed by atomic-level assembly using a custom bond-connection algorithm that enforces chemical rules and optimizes Glide docking scores.
4. On a benchmark of 25 therapeutically relevant targets, MedSAGE-generated molecules had predicted affinities and selectivity statistically indistinguishable from those of approved drugs and clinical candidates, outperforming recent all-atom diffusion methods.
5. MedSAGE shows high selectivity: its ligands consistently reproduced more native binding interactions and avoided excessive hydrophobicity or bulkiness, unlike many molecules from DiffSBDD or IPDiff that lacked pocket specificity.
6. Although not explicitly optimized for them, MedSAGE molecules adhered to drug-likeness heuristics: Lipinski’s Rule of 5, appropriate logP, low synthetic complexity scores (~3.5), and scaffold diversity (~197 unique scaffolds out of 400 molecules per target).
7. Case studies showed MedSAGE could “rediscover” scaffolds structurally close to known binders such as reboxetine, temsavir, and AK1, preserving key pharmacophores and even enhancing binding interactions via new hydrogen bonds.
8. Compared with traditional virtual screening of libraries containing 30–400 million compounds, MedSAGE achieved comparable or better hit quality while generating only 2,000 molecules, a 100–1,000× improvement in hit enrichment efficiency.
9. Its fragment-based generation ensures aromatic ring planarity and reduces stereocenter complexity, avoiding common issues in atom-wise models. The learned fragment embeddings, visualized with t-SNE, encode both 3D and chemical properties.
10. Although MedSAGE doesn’t yet handle protein flexibility or apo structures, it performs robustly on holo structure-based tasks, a realistic setting for many early-stage drug discovery projects.
11. The study introduces a scalable pipeline that pairs AI-based molecule generation with similarity searches in large commercial libraries (e.g., Enamine REAL Space) to find purchasable analogs with similar binding profiles.
12. MedSAGE provides a compelling proof of concept that generative AI can capture medicinal chemistry principles and generate practical drug-like compounds directly from protein structures, with high interpretability and minimal data.
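Point 7 of the summary lists Lipinski’s Rule of 5 among the drug-likeness heuristics MedSAGE molecules satisfied. As a minimal sketch (not MedSAGE’s own evaluation code), the check reduces to four thresholds on precomputed descriptors: molecular weight ≤ 500 Da, calculated logP ≤ 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The `Descriptors` container, the function names, and the "at most one violation" pass criterion are assumptions for illustration. In practice the descriptor values would come from a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float  # molecular weight in daltons
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Rule-of-5 thresholds the molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: compliant if at most `max_violations` rules are broken."""
    return ro5_violations(d) <= max_violations


def ro5_pass_rate(molecules: list[Descriptors]) -> float:
    """Fraction of a generated batch that is Ro5-compliant (0.0 for an empty batch)."""
    if not molecules:
        return 0.0
    return sum(passes_ro5(m) for m in molecules) / len(molecules)


if __name__ == "__main__":
    # Hypothetical descriptor values, not taken from the MedSAGE paper.
    batch = [
        Descriptors(mol_weight=420.5, logp=3.1, h_donors=2, h_acceptors=6),  # 0 violations
        Descriptors(mol_weight=612.7, logp=6.3, h_donors=1, h_acceptors=7),  # 2 violations
        Descriptors(mol_weight=515.0, logp=4.2, h_donors=3, h_acceptors=8),  # 1 violation
    ]
    for mol in batch:
        print(ro5_violations(mol), passes_ro5(mol))
    print(f"pass rate: {ro5_pass_rate(batch):.2f}")
```

The thresholds are inclusive (a molecule at exactly 500 Da does not count as a violation), which matches how the rule is usually stated.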
📜Paper: https://t.co/YW3e7iGH4z
#MedSAGE #DrugDesign #GenerativeAI #DiffusionModels #MedicinalChemistry #ProteinLigand #StructureBasedDesign #MolecularGeneration #FragmentBasedDesign #AI4Science #DeNovoDrugs #VirtualScreening #Bioinformatics #MachineLearning
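Point 11 of the MedSAGE summary pairs generated molecules with similarity searches in commercial libraries such as Enamine REAL Space to find purchasable analogs. A common scoring choice for such searches is Tanimoto similarity on binary fingerprints. The sketch below ranks a toy catalog that way. It is a hypothetical illustration, not the study’s pipeline: fingerprints are modeled as sets of "on" bit indices (in practice, for example, Morgan/ECFP bits from RDKit), and the catalog IDs, `nearest_analogs`, and the `min_sim` cutoff are made up for the example.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of set bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    return len(a & b) / len(union)


def nearest_analogs(
    query: frozenset[int],
    catalog: dict[str, frozenset[int]],
    k: int = 3,
    min_sim: float = 0.0,
) -> list[tuple[str, float]]:
    """Return up to k catalog entries most similar to `query`, best first.

    This is a brute-force linear scan; a library the size of Enamine REAL Space
    would need dedicated similarity-search tooling instead.
    """
    scored = [(cid, tanimoto(query, fp)) for cid, fp in catalog.items()]
    hits = [(cid, sim) for cid, sim in scored if sim >= min_sim]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:k]


if __name__ == "__main__":
    # Toy fingerprints with hypothetical catalog IDs.
    generated = frozenset({1, 4, 7, 9})
    catalog = {
        "CAT-001": frozenset({1, 4, 7, 9}),
        "CAT-002": frozenset({1, 4, 8}),
        "CAT-003": frozenset({2, 3, 5}),
    }
    print(nearest_analogs(generated, catalog, k=2))
    # [('CAT-001', 1.0), ('CAT-002', 0.4)]
```

Similarity only shortlists candidates. Per point 11, the purchasable analogs are then compared on their binding profiles, a step this sketch leaves out.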
📢Introducing #AIIndex2025: This year's report highlights the most critical trends in AI, from the shifting geopolitical landscape and rapid technological evolution to AI’s expanding role in science and medicine, business, and public life. Read more: https://t.co/PxHhsKO9dD
BREAKING NEWS
The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”
BREAKING NEWS
The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
Can Gen AI help us evaluate the fairness of AI models?
The answer is YES. Excited to announce 🔁Diffusion Perturbations, a diffusion-based approach to create datasets balanced across demographic traits.
Paper: https://t.co/WTzYTfP7IG
Dataset: https://t.co/jObRoTKVlV
🧵👇! 1/N
By learning a joint representation with deep generative modeling, MultiVI integrates multimodal and single-modality single-cell datasets, which enhances several downstream analyses. @YosefLab @TalAshuach @MGabitto
OA paper: https://t.co/8tw8EWSD15
We developed a new method for designing proteins that lets us model the entire structure, including side chains, by co-designing protein structure and sequence https://t.co/uVKtXi3QuC
Interested in getting an MS or PhD in Biomedical Informatics? Want to hear firsthand from Stanford faculty members from the Department of Biomedical Data Science about our programs? We have a Zoom panel TOMORROW for those interested in our grad programs. Sign up at link below!
v0.17 of scvi-tools has just been released! In this version, we introduce new (MuData) and improved (JAX) integrations with frameworks we believe will gain widespread adoption in the single-cell community. 1/7
🐻➡️🌲 Excited to announce that I'll start my PhD in Biomedical Informatics at @StanfordBiosci & @StanfordDBDS in September, with an @NSF Graduate Research Fellowship!