📢 Introducing FragAtlas-62M: a foundation chemical language model trained on the complete ZINC-22 fragment subset. It achieves 99.9% validity, 53.6% coverage of known fragments, and generates ~22% novel structures—advancing fragment-based drug discovery. https://t.co/n6P1Mq2eoa
The Research Grant application portal is now open!
The Welch Foundation supports basic chemical research at Texas educational institutions and has committed over $900M in total research grants to date.
🗓️Deadline: January 31, 2026.
🔗Learn more: https://t.co/w0TYEeJ3yQ
Exciting news! 🎉 The #WelchFoundation has named Dr. Sheel Dodani the 2026 #HackermanAwardee for her groundbreaking protein-based biosensors that light up inorganic anions in living cells. A major leap forward for chemistry and human health. See more: https://t.co/MJdiDfW3lD
How to prepare a scientific poster
We asked researchers in a range of disciplines and career stages to share their tips for making the most of presenting a poster at a conference, including any adjustments they’ve made for conferences held online. 👉 https://t.co/E41TzR6ICu
Take Two – our second foray into machine learning. We introduce a triplet-loss few-shot framework for cryoEM micrograph classification, enabling rapid and adaptable quality assessment. Kudos to Alex and Brandon for a job well done! https://t.co/pq4vpVObX5
A Foundation Chemical Language Model for Comprehensive Fragment-Based Drug Discovery
1. FragAtlas-62M is a groundbreaking chemical language model specifically designed for fragment-based drug discovery. It is trained on the largest fragment dataset to date, comprising over 62 million molecules from the ZINC-22 database. This model achieves an impressive 99.90% chemical validity in generated fragments, making it a powerful tool for medicinal chemistry.
2. The model not only maintains a high coverage of known ZINC fragments (53.55%) but also generates 22.04% novel structures with practical relevance. This balance between rediscovery and novelty is crucial for fragment-based drug discovery, as it ensures both the reliability of known fragments and the potential for new discoveries.
3. FragAtlas-62M is built on a GPT-2 architecture with 42.7M parameters. It uses a 128-token context window and is trained using HuggingFace Transformers. The model's architecture and training methodology are optimized for fragment-level SMILES modeling, ensuring efficient and high-throughput generation capabilities.
4. The model's performance is validated across 12 molecular descriptors and three fingerprint methods. The generated fragments closely match the training distribution, with all effect sizes being less than 0.4. This indicates that the model generalizes well and maintains the key properties of the training set without systematic bias.
5. FragAtlas-62M shows substantial overlap in chemotype space between novel (N) and rediscovered (R) molecules, as shown by t-SNE visualizations and distance analysis. The distance ratios (NR/NN and NR/RR) are consistently near 1.0 across all three fingerprint types, indicating minimal distributional shift.
6. The model is released with training code, preprocessed data, documentation, and model weights, making it accessible for further research and practical applications. This open release lowers the barrier to entry for groups with modest computational resources and encourages rapid follow-up work and experimental validation.
7. While FragAtlas-62M is a significant advancement, it has limitations. It does not explicitly model stereochemistry, geometric relationships, or fragment-to-fragment connectivity rules. Future work should focus on integrating conditional controls, structural information, and methods for automated molecule construction to broaden its practical applications.
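For readers who want to see how numbers like those in points 2 and 4 are typically computed, here is a minimal sketch. It is not the authors' evaluation code. It assumes coverage means the share of reference fragments that reappear in the generated set, novelty means the share of unique generated fragments absent from the reference set, and the effect size is Cohen's d with a pooled standard deviation; the thread does not spell out any of these definitions. The function names are hypothetical, and the SMILES strings are assumed to be already canonicalized (in practice a cheminformatics toolkit would do that step).

```python
from math import sqrt
from statistics import mean, variance


def coverage_and_novelty(generated, reference):
    """Return (coverage, novelty) for two collections of canonical SMILES.

    Assumed definitions (not confirmed by the thread):
      coverage = |generated ∩ reference| / |reference|
      novelty  = |generated − reference| / |unique generated|
    """
    gen, ref = set(generated), set(reference)
    if not gen or not ref:
        raise ValueError("both collections must be non-empty")
    coverage = len(gen & ref) / len(ref)
    novelty = len(gen - ref) / len(gen)
    return coverage, novelty


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled sample variance."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Toy data, purely illustrative (not taken from ZINC-22 or the paper).
reference = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
generated = ["CCO", "CCN", "CCCl", "CCO"]  # duplicates collapse in the set
cov, nov = coverage_and_novelty(generated, reference)
print(f"coverage={cov:.2%} novelty={nov:.2%}")

# Compare one descriptor (e.g. a made-up molecular weight list) between sets.
d = cohens_d([46.1, 78.1, 60.1, 45.1], [46.1, 45.1, 64.5, 50.0])
print(f"Cohen's d = {d:.3f}")
```

Under these assumed definitions, a descriptor would pass the thread's "effect size below 0.4" criterion when `abs(cohens_d(...)) < 0.4`.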
📜Paper: https://t.co/psbsES6aCP
#FragAtlas62M #ChemicalLanguageModel #FragmentBasedDrugDiscovery #MedicinalChemistry #AIinDrugDiscovery #OpenSource #Research
FragAtlas-62M is the brainchild of Alex Ho and marks our first foray into machine learning and chemical language models for fragment-based drug discovery. Congrats Alex!
The Welch Foundation congratulates Jennifer A. Doudna for being named the recipient of the 2026 Priestley Medal! As a former member of The Foundation’s Scientific Advisory Board, Dr. Doudna contributed greatly to our mission of advancing basic chemical research. https://t.co/QThTvpTDcH
"New Horizons in Drug Discovery" will feature speakers who have made foundational discoveries of new medicines, new drug discovery technology, and strategies for leveraging therapeutic modalities to treat diseases. Register here: https://t.co/ccSpYPzgI4
https://t.co/Fzzt32vU0O
Excited to share our latest paper uncovering a mitochondrial-to-endoplasmic reticulum stress response (MERSR)! We demonstrate how mitochondrial stress signals the ER to protect cellular proteostasis.
In memoriam | Colleagues of Daniel N. Hebert (1962–2024) remember the passionate glycobiologist, scientist, caring mentor and kind friend.
https://t.co/wlIRymy5pc
Researchers at @UChicago, @Argonne, and @Harvard perfected a new technique for creating experimental movies of proteins in action, showing that EFX can be a powerful new tool for quickly visualizing and understanding protein dynamics.
https://t.co/utREQP76zD