A Benchmark for Quantum Chemistry Relaxations via Machine Learning Interatomic Potentials
1. PubChemQCR is the largest publicly available dataset of DFT-based molecular relaxation trajectories, with 3.5 million molecules and over 300 million conformations, including 105 million computed with DFT. Each conformation includes total energy and atomic force labels.
2. The dataset captures full geometry optimization trajectories, not just final structures, which addresses a key gap in previous datasets. This enables machine learning interatomic potentials (MLIPs) to learn from both stable and non-equilibrium geometries.
3. PubChemQCR offers broad chemical diversity, spanning 25 elements and a wide range of molecular sizes and conformational complexities. It was built from PubChemQC’s raw optimization outputs, which cover the PM3, Hartree–Fock, and DFT stages.
4. Compared to existing datasets like QM9, GEOM, or ANI-1x, PubChemQCR provides significantly more conformational data, better element coverage, and force labels at the DFT level, which makes it well suited for training MLIPs.
5. A curated subset, PubChemQCR-S, contains ~41K DFT relaxation trajectories for efficient model benchmarking. This subset supports rapid prototyping, ablation studies, and hyperparameter tuning.
6. The authors benchmarked 9 MLIP models (SchNet, PaiNN, NequIP, FAENet, Equiformer, etc.) on energy and force prediction tasks using PubChemQCR-S. Equiformer achieved the best overall performance on both energy and force metrics.
7. In geometry optimization tasks, Equiformer outperformed all other models, achieving 70.15% average energy minimization, a 23.81% chemical accuracy success rate, and a 19.85% force convergence rate. Most other models struggled, especially with force convergence.
8. The dataset supports supervised pretraining of 3D molecular models with physically grounded energy and force labels, potentially benefiting downstream property prediction tasks in drug discovery and materials science.
9. It also enables training of generative models for 3D molecular structures. These models can learn to generate low-energy conformations directly from the data, bypassing costly DFT optimization.
10. Limitations include the dataset's near-equilibrium bias (due to DFT relaxation) and inconsistent label quality across optimization stages. Also, chemical element coverage is capped at 25 due to DFT method constraints.
11. Despite these limitations, PubChemQCR is a foundational resource for building accurate, transferable, and data-efficient MLIPs. It can accelerate atomistic simulations, geometry optimization, and generative modeling in quantum chemistry.
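For readers new to the geometry optimization task in point 7, here is a minimal sketch of what "relaxing" a structure with a force model means. It is not the paper's code: `harmonic_energy_forces` is a hypothetical toy potential standing in for a trained MLIP, and `relax`, the step size, and the force threshold `fmax` are illustrative assumptions. Atoms are moved along the predicted forces until the largest force component falls below the threshold, and each step is recorded as a (positions, energy) frame, similar in spirit to one relaxation trajectory.

```python
import math


def harmonic_energy_forces(positions, k=1.0, r0=1.0):
    """Toy stand-in for an MLIP (assumption): one harmonic bond between two atoms.

    Returns (energy, forces) with one force vector per atom.
    """
    (x1, y1, z1), (x2, y2, z2) = positions
    d = (x2 - x1, y2 - y1, z2 - z1)
    r = math.sqrt(sum(c * c for c in d))
    energy = 0.5 * k * (r - r0) ** 2
    # Force on atom 2 is -dE/dr along the bond direction; atom 1 feels the opposite.
    f_mag = -k * (r - r0)
    f2 = tuple(f_mag * c / r for c in d)
    f1 = tuple(-c for c in f2)
    return energy, [f1, f2]


def relax(positions, potential, step=0.1, fmax=1e-4, max_steps=1000):
    """Steepest descent: follow the forces until the largest component is below fmax."""
    trajectory = []
    for _ in range(max_steps):
        energy, forces = potential(positions)
        trajectory.append((positions, energy))
        if max(abs(c) for f in forces for c in f) < fmax:
            break  # force convergence reached
        positions = [
            tuple(p + step * fc for p, fc in zip(pos, f))
            for pos, f in zip(positions, forces)
        ]
    return trajectory


# Start from a stretched bond (1.5 instead of the equilibrium length 1.0).
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
trajectory = relax(start, harmonic_energy_forces)
print(f"{len(trajectory)} frames, final energy {trajectory[-1][1]:.2e}")
```

In practice an optimizer such as L-BFGS would replace plain steepest descent, and the toy potential would be swapped for a model's energy and force predictions. The loop structure stays the same.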
💻Code: https://t.co/8P56b6Yh7h
📜Paper: https://t.co/SQTZYZnugP
#QuantumChemistry #ML4Science #DFT #GraphNeuralNetworks #MolecularSimulation #MachineLearning #OpenScience #MolecularModeling
The science version of “what doesn’t kill me makes me stronger”: Scientists with ‘near misses’ early in their careers outperform those with ‘narrow wins’ in the longer run.
https://t.co/9m4GVT197I
Today is #ColourBlindAwarenessDay!
For your next paper/presentation figures, consider using colour-blind friendly palettes.
PyMOL: https://t.co/oKgZC8OTzk
Matplotlib: https://t.co/Aj8Y4HTJpR
@DovydasJoksas blog with further background: https://t.co/Dw1xtHoF2r
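A minimal Matplotlib sketch of the idea, in case it helps. The palette below is the widely used Okabe-Ito set, which is an assumption on my part and not necessarily what the links above recommend. The helper name `plot_demo` is hypothetical.

```python
# Okabe-Ito colour-blind friendly palette (hex codes).
OKABE_ITO = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]


def plot_demo(path):
    """Draw one line per palette colour and save the figure to `path`."""
    # Imported lazily so the palette itself can be reused without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Make this Axes cycle through the palette instead of the default colours.
    ax.set_prop_cycle(color=OKABE_ITO)
    xs = range(10)
    for i in range(len(OKABE_ITO)):
        ax.plot(xs, [x + i for x in xs], label=f"series {i}")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_demo("palette_demo.png")` writes a small test figure. Matplotlib also ships a built-in style, `plt.style.use("tableau-colorblind10")`, if you prefer not to define the colours yourself.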
Great mentors expand your capacity while also sharpening your focus.
They help you to see all of the things that you’re capable of doing while also helping you to choose which of those things are worth pursuing.
Might as well break the news to my eight Twitter followers: I'll be starting my group at the @univgroningen mid '22. Any students interested in a PhD in (interdisciplinary) computational/experimental protein engineering just DM me for details. Two positions available!
If Nature does not accept our paper, we will get a mob together, storm their office, and take over the place.
Because that's how we do things in the Land of the Free!
#sciencetwitter @nature
🖥European #Funding Opportunities for #Researchers - 10 Websites you need to know.
Finding the right grants eats up a huge chunk of every researcher's life, so we gathered 10 websites dedicated to European funding for you.
Find them here 👉 https://t.co/uKtQlg8ijy