Code for our new world model planner is live! https://t.co/PnCk0OzqTl
Includes our implementation on dino-wm, as well as implementations on jepa-wm and le-wm, and minimal pseudocode for anyone to re-implement themselves.
1/ How can we improve our models of the physical world? We develop EddyFormer: integrating spectral methods w/ the Transformer architecture. We can accelerate 3D turbulence simulation by up to 30x compared to top numerical solvers, at the same level of accuracy!
At #NeurIPS2025 on Dec 4 at 4:30 PM, Exhibit Hall C, D, E: #2316
Project page: https://t.co/J0phE7QG0P
2/ EddyFormer can resolve solutions to fluid dynamics problems that other ML models fail to converge on (taken from The Well). It can also generalize to larger spatiotemporal domains than it was trained on.
This was implied in noting “chasing small improvements / state-of-the-art here is scientifically meaningless.” QM9, etc. were great for the field. ML has changed a lot since then and we should be moving with the field accordingly to new problems and datasets.
At this point the AI for Science community should stop focusing on achieving “state-of-the-art” on datasets like QM9 & MD17: chasing small improvements on these outdated datasets is scientifically meaningless. It's like telling vision researchers to ditch internet-scale data and go back to benchmarking on MNIST/CIFAR10.
We ran experiments with different model sizes and numbers of epochs, fit a power-law curve to the results, and extrapolated the lines. No quadratic fit was explicitly done; the extrapolation just happened to produce a quadratic shape. We can make this clearer. Let us know if you have any other questions!
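A minimal sketch of this kind of fit-then-extrapolate procedure, assuming a power law of the form L = a * C^b fitted by least squares in log-log space. The data points, the variable names, and the helpers `fit_power_law` and `predict` are made up for illustration; they are not the actual experimental values or code.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by linear least squares on (log x, log y); returns (a, b)."""
    # In log-log space a power law is a straight line: log y = b * log x + log a
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(log_a)), float(b)

def predict(a, b, x):
    """Evaluate the fitted power law at new x values (used for extrapolation)."""
    return a * np.asarray(x, dtype=float) ** b

# Synthetic (made-up) measurements that follow an exact power law
compute = np.array([1e3, 1e4, 1e5, 1e6])
loss = 5.0 * compute ** -0.3

a, b = fit_power_law(compute, loss)

# Extrapolate beyond the measured range
extrapolated = predict(a, b, [1e7, 1e8])
```

Points produced by `predict` outside the measured range are estimates from the trend, not measurements.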
1/ Can molecular AI move past hard-coded Graph Neural Networks and embrace scalable Transformers that discover molecular structure on their own?
We demonstrate that you can train a 1B parameter Transformer model without any graph priors or physical inductive biases.
And surprisingly, not only can it maintain competitive performance under equal compute on the Open Molecules 2025 dataset… it’s faster than a 6M parameter equivariant GNN, and exhibits scaling laws that don’t saturate. We use this as a starting point to investigate emergent internal representations, and find that it adaptively discovers molecular structure!
Check out the interactive demo on our website: https://t.co/fxgmjAHirU
And our paper: https://t.co/diCRh5ywtF
In collaboration with @tobykreiman, @YutongBAI1002, Fadi, Elizabeth, and @EricQuCal.
Here’s a video showing how the Transformer learns distance-aware attention patterns (purple gradient) that adapt to atomic environments 👇
@atAndreasBurger That's right: the round datapoints are estimated from the trends (given compute budgets, we couldn't run all the experiments for each curve).
Thanks for sending, and apologies that we forgot to cite (will add). This work also goes in the category that we discuss in our paper of GNNs that incorporate an attention-based mechanism: it's still operating on a predefined graph, and a new graph is still being constructed for every input using a radius cutoff (and then doing message passing). In this case study, we're using a completely unmodified Transformer with no graph-based features https://t.co/oNa20yMVOM
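For context, the radius-cutoff graph construction mentioned above can be sketched roughly as follows. This is a generic illustration, not the code of any particular GNN: the function name `radius_graph`, the cutoff value, and the toy coordinates are assumptions.

```python
import numpy as np

def radius_graph(positions, cutoff):
    """Return a (2, num_edges) array of directed edges (i, j), i != j,
    for every pair of atoms closer than `cutoff`."""
    positions = np.asarray(positions, dtype=float)
    # Pairwise displacement vectors and Euclidean distances
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep pairs within the cutoff, excluding self-loops
    mask = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst], axis=0)

# Toy example: atoms 0 and 1 are neighbors, atom 2 is far away
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
edges = radius_graph(positions, cutoff=1.5)
```

A graph like this has to be rebuilt for every input structure before message passing; the unmodified Transformer skips this step entirely.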
I value approaches that work by subtraction: stripping away the unnecessary until the essential insight remains. Doing less often demands more: a deeper understanding of data and method to reveal the simplest formulation at the core.
This is our attempt at that. Hope it resonates.
@erikjbekkers Excited to see it 😃! The beauty of this approach (fully unmodified Transformer) is we can fairly easily scale to 1B+ params, and it's clear that the improvement trend is going to predictably continue to hold. So far, with constraints, it seems one reaches saturation much sooner.
@SamMBlau Besides the obvious new capabilities, there are so many fun things that we can explore with OMol25 now: it's exciting to think about all the things we might discover on both the ML and science side (and I think it's just getting started) 😀
6/ Our results demonstrate that many favorable properties of GNNs can emerge adaptively and more flexibly in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling. This has been a hot topic in the community, and we hope that this adds more to the discussion!
5/ We really do mean an unmodified Transformer: no explicit calculation of pairwise distances, no graph-based features, no rotational equivariance, etc. Leveraging modern software and hardware, a 1B parameter Transformer trains and runs inference faster than a 6M parameter equivariant GNN.
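To make "no pairwise distances, no graph" concrete, here is a rough sketch of what graph-free input to standard self-attention could look like. The thread does not describe the actual tokenization or architecture, so everything here is a hypothetical illustration: the embedding table, the linear projection of raw coordinates, the random weights, and the approximate water-like geometry are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Standard single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # every atom attends to every atom
    return weights @ v, weights

d_model = 16
atomic_numbers = np.array([8, 1, 1])  # O, H, H
positions = np.array([[0.0, 0.0, 0.0],
                      [0.96, 0.0, 0.0],
                      [-0.24, 0.93, 0.0]])  # approximate, illustrative only

# Hypothetical per-atom tokens: element embedding plus a linear map of raw xyz.
# No distances are computed and no neighbor list is built.
element_embedding = rng.normal(size=(119, d_model))
w_pos = rng.normal(size=(3, d_model))
tokens = element_embedding[atomic_numbers] + positions @ w_pos

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

In this sketch any distance-aware behavior would have to be learned from the raw coordinates, since nothing in the input encodes distances or edges explicitly.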