📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them.
Submission Deadline: April 24 (23:59 AoE)
#ICML2026
1/14 Is Muon “better” than Shampoo?
We argue that their relationship parallels Adam's relationship with Signum. Analogous to @lukas_balles and Hennig’s (2018) decomposition of Adam into element-wise scaled Signum, we can decompose Shampoo as left- and right-adapted Muon.
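A minimal sketch of the element-wise half of that analogy, assuming no bias correction and no epsilon term (both dropped for brevity): Adam's per-coordinate direction m / sqrt(v) is algebraically the sign of m times a scale that shrinks with the relative variance (v - m^2) / m^2. The function names and the moment values are made up for illustration, and the matrix-valued Shampoo/Muon case is not covered here.

```python
import math

def adam_direction(m, v):
    """Adam's per-coordinate direction, without bias correction or epsilon."""
    return [mi / math.sqrt(vi) for mi, vi in zip(m, v)]

def scaled_signum_direction(m, v):
    """Same direction, written as sign(m) / sqrt(1 + relative variance)."""
    out = []
    for mi, vi in zip(m, v):
        rel_var = (vi - mi * mi) / (mi * mi)  # variance estimate / squared mean
        out.append(math.copysign(1.0, mi) / math.sqrt(1.0 + rel_var))
    return out

# Hypothetical first- and second-moment estimates (m must be non-zero).
m = [0.5, -0.2, 1.0]
v = [0.5, 0.08, 1.0]

adam = adam_direction(m, v)
signum = scaled_signum_direction(m, v)
for a, s in zip(adam, signum):
    print(f"adam={a:+.6f}  scaled_signum={s:+.6f}")
```

Coordinates with a noisy gradient (large relative variance) get a smaller step, while a noise-free coordinate recovers the plain sign update.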
We found a simple trick to accelerate the computation of PDE operators like the Laplacian via Taylor mode autodiff.
Poster #3401, today @NeurIPS2025's evening session in San Diego.
📜 Paper: https://t.co/pEO6vp4oj1
🧪 Code: https://t.co/AcV4OkGVK0
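A rough pure-Python sketch of the idea as we understand it, not the jet-for-pytorch API: the Laplacian is a sum of second directional derivatives, and because the second-order coefficient propagates linearly, one can push the sum through the network instead of one coefficient per direction. The toy network, its weights, and all function names below are assumptions made for illustration.

```python
import math

def tanh_derivs(z):
    """Return tanh(z) together with its first and second derivative."""
    t = math.tanh(z)
    d1 = 1.0 - t * t
    return t, d1, -2.0 * t * d1

def f(W, b, v, x):
    """Toy network f(x) = sum_j v[j] * tanh(W[j] . x + b[j])."""
    return sum(vj * math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + bj)
               for Wj, bj, vj in zip(W, b, v))

def laplacian_per_direction(W, b, v, x):
    """Propagate one 2-jet per coordinate direction, then sum at the end."""
    total = 0.0
    for d in range(len(x)):
        for Wj, bj, vj in zip(W, b, v):
            z0 = sum(w * xi for w, xi in zip(Wj, x)) + bj
            z1, z2 = Wj[d], 0.0          # input jet (x, e_d, 0) after the linear layer
            _, d1, d2 = tanh_derivs(z0)
            h2 = d2 * z1 * z1 + d1 * z2  # second derivative along e_d after tanh
            total += vj * h2
    return total

def laplacian_collapsed(W, b, v, x):
    """Keep all first-order coefficients, but only the SUM of second-order ones."""
    total = 0.0
    for Wj, bj, vj in zip(W, b, v):
        z0 = sum(w * xi for w, xi in zip(Wj, x)) + bj
        z1 = list(Wj)                    # first-order coefficients for every e_d
        z2_sum = 0.0                     # summed second-order coefficient
        _, d1, d2 = tanh_derivs(z0)
        h2_sum = d2 * sum(z * z for z in z1) + d1 * z2_sum
        total += vj * h2_sum
    return total

# Hypothetical weights and evaluation point.
W = [[0.3, -0.5], [0.8, 0.1], [-0.4, 0.7]]
b = [0.1, -0.2, 0.05]
v = [1.0, -0.5, 2.0]
x = [0.4, -0.3]

print(laplacian_per_direction(W, b, v, x), laplacian_collapsed(W, b, v, x))
```

Both functions return the same value; the collapsed version carries a single summed second-order quantity through each layer, which is where the savings would come from in deeper networks.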
Want to learn how to train PINNs faster?
Come to our @NeurIPS2025 poster (#2209) today in San Diego (second session)!
📜 Paper: https://t.co/9ASvOdZ7e1
🧪 Code: https://t.co/nNdajTsRUP
Led by @AndresGuzco.
Within an information-geometric framework, we reconnect Shampoo/SOAP with both classical quasi-Newton ideas and Gaussian whitening, and develop practical methods that naturally handle tensor-valued weights in language model pre-training. https://t.co/PJ4AVxPgRC (OPT-ML workshop)
🚀 [NeurIPS 2025] jet-for-pytorch (https://t.co/AcV4OkGnUs) is live!
From our paper "Collapsing Taylor Mode AD":
🔹 Implements Taylor mode for PyTorch
🔹 Adds collapsing → speedup and memory reduction for PDE operators like the Laplacian
Talk to me at #NeurIPS or Tim at #EurIPS!
🎓 Looking for MSc or PhD opportunities in Machine Learning for Fall 2026?
Join my group at @Concordia and @Mila_Quebec!
🔍 Focus: autodiff, second-order optimization, and Hessian-based methods for LLMs & scientific ML.
📅 Apply by Dec 1: https://t.co/qJ3AcgmpUQ
I would highly recommend using this library for any research on influence functions.
Implementing scalable IFs (usually ≡ K-FAC) is a massive pain, especially for modern architectures. With curvlinops, getting plots like the one below for diffusion models is relatively easy.
1/6 Hessian approximations are ubiquitous in deep learning, but working with them can get quite involved.
We argue for using a linear operator interface for neural network curvature matrices and implement this in PyTorch in our library curvlinops.
https://t.co/cX31ApZRF0
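To illustrate what a linear operator interface buys you, here is a minimal pure-Python sketch. It is not the curvlinops API: the class name, the toy least-squares problem, and the solver are assumptions for illustration. The operator only exposes matrix-vector products with the curvature matrix X^T X of the loss 0.5 * ||X w - y||^2 and never stores that matrix, yet a solver such as conjugate gradients can still use it.

```python
class GGNOperator:
    """Matrix-free X^T X for the loss 0.5 * ||X w - y||^2 (hypothetical class)."""

    def __init__(self, X):
        self.X = X
        dim = len(X[0])
        self.shape = (dim, dim)

    def __matmul__(self, vec):
        # Compute X^T (X vec) without ever forming X^T X.
        Xv = [sum(xi * vi for xi, vi in zip(row, vec)) for row in self.X]
        return [sum(row[k] * r for row, r in zip(self.X, Xv))
                for k in range(self.shape[0])]

def conjugate_gradient(A, rhs, max_steps=50, tol=1e-24):
    """Solve A x = rhs using only products A @ p (A symmetric positive definite)."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_steps):
        if rs < tol:
            break
        Ap = A @ p
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Hypothetical data: three samples, two parameters.
X = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
H = GGNOperator(X)
grad = [1.0, 0.0]

step = conjugate_gradient(H, grad)  # Newton-like step H^{-1} grad
print(step)
```

Any other curvature matrix that can provide a matrix-vector product could be dropped into the same solver unchanged, which is the appeal of the interface.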
KFAC is everywhere—from optimization to influence functions. While the intuition is simple, implementation is tricky.
We (@BalintMucsanyi, @2bys2, @runame_) wrote a ground-up intro with code to help you get it right.
📖 https://t.co/sIQfB1bmsE
💻 https://t.co/l6quq7cuT2
Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique?
With @unregularized & @f_dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper!
Key insights👇 1/8
#ICML2024
Can We Remove the Square-Root in Adaptive Methods?
https://t.co/hD604GmB0N
Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW)
Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16 /1
For the first time, we (with @f_dangel, @runame_, @k_neklyudov, @akristiadi7, Richard E. Turner, @AliMakhzani) propose a sparse 2nd-order method for large NN training with BFloat16 and show its advantages over AdamW. Also at the @NeurIPS workshop on Opt for ML https://t.co/ww5zxVrOPX /1
The consensus in deep learning is that many quantities are not invariant under reparametrization. Our #NeurIPS2023 paper shows that they actually are if the implicitly assumed Riemannian metric is taken into account 🧵
https://t.co/Udi4U77Qxf
w/ @f_dangel and @PhilippHennig5
Which plane would you board?
[#NeurIPS2021] Cockpit: Practical trouble-shooting of DNN training.
Empowered by recent advances in autodiff.
In collaboration with @frankstefansch1 & @PhilippHennig5.
📣#NeurIPS2021📄
Why are we still debugging neural nets by staring at loss curves?
We present Cockpit, a visual debugger for deep learning.
Joint work with @f_dangel & @PhilippHennig5
Paper: https://t.co/FANykwckRK
Code: https://t.co/4ShSGgFHsq
Video: https://t.co/rmsYaFMMeH
🧵
In our #NeurIPS2021 paper (https://t.co/AJndowhlbn), we introduce laplace-torch for effortless Bayesian deep learning. Despite their simplicity, we find that Laplace approximations are surprisingly competitive with more popular approaches. https://t.co/1bMuPzQNDA
I'm excited to announce basic support for ResNets & RNNs in BackPACK 1.4 for @PyTorch! 🎉
Find out more in the tutorials:
📈 https://t.co/xal9UM5MCW
📈 https://t.co/7G25HX0k8i
Thanks to Tim Schäfer for his work on the library in the past months 🙏.