Felix Dangel @f_dangel - Twitter Profile

f_dangel retweeted

Weight Space Symmetries @ ICML 2026 @weightsymmetry

2 months ago

📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026

f_dangel retweeted

Runa Eschenhagen @runame_

4 months ago

1/14 Is Muon “better” than Shampoo? We argue that their relationship parallels Adam's relationship with Signum. Analogous to @lukas_balles and Hennig’s (2018) decomposition of Adam into element-wise scaled Signum, we can decompose Shampoo as left- and right-adapted Muon.

Felix Dangel @f_dangel

6 months ago

We found a simple trick to accelerate the computation of PDE operators like the Laplacian via Taylor mode autodiff. Poster #3401, today @NeurIPS2025's evening session in San Diego. 📜 Paper: https://t.co/pEO6vp4oj1 🧪 Code: https://t.co/AcV4OkGVK0

Felix Dangel @f_dangel

6 months ago

Want to learn how to train PINNs faster? Come to our @NeurIPS2025 poster (#2209) today in San Diego (second session)! 📜 Paper: https://t.co/9ASvOdZ7e1 🧪 Code: https://t.co/nNdajTsRUP Led by @AndresGuzco.

f_dangel retweeted

Wu Lin @LinYorker

6 months ago

Within an information-geometric framework, we reconnect Shampoo/SOAP with both classical quasi-Newton ideas and Gaussian whitening, and develop practical methods that naturally handle tensor-valued weights in language model pre-training. https://t.co/PJ4AVxPgRC opt-ml workshop

Felix Dangel @f_dangel

7 months ago

🚀 [NeurIPS 2025] jet-for-pytorch (https://t.co/AcV4OkGnUs) is live! From our paper "Collapsing Taylor Mode AD":
🔹 Implements Taylor mode for PyTorch
🔹 Adds collapsing → speedup and memory reduction for PDE operators like the Laplacian
Talk to me #NeurIPS or Tim #EurIPS!

Felix Dangel @f_dangel

7 months ago

🎓 Looking for MSc or PhD opportunities in Machine Learning for Fall 2026? Join my group at @Concordia and @Mila_Quebec!
🔍 Focus: autodiff, second-order optimization, and Hessian-based methods for LLMs & scientific ML.
📅 Apply by Dec 1: https://t.co/qJ3AcgmpUQ

f_dangel retweeted

Bruno Mlodozeniec @brunorganised

8 months ago

I would highly recommend using this library for any research on influence functions. Implementing scalable IFs (usually ≡ K-FAC) is a massive pain, especially for modern architectures. With curvlinops, getting plots like the below for diffusion models is relatively easy

f_dangel retweeted

Runa Eschenhagen @runame_

8 months ago

1/6 Hessian approximations are ubiquitous in deep learning, but working with them can get quite involved. We argue for using a linear operator interface for neural network curvature matrices and implement this in PyTorch in our library curvlinops. https://t.co/cX31ApZRF0

Felix Dangel @f_dangel

10 months ago

KFAC is everywhere, from optimization to influence functions. While the intuition is simple, implementation is tricky. We (@BalintMucsanyi, @2bys2, @runame_) wrote a ground-up intro with code to help you get it right. 📖 https://t.co/sIQfB1bmsE 💻 https://t.co/l6quq7cuT2

f_dangel retweeted

Weronika Ormaniec @wormaniec

about 1 year ago

Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique? With @unregularized & @f_dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper! Key insights👇 1/8

f_dangel retweeted

Wu Lin @LinYorker

almost 2 years ago

#ICML2024 Can We Remove the Square-Root in Adaptive Methods? https://t.co/hD604GmB0N Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW) Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16 /1

f_dangel retweeted

Wu Lin @LinYorker

over 2 years ago

For the first time, we (with @f_dangel, @runame_, @k_neklyudov @akristiadi7, Richard E. Turner, @AliMakhzani) propose a sparse 2nd-order method for large NN training with BFloat16 and show its advantages over AdamW. also @NeurIPS workshop on Opt for ML https://t.co/ww5zxVrOPX /1

f_dangel retweeted

Agustinus Kristiadi @akristiadi7

over 2 years ago

The consensus in deep learning is that many quantities are not invariant under reparametrization. Our #NeurIPS2023 paper shows that they actually are if the implicitly assumed Riemannian metric is taken into account 🧵 https://t.co/Udi4U77Qxf w/ @f_dangel and @PhilippHennig5

Felix Dangel @f_dangel

over 3 years ago

@kaitlinmaile @nikosbosse https://t.co/NUFtsAKHy6

Felix Dangel @f_dangel

over 4 years ago

Which plane would you board? [#NeurIPS2021] Cockpit: Practical trouble-shooting of DNN training. Empowered by recent advances in autodiff. In collaboration with @frankstefansch1 & @PhilippHennig5.

Frank Schneider @frankstefansch1

over 4 years ago

📣#NeurIPS2021📄 Why are we still debugging neural nets by staring at loss curves? We present Cockpit, a visual debugger for deep learning. Joint work with @f_dangel & @PhilippHennig5 Paper: https://t.co/FANykwckRK Code: https://t.co/4ShSGgFHsq Video: https://t.co/rmsYaFMMeH 🧵

f_dangel retweeted

Alexander Immer @a1mmer

over 4 years ago

In our #NeurIPS2021 paper (https://t.co/AJndowhlbn), we introduce laplace-torch for effortless Bayesian deep learning. Despite their simplicity, we find that Laplace approximations are surprisingly competitive with more popular approaches. https://t.co/1bMuPzQNDA

Felix Dangel @f_dangel

over 4 years ago

I'm excited to announce basic support for ResNets & RNNs in BackPACK 1.4 for @PyTorch! 🎉 Find out more in the tutorials: 📈 https://t.co/xal9UM5MCW 📈 https://t.co/7G25HX0k8i Thanks to Tim Schäfer for his work on the library in the past months 🙏.
