Pierre Foret @Foret_p - Twitter Profile

Pinned Tweet

over 5 years ago

Introducing SAM: An easy-to-use algorithm derived by connecting PAC Bayesian bounds and geometry of the loss landscape. Achieves SOTA on benchmark image tasks (0.3% error on CIFAR10, 3.9% on CIFAR100) and drastically improves label noise robustness. https://t.co/aONWVTPZsT
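
As a rough illustration of the algorithm the tweet announces, here is a minimal sketch of one SAM-style update in plain Python. It is not the authors' implementation: the toy quadratic loss, the `CURVATURE` values, and the `lr` and `rho` defaults are all assumptions chosen for this example.

```python
import math

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM-style update on a list of floats `w` (illustrative sketch)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Step 1: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: update the ORIGINAL weights with the gradient taken at that point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Hypothetical toy loss: L(w) = 0.5 * sum(a_i * w_i^2), with one sharp direction.
CURVATURE = [1.0, 10.0]

def toy_loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURE, w))

def toy_grad(w):
    return [a * wi for a, wi in zip(CURVATURE, w)]

w = [1.0, 1.0]
for _ in range(20):
    w = sam_step(w, toy_grad)
print(toy_loss(w))
```

Each update costs two gradient evaluations, one at the current weights and one at the perturbed weights.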

Foret_p retweeted

Maksym Andriushchenko

@maksym_andr

almost 4 years ago

Excited to share our #ICML2022 paper "Towards Understanding Sharpness-Aware Minimization"! Why does m-sharpness matter in m-SAM? Can we explain the benefits of m-SAM on simple models? Which other interesting properties does m-SAM show? Paper: https://t.co/F63P1KkdCw 🧵1/n

Foret_p retweeted

UCL CSML @uclcsml

over 4 years ago

Excited to host @TheGradient to talk about the current state of Sharpness-Aware Minimization (SAM) and future directions next Friday, 25th Feb, 5pm (GMT). Zoom details: https://t.co/7j5lND15Yr

Foret_p retweeted

Hossein Mobahi @TheGradient

over 4 years ago

Are you a strong PhD student interested in doing cutting-edge research at @GoogleAI? I have an opening for a student researcher position to explore open problems and extensions of Sharpness-Aware Minimization (SAM) w/ @bneyshabur. Please refer to https://t.co/oHdwahJ8uH.

Pierre Foret @Foret_p

over 4 years ago

@_arohan_ @TheGradient 1024 samples on 64 replicas seems ideal. Is the perturbation scaled by anything?

Pierre Foret @Foret_p

over 4 years ago

@_arohan_ @TheGradient Indeed, not syncing the perturbations is pretty critical to SAM's success (see the section about M-sharpness in the paper)

Foret_p retweeted

Aran Komatsuzaki

@arankomatsuzaki

over 4 years ago

Sharpness-Aware Minimization Improves Language Model Generalization. SAM substantially improves performance on SuperGLUE, GLUE, Web Questions, Natural Questions, Trivia QA, and TyDiQA by encouraging convergence to flatter minima w/ minimal overhead. https://t.co/iQuFEq2Ne3

Pierre Foret @Foret_p

over 4 years ago

@thanhnguyentang @matthen2 If each particle is independent, each particle probably only needs to keep the random seed used to generate the path increments

Foret_p retweeted

AK

@_akhaliq

about 5 years ago

When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations. pdf: https://t.co/GYknaVoNAM abs: https://t.co/kaUxIdMVNQ +5.3% and +11.0% top-1 accuracy on ImageNet for ViT-B/16 and Mixer-B/16, with simple Inception-style preprocessing

Foret_p retweeted

Olivier Grisel @ogrisel

about 5 years ago

Interesting empirical study of the geometry of the loss landscape of Vision Transformers and MLP-Mixers and study of the critical impact of Sharpness Aware Minimization (SAM) for those architectures.

Foret_p retweeted

Hossein Mobahi @TheGradient

about 5 years ago

Excited to see that Sharpness-Aware Minimization (the SAM optimizer), which we proposed recently (w/ @Foret_p @bneyshabur and Kleiner), is becoming a persistent component in recent state-of-the-art records 😇

Foret_p retweeted

Behnam Neyshabur

@bneyshabur

about 5 years ago

Sharpness-Aware Minimization for Efficiently Improving Generalization (Spotlight at #ICLR2021) with @Foret_p, Ariel Kleiner and @TheGradient. Paper: https://t.co/WyIwSttkJp Code: https://t.co/9SBwZ1BtiU Video and Poster: https://t.co/PoIAazaIdN 3/7

Foret_p retweeted

KDnuggets

@kdnuggets

about 5 years ago

We don’t need to worry about #Overfitting anymore? Sharpness-Aware Minimization seeks parameters that lie in neighborhoods having uniformly low loss; this results in a min-max optimization formulation on which gradient descent can be performed efficiently #MachineLearning https://t.co/a4SVXhEumo
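
For reference, the min-max formulation this tweet alludes to is commonly written as below. This is a sketch under stated assumptions: $L_S$ denotes the training loss, $\rho$ the neighborhood radius, the norm is taken to be the 2-norm, and any weight-decay term is omitted.

```latex
\begin{aligned}
&\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L_S(w + \epsilon)
  && \text{(objective)} \\
&\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L_S(w)}{\|\nabla_w L_S(w)\|_2}
  && \text{(first-order solution of the inner maximization)} \\
&g_{\mathrm{SAM}}(w) \approx \nabla_w L_S(w) \,\big|_{\,w + \hat{\epsilon}(w)}
  && \text{(gradient used for the descent step)}
\end{aligned}
```

The approximation in the last line drops second-order terms, which is why the update needs only two ordinary gradient evaluations.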

Pierre Foret @Foret_p

over 5 years ago

@RisingSayak Great stuff! Is this syncing epsilon across replicas? On a TPU (8 chips for this one I think?) I would expect the benefits of SAM to be amplified by not syncing epsilon across the devices (one perturbation per sub-batch). Could be a cool improvement if it's not already the case

Pierre Foret @Foret_p

over 5 years ago

@imos You can of course emulate this on a single device with data accumulation, but it becomes tedious and the wall clock time might suffer (although NFNet using a subset of the batch to compute the SAM epsilon is a great trick)

Pierre Foret @Foret_p

over 5 years ago

@imos so SAM on TPU minimizes m-sharpness for a small m, which leads to the biggest boosts. That's why I assume we will mostly see SAM applied to larger nets that require TPUs or multiple GPUs, where it really shines. 3/3

Pierre Foret @Foret_p

over 5 years ago

@imos SAM usually works well for smaller models, but the best results are obtained when using a lot of data parallelism (see the section about m-sharpness in the SAM paper). Because the largest nets are trained on a lot of TPU chips, each chip computes epsilon for only a few samples... 2/3
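
The point made in this thread (one perturbation per sub-batch rather than one synced across devices) can be sketched on a single machine as follows. This is an illustrative toy, not the paper's code: the scalar linear model, the `grad_mse` helper, and the parameter names are hypothetical, and the batch size is assumed to be divisible by `m`.

```python
def grad_mse(w, shard):
    """Gradient of mean squared error for the scalar model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def m_sam_gradient(w, batch, m, rho=0.05):
    """Average of per-shard SAM gradients, each shard holding m samples.

    Every shard computes its OWN epsilon from its own gradient, mimicking
    replicas that do not sync the perturbation. Setting m = len(batch)
    recovers a single perturbation shared by the whole batch.
    """
    shards = [batch[i:i + m] for i in range(0, len(batch), m)]
    grads = []
    for shard in shards:
        g = grad_mse(w, shard)
        eps = rho * g / (abs(g) + 1e-12)  # normalized gradient, scaled to rho
        grads.append(grad_mse(w + eps, shard))
    return sum(grads) / len(grads)

batch = [(1.0, 2.0), (2.0, 0.0)]
print(m_sam_gradient(1.0, batch, m=2))  # one perturbation for the full batch
print(m_sam_gradient(1.0, batch, m=1))  # one perturbation per sample
```

In this toy batch the two samples pull the weight in opposite directions, so the per-sample perturbations differ in sign and the two settings of `m` give different update directions.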

Foret_p retweeted

Andy Brock @ajmooch

over 5 years ago

Pretrained NFNet model weights (F0-F5, F6+SAM) are now available at https://t.co/uNSzeA4uJt, along with a demo Colab! All models are pretrained on ImageNet.

Foret_p retweeted

AK

@_akhaliq

over 5 years ago

High-Performance Large-Scale Image Recognition Without Normalization pdf: https://t.co/THe2NfRI1K abs: https://t.co/Z68FevANZP github: https://t.co/Gvw5s5HZIh
