Mathieu Dagréou @Mat_Dag - Twitter Profile

Pinned Tweet

about 3 years ago

📣📣 Preprint alert 📣📣 « A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization » w. @tomamoral, @vaiter & @PierreAblin https://t.co/OFbY8JQo6e 1/3

Mat_Dag retweeted

Fabian Schaipp @FSchaipp

9 days ago

"It's easier to tune the LR for method A than for B." We tried to formalize this for model-based stochastic optimization methods. We find a key quantity, called stability index, that describes how stable a (weakly) convex bound is as a function of LR. 📚https://t.co/JIrG0gXqXL

Mat_Dag retweeted

Michael Arbel @MichaelArbel

10 days ago

What do JEPA-style self-distillation dynamics actually learn — and why do they sometimes avoid collapse? In our new work with @BasileTerv987 and Jean Ponce, we tackle this question. What surprised us: These dynamics provably recover representations with nonlinear-CCA structure.

Mat_Dag retweeted

Clément Bonet @Clement_Bonet_

about 1 month ago

Our work "Busemann Functions in the Wasserstein Space" was accepted at #AISTATS2026. This is joint work with Elsa Cazelles, Lucas Drumetz and @nicolas_courty. I will be presenting it tomorrow at poster 96, see you there! Link: https://t.co/PYz53som3g

Mat_Dag retweeted

Mark Schmidt @MarkSchmidtUBC

8 months ago

This is the way.

Mat_Dag retweeted

Konstantin Mishchenko @konstmish

8 months ago

Nesterov dropped a new paper last week on what functions can be optimized with gradient descent. The idea is simple: we know GD can optimize both nonsmooth (bounded grads) and smooth (Lipschitz grads) functions, but smooth+nonsmooth satisfies neither property, so what can we do?

Mat_Dag retweeted

Fabian Schaipp @FSchaipp

9 months ago

🚟 New blog post: On "infinite" learning-rate schedules and how to construct them from one checkpoint to the next https://t.co/xa1DS9OTTW

Mat_Dag retweeted

Rudy Morel @rdMorel

11 months ago

For evolving unknown PDEs, ML models are trained on next-state prediction. But do they actually learn the time dynamics: the "physics"? Check out our poster (W-107) at #ICML2025 this Wed, Jul 16. Our "DISCO" model learns the physics while staying SOTA on next-state prediction!

Mat_Dag retweeted

Mathieu Blondel @mblondel_ml

11 months ago

Back from MLSS Senegal 🇸🇳, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet 🙏 My slides are here https://t.co/fWH9FJ7ELm

Mat_Dag retweeted

Waïss Azizian @wazizian

12 months ago

❓ How long does SGD take to reach the global minimum on non-convex functions? With @FranckIutzeler, J. Malick, P. Mertikopoulos, we tackle this fundamental question in our new ICML 2025 paper: "The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes"

Mat_Dag retweeted

Konstantin Mishchenko @konstmish

12 months ago

I want to address one very common misconception about optimization. I often hear that (approximately) preconditioning with the Hessian diagonal is always a good thing. It's not. In fact, finding a good preconditioner is an open problem, which I think deserves more attention. 1/4

Mat_Dag retweeted

Matthieu Terris @MatthieuTerris

12 months ago

🧵 I'll be at CVPR next week presenting our FiRe work 🔥 TL;DR: We go beyond denoising models in PnP with more general restoration (e.g. deblurring) models! A starting point observation is that images are not fixed-points of restoration models:

Mat_Dag retweeted

Samuel Vaiter @vaiter

about 1 year ago

📣 New preprint 📣 **Differentiable Generalized Sliced Wasserstein Plans** w/ L. Chapel @rtavenar We propose a Generalized Sliced Wasserstein method that provides an approximate transport plan and admits a differentiable approximation. https://t.co/81C9BGRtko 1/5

Mat_Dag retweeted

Mathurin Massias @mathusmassias

about 1 year ago

It was received quite enthusiastically here so time to share it again!!! Our #ICLR2025 blog post on Flow Matching was published yesterday: https://t.co/2V5BLl6T2p My PhD student Anne Gagneux will present it tomorrow at ICLR, 👉 poster session 4, 3 pm, #549 in Hall 3/2B 👈

Mat_Dag retweeted

Gabriel Peyré @gabrielpeyre

over 1 year ago

Optimization algorithms come with many flavors depending on the structure of the problem. Smooth vs non-smooth, convex vs non-convex, stochastic vs deterministic, etc. https://t.co/k1KOSFfSUJ

Mat_Dag retweeted

Alex Hägele @haeggee

over 1 year ago

A really fun project to work on. Looking at these plots side-by-side still amazes me! How well can **convex optimization theory** match actual LLM runs? My favorite points of our paper on the agreement for LR schedules in theory and practice: 1/n

Mat_Dag retweeted

Fabian Schaipp @FSchaipp

over 1 year ago

Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 https://t.co/DGHoG1FS3f

Mat_Dag retweeted

Konstantin Mishchenko @konstmish

over 1 year ago

Learning rate schedulers used to be a big mystery. Now you can just take a guarantee for *convex non-smooth* problems (from https://t.co/2RggKkvmxO), and they give you *precisely* what you see in training large models. See this empirical study: https://t.co/kXOOeygaal 1/3

Mat_Dag retweeted

Theo Uscidda @theo_uscidda

over 1 year ago

Our work on geometric disentangled representation learning has been accepted to ICLR 2025! 🎊 See you in Singapore if you want to understand this gif better :)

Mat_Dag retweeted

Gabriel Peyré @gabrielpeyre

over 1 year ago

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. https://t.co/FdxBkdLYrw