Kwangjun Ahn @KwangjunA - Twitter Profile

6 months ago

@JohnCLangford Updates on the Dion codebase (https://t.co/jVz7Fxv6B1), please check them out!
- Dion2 (https://t.co/VtAoq2eH01), which has much simpler math than Dion.
- NorMuon (https://t.co/ZHaNLWQO2y), thanks to @li_zichong.

0

33

2

20

2K

KwangjunA retweeted

Seunghyun Seo @SeunghyunSEO7

7 months ago

scaled up to 12.7B dense, 5.5T tokens.
- polynorm (optimized kernel)
- grouped diff attn (their work)
- parallel muonclip (adopt alltoall like mainhorse, essential, dion)
- 80M batch
it's still non-reasoning, also not moe though... keep pushing guys! https://t.co/XheiAS8TBO

3

124

18

84

17K

Kwangjun Ahn @KwangjunA

9 months ago

@ionutmodo Thanks! Looks great, let me read through it

0

1

0

44

Kwangjun Ahn @KwangjunA

9 months ago

New improvement in Dion leads to a speedup that makes orthonormal updates (eg. Muon) more scalable for larger matrices. The trick: carefully using Newton-Schulz (on smaller matrices) as Dion's backend. Updates to our microsoft/dion codebase are coming soon---stay tuned!

[Tweet photo: https://t.co/ZC4I8g7xU2]

1

27

4

15

2K

KwangjunA retweeted

Microsoft Research

@MSFTResearch

9 months ago

Join us on Sept 24 at 8 AM PT for Microsoft Research Forum Season 2 – a virtual series highlighting purposeful research and its real-world impact, from fundamental exploration to advancing AI responsibly, scaling innovation through products and open source, and driving positive change for society. Register now: https://t.co/eWh5h1NZ7N

1

25

3

6K

KwangjunA retweeted

Andrej Karpathy

@karpathy

11 months ago

@jxbz love the repo! clean code, good practices but still not overly over-engineered, triton kernels, well documented, simple reference implementations alongside optimized code. nice

2

203

5

59

31K

KwangjunA retweeted

elie

@eliebakouch

11 months ago

Lot, lot of alpha here

3

169

11

137

24K

Kwangjun Ahn @KwangjunA

11 months ago

@eliebakouch @jxbz @JohnCLangford @GagMagakyan Thank you!! Glad to hear it

0

2

0

77

Kwangjun Ahn @KwangjunA

11 months ago

@jxbz @JohnCLangford @GagMagakyan Thank you for the kind words! Hope this proves useful to the community!

1

4

0

403

KwangjunA retweeted

Jeremy Bernstein @jxbz

11 months ago

Looks like extremely exciting and useful work by @KwangjunA, Byron Xu, Natalie Abreu, @JohnCLangford and @GagMagakyan https://t.co/8WzCbdljDS (2/2)

4

139

15

105

10K

KwangjunA retweeted

Jeremy Bernstein @jxbz

11 months ago

I had wondered why there was no official Dion implementation by the authors... I guess now we know. This repository looks dope: FSDP Muon and Dion implementations, triton kernels for Newton-Schulz, and lots of practical advice (1/2)

7

332

18

210

75K

KwangjunA retweeted

Laker Newhouse @LakerNewhouse

11 months ago

[1/6] Curious about Muon, but not sure where to start? I wrote a 3-part blog series called “Understanding Muon” designed to get you up to speed—with The Matrix references, annotated source code, and thoughts on where Muon might be going.

7

340

42

441

36K

KwangjunA retweeted

John Langford @JohnCLangford

11 months ago

Apparently Dion is now being worked on for Torch Titan: https://t.co/QeuRFDTyan :-)

0

101

8

46

21K

Kwangjun Ahn @KwangjunA

11 months ago

@MParakhin Thanks for advertising Dion! :)

1

5

0

2K

KwangjunA retweeted

Mikhail Parakhin

@MParakhin

11 months ago

Since nobody asked :-), here is my list of papers not to be missed from ICML: 1) Dion: distributed orthonormalized updates (well, technically not at ICML, but everyone's talking about it). 2) MARS: Unleashing the Power of Variance Reduction for Training Large Models 3) ...

6

425

31

444

69K

Kwangjun Ahn @KwangjunA

11 months ago

@orvieto_antonio @micahgoldblum @teodorasrec @jonasgeiping Nice results! One question: wouldn’t a large (global) batch size be more practical for distributed training? Does that mean SGD is still not effective at large scale?

1

3

0

189

Kwangjun Ahn @KwangjunA

11 months ago

@seungwookh @jxbz Go Jeremy and Laker!!

0

1

0

274

KwangjunA retweeted

Seungwook Han

@seungwookh

11 months ago

But actually this is the og way of doing it, and you should stop by E-2103 to see @jxbz and Laker Newhouse whiteboard the whole paper.

[Tweet photo: https://t.co/NjV3qnxCaK]

1

74

5

11

8K

Kwangjun Ahn @KwangjunA

11 months ago

@konstmish @aaron_defazio Thanks Konstantin!

0

1

0

417

KwangjunA retweeted

Konstantin Mishchenko

@konstmish

11 months ago

Schedule-Free methods, which forgo cosine/linear schedulers by averaging iterates and computing gradients at interpolated points, yield smoother training curves. It's still unclear why they work well, and this paper explains the phenomenon through the river-valley loss landscape.

[Tweet photo: https://t.co/gZKaivYAu8]

4

140

17

99

14K
