Alexandre TL @AlexandreTL2 - Twitter Profile

@QuasarModels Interesting! The KDA baseline seems already very strong tho, 99% at 10M context length. Is it the same model as in Table 7? Also, would it be possible to have the raw NIAH scores for the two models of Table 7?

Tweet photo: https://t.co/oJn4ew50oh

Alexandre TL @AlexandreTL2

2 months ago

@QuasarModels prev image is Table 13. this is Table 7:

Alexandre TL @AlexandreTL2

2 months ago

@Ji_Ha_Kim Looks good indeed!

Alexandre TL @AlexandreTL2

2 months ago

@Ji_Ha_Kim uhm I meant in general, even if batch size is not fixed, it disappears in the ratio: mB/mD = (B'/B)/(D'/D), where D' = B'*num_iters and D = B*num_iters_base, so the B' and the B cancel and mB/mD = num_iters_base/num_iters
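
The cancellation in the tweet above can be checked numerically. The snippet below is only a sketch: it reads mB and mD as the ratios B'/B and D'/D exactly as the tweet writes them, and the function name, argument names, and sample values are made up for illustration.

```python
from fractions import Fraction


def multiplier_ratio(batch, batch_base, num_iters, num_iters_base):
    """Return mB/mD with mB = B'/B and mD = D'/D, using exact fractions."""
    m_b = Fraction(batch, batch_base)        # mB = B'/B
    data = batch * num_iters                 # D' = B' * num_iters
    data_base = batch_base * num_iters_base  # D  = B  * num_iters_base
    m_d = Fraction(data, data_base)          # mD = D'/D
    return m_b / m_d


# Two unrelated batch-size pairs (hypothetical values), same iteration counts.
r1 = multiplier_ratio(batch=512, batch_base=256, num_iters=2000, num_iters_base=1000)
r2 = multiplier_ratio(batch=64, batch_base=1024, num_iters=2000, num_iters_base=1000)
print(r1, r2)
```

Both calls give the same value, num_iters_base/num_iters, whatever batch sizes are plugged in, which is the point of the tweet: the batch sizes drop out of the ratio.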

AlexandreTL2 retweeted

Jean-Gabriel BARTHELEMY @JG_Barthelemy

3 months ago

We’ve been lucky enough to test Mamba-3 ahead of the curve. 🧪 Here is how it integrates into Hybrid Models (Spoiler: it unlocks Muon for SSMs for the first time). 🧵

Alexandre TL @AlexandreTL2

3 months ago

@vfleaking @wen_kaiyue

Alexandre TL @AlexandreTL2

3 months ago

This is basically Hyperball + GeoNorm

Francesco Bertolotti @f14bertolotti

3 months ago

Interesting optimizer that projects the gradient onto a tangent n-dimensional ball on the loss landscape. Unfortunately, the experiments are fairly basic, but I quite like the idea. 🔗https://t.co/cjljvNkWCJ

Tweet photo: https://t.co/g6hUnU7q70

Alexandre TL @AlexandreTL2

3 months ago

https://t.co/ERht8WNd9X https://t.co/xWnrdsOqfX

Alexandre TL @AlexandreTL2

5 months ago

@wen_kaiyue cool ok thx!
