Asher @Ashkl111 - Twitter Profile

Pinned Tweet

2 months ago

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

1

11

1

4

809

Asher @Ashkl111

4 days ago

One main problem of post-norm in fixed-depth transformers is the matrix power of the Jacobian creating vanishing gradients in backprop. I've discussed this more in my own work, but it's interesting how looped models seem to need this Jacobian power to remain stable.

0

37
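The Jacobian-power remark in the tweet above can be illustrated with a toy calculation. This is a minimal sketch, not the author's analysis: it assumes a single fixed linear map stands in for one layer or loop, and the 2x2 matrix `J` (spectral radius 0.9) and the helper names are hypothetical. It shows the two sides of the same contraction: gradients multiplied repeatedly by the transposed Jacobian shrink, while two different hidden states pushed through the same map move closer together.

```python
import math

# Hypothetical 2x2 Jacobian with eigenvalues 0.9 and 0.5 (spectral radius < 1).
J = [[0.9, 0.0],
     [0.0, 0.5]]


def matvec(a, v):
    """Multiply matrix `a` (list of rows) by vector `v`."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def transpose(a):
    """Return the transpose of matrix `a`."""
    return [list(row) for row in zip(*a)]


def norm(v):
    """Euclidean norm of vector `v`."""
    return math.sqrt(sum(x * x for x in v))


def backprop_norms(jac, grad, depth):
    """Gradient norm after each of `depth` multiplications by jac^T."""
    jt = transpose(jac)
    g = list(grad)
    norms = []
    for _ in range(depth):
        g = matvec(jt, g)
        norms.append(norm(g))
    return norms


def forward_gap(jac, h_a, h_b, loops):
    """Distance between two hidden states after `loops` applications of jac."""
    a, b = list(h_a), list(h_b)
    for _ in range(loops):
        a, b = matvec(jac, a), matvec(jac, b)
    return norm([x - y for x, y in zip(a, b)])


if __name__ == "__main__":
    print(backprop_norms(J, [1.0, 1.0], 50)[-1])      # backward signal shrinks
    print(forward_gap(J, [1.0, 0.0], [0.0, 1.0], 50))  # forward states converge
```

In this toy setting the same property (repeated multiplication by a contractive Jacobian) is what damps the backward signal and what keeps the forward iteration stable. A real transformer block is nonlinear and its Jacobian changes with the hidden state, so this is only a caricature of the trade-off.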

Asher @Ashkl111

about 2 months ago

@KeshavRamji Are you presenting this at the ICLR latent reasoning workshop? Would love to learn more if so!

0

540

Asher @Ashkl111

about 2 months ago

My tl is currently half looped transformers and half debates about pressing red/blue. This app is hilarious (blue is the correct answer btw)

0

2

0

59

Asher @Ashkl111

about 2 months ago

@huskydogewoof Just did!

1

2

0

33

Asher @Ashkl111

about 2 months ago

This is an absolutely amazing repo. My only problem is that it didn’t exist when I began my work :)

Benhao Huang

@huskydogewoof

about 2 months ago

Introducing 🔁 Awesome-Loop-Models: a curated repo for keeping up with loop models! Whether you are just entering the field or have been exploring loop models for a while, this repo is built to serve as an actively updated map for mechanism analysis, architecture and algorithm design, applications, and related directions. 🧵 [1/n]

1

69

12

50

46K

1

5

2

0

665

Asher @Ashkl111

about 2 months ago

@huskydogewoof Glad to have it in as well!!

1

3

0

37

Asher @Ashkl111

2 months ago

@RidgerZhu Very cool write up. In my own work I’ve found that smaller looped models w/ higher LR often mimic many of the problems of larger models w/ lower LR, which makes it easier to avoid unstable architectures before scaling

0

2

0

1

348

Asher @Ashkl111

2 months ago

@hayden_prairie Awesome! Shoot me an email at [email protected] and we can figure out a time that works -- I'm there the whole week

0

1

0

24

Asher @Ashkl111

2 months ago

@DimaKrotov The idea of "reasoning in latent space" is what got me working on looped transformers in the first place. Really cool to see the energy framing, I think there are some clean relationships between looped transformers and energy minimization at basins.

0

9

0

1

1K

Ashkl111 retweeted

Bret Greenstein

@bretgreenstein

2 months ago

Companies love to talk about how long reasoning times 'solve' intelligence. This paper shows that how you use the reasoning loop and create the right iteration architecture matters a lot.

0

1

0

100

Asher @Ashkl111

2 months ago

@josephdviviano When I started working with looped TFs ~a year ago, I was constantly annoyed at how unpredictably they failed. Ended up writing theory on when this happens -- hopefully it saves future researchers those first few months. https://t.co/x9xC9Zp4y9

0

2

0

94

Asher @Ashkl111

2 months ago

@JFPuget One of the theoretical benefits of looped transformers in particular is their ability to run for **more** loops than in training to solve harder problems. Whether they do in all cases is... complex https://t.co/x9xC9Zp4y9

0

2

0

147

Asher @Ashkl111

2 months ago

Full paper: https://t.co/bkxrXA1kLN I’ll be at ICLR in Rio next week presenting a different paper on tabular ML. If you’re working on looped/recurrent models, test-time compute, or tabular ML, I’d love to chat in person.

0

3

0

126

Asher @Ashkl111

2 months ago

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

1

11

1

4

809

Asher @Ashkl111

2 months ago

I find:
- Without recall, looped models act like basin selectors rather than smooth input-dependent algorithms
- Recall helps preserve input dependence, but models are often still fragile
- Outer normalization broadens the parameter regions over which the models are stable

1

2

0

99
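The first two findings in the tweet above can be pictured with a scalar toy model. This is a rough sketch under stated assumptions, not the paper's setup: it assumes "recall" means re-injecting the input at every loop (the tweets do not define it), and the function `run_loops`, the `tanh` update, and the weight 2.0 are all hypothetical choices made for illustration.

```python
import math


def run_loops(x, loops, weight=2.0, recall=False):
    """Iterate a scalar toy update for `loops` steps and return the final state.

    Without recall, the input only sets the initial state:
        h <- tanh(weight * h)
    With recall (assumed to mean input re-injection), the input enters each loop:
        h <- tanh(weight * h + x)
    """
    h = 0.0 if recall else x
    for _ in range(loops):
        h = math.tanh(weight * h + (x if recall else 0.0))
    return h


if __name__ == "__main__":
    for x in (0.1, 0.9, -0.1):
        print(x, run_loops(x, 100), run_loops(x, 100, recall=True))
```

Without recall, inputs 0.1 and 0.9 end at the same positive fixed point and -0.1 ends at its mirror image, so the loop only selects a basin by the sign of the input. With recall, the final state still varies with the input, although in this toy the variation is small.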

Asher @Ashkl111

7 months ago

@heyanuja @papertrailshq So, so glad somebody else continued the work when I didn't :) I hope this gets huge!

0

17

Asher @Ashkl111

7 months ago

@heyanuja @papertrailshq So funnily enough, I saw that post a few months ago and started making my own version -- never got far since I had other projects, but I looked into research databases and there are really cool existing open-source ones that would just require API calls/downloading, no scraping!

3

2

0

62
