Ekdeep Singh Lubana @EkdeepL - Twitter Profile

Ekdeep Singh Lubana @EkdeepL

about 2 hours ago

Catch my bro and get an insider look into our upcoming work 👀

Thomas Fel

@thomas_fel_

about 9 hours ago

At CVPR this week for a talk on neural geometry of large vision models. If you’re interested in interpretability or joining @GoodfireAI, come say hi. 🤠

EkdeepL retweeted

Naomi Saphra @nsaphra

about 11 hours ago

My big takeaway from our new work: saturation is the underrated key to learning. Always think about what concepts are saturating, because that’s when you get to learn the next one.

EkdeepL retweeted

Tomek Korbak

@tomekkorbak

about 13 hours ago

such a good twitter thread!

Ekdeep Singh Lubana @EkdeepL

about 3 hours ago

@tomekkorbak Possibly my biggest learning from this project was that @ChrisGPotts is a master of tweets :p

Ekdeep Singh Lubana @EkdeepL

2 days ago

@ericjmichaud_ Thanks! FWIW, the toy task and theory are super simple and arguably a special case of some of the prior works (as we noted in related work discussion). However, the fact that the predictions of this account generalize to large scale models was very exciting to me. 😁

Ekdeep Singh Lubana @EkdeepL

3 days ago

Very excited to have this paper out! We show that, by having more parameters, larger models see reduced interference between updates. This allows them to retain memories of rarely observed samples of a task, eventually allowing them to learn even the tail-end of the distribution. (1/3)

Christopher Potts

@ChrisGPotts

3 days ago

We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.

Ekdeep Singh Lubana @EkdeepL

2 days ago

@solacebellamy @ChrisGPotts FWIW, one could argue such an experiment has happened before. GPT-3 was 175B params and is easily outperformed by models of 7--8B scale, because these models were trained on more and better data.

Ekdeep Singh Lubana @EkdeepL

2 days ago

@solacebellamy @ChrisGPotts Your inference is correct, but the suggestion requires us knowing what to train on. We'll have to change the data mixture in a manner that allows us to get, from a standard training pipeline, the performance of a larger model. (1/2)

Ekdeep Singh Lubana @EkdeepL

2 days ago

@jiaxinwen22 @ChrisGPotts Had not! Damn, this literature is insane haha.

Ekdeep Singh Lubana @EkdeepL

2 days ago

@jiaxinwen22 @ChrisGPotts Definitely! Let me email you.

Ekdeep Singh Lubana @EkdeepL

2 days ago

@jiaxinwen22 @ChrisGPotts Hmm, curious to hear more. For context, the above is Olmo pretraining data and the task is just comparing numbers. Comparison for ordered concepts can be expected to be present in general pretraining data, and if you can isolate such data, I expect you'll see curves like this.

Ekdeep Singh Lubana @EkdeepL

2 days ago

@jiaxinwen22 @ChrisGPotts Yeah agree with this. We made an intentional choice early on in the project to not bother with shared task structures just yet, since we didn't know what the dynamics without shared structure looked like. However, we fully intend to follow up soon. :)

Ekdeep Singh Lubana @EkdeepL

2 days ago

@mirandrom Totally missed this; will read and respond later, but will also add a citation!

EkdeepL retweeted

Goodfire

@GoodfireAI

3 days ago

New research from Goodfire and collaborators: why do larger models learn more tasks? (spoiler: it’s bottlenecked by data)

Ekdeep Singh Lubana @EkdeepL

3 days ago

Check out @AndrewLampinen's post and the shoutout to our work: I really loved the emphasis on inability to learn, more than the ability to forget! Gotta ask the right questions. :)

Christopher Potts

@ChrisGPotts

3 days ago

Speaking of blog posts, our coauthor @AndrewLampinen just did a post that, among other things, relates our results to themes of continual learning and catastrophic interference: https://t.co/YRHEhRJs6d

Ekdeep Singh Lubana @EkdeepL

3 days ago

This was such a collab work: Jing + @ChrisGPotts built the analysis recipe, finding our toy setup generalized very well to LMs. @danielwurgaft @rach_it_ @elmelis @nsaphra helped concretize this setup, and the core hypothesis came from chats with @LauraRuis and @AndrewLampinen!

Ekdeep Singh Lubana @EkdeepL

3 days ago

We've done a lot of work in the past on understanding how scaling enables learning of new abilities (see https://t.co/CNC5eXLNdk), but this is the first time we attacked param scaling, finding that for a given training process, having more params offers a genuine advantage. (2/3)

EkdeepL retweeted

Goodfire

@GoodfireAI

14 days ago

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

Ekdeep Singh Lubana @EkdeepL

13 days ago

@thomas_fel_ @sarahwiegreffe @GoodfireAI Look who's talking :)

Ekdeep Singh Lubana @EkdeepL

13 days ago

@sarahwiegreffe @GoodfireAI We're gonna try our best to make sure that list keeps growing 😁
