Max Marion @maxdoesresearch - Twitter Profile

about 2 years ago

Zack is killing it in a paper that takes my previous work and expands on it considerably. Highly recommend reading it and following Zack!

Zack Ankner

@ZackAnkner

about 2 years ago

New paper where we explore using a small LM’s perplexity to prune the pretraining data for larger LMs. We find that small LMs can prune data for up to 30x larger LMs, data pruning works in the overtrained and data-constrained regimes, and more! https://t.co/XYbI0Ijois
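The perplexity signal mentioned above is a per-document score. A small reference LM assigns a log-probability to each token, and perplexity is the exponential of the mean negative log-likelihood. The sketch below assumes those per-token natural-log probabilities have already been computed. The function name and the example values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def sequence_perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one document: exp(mean negative log-likelihood).

    token_logprobs: natural-log probabilities that a (hypothetical) small
    reference LM assigned to each token given its prefix.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probs for two 4-token documents.
predictable = [math.log(0.9)] * 4   # text the small LM finds easy
surprising = [math.log(0.1)] * 4    # text the small LM finds hard
print(round(sequence_perplexity(predictable), 3))  # 1.111
print(round(sequence_perplexity(surprising), 3))   # 10.0
```

Lower perplexity means the reference model found the document more predictable. Pruning then ranks documents by this score and keeps some subset of them.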


maxdoesresearch retweeted

Jonathan Frankle

@jefrankle

about 2 years ago

Fixed it for you, @code_star


maxdoesresearch retweeted

Andrew Canis @andrewcanis

about 2 years ago

I've added support for Command-R to llama.cpp! Command-R is an exciting new 35B model with 128k context length for RAG and Tool Use. I also converted the model to GGUF format (F16, Q8, Q4, Q2) HF: https://t.co/SKqKUGM2kM Release: https://t.co/EivPPhm4gm @cohere @francoisfleuret


Max Marion @maxdoesresearch

over 2 years ago

Data Selection is in vogue

Alon Albalak @AlbalakAlon

over 2 years ago

{UCSB|AI2|UW|Stanford|MIT|UofT|Vector|Contextual AI} present a survey on🔎Data Selection for LLMs🔍 Training data is a closely guarded secret in industry🤫with this work we narrow the knowledge gap, advocating for open, responsible, collaborative progress https://t.co/vpRIXWFdCZ




maxdoesresearch retweeted

Matei Zaharia @matei_zaharia

over 2 years ago

Interesting trend in AI: the best results are increasingly obtained by compound systems, not monolithic models. AlphaCode, ChatGPT+, Gemini are examples. In this post, we discuss why this is and emerging research on designing & optimizing such systems. https://t.co/tfnNuoTNNY


Max Marion @maxdoesresearch

over 2 years ago

saw just how much work went into this and it's nothing short of incredible. Grats to the whole team - it's a huge milestone!

Sara Hooker

@sarahookr

over 2 years ago

Today, I am very proud to share what we have been working on for the last 14 months. ✨ Introducing Aya -- a new state of the art for massively multilingual models. 🔥🎉


maxdoesresearch retweeted

Ahmet Üstün

@ahmetustun89

over 2 years ago

Thrilled to announce Aya 🌿, a massively multilingual instruction-tuned LLM, featuring 101 languages and the largest collection of multilingual instruction datasets. Over half of these languages are under-resourced. A monumental effort from @CohereForAI and Aya team 🚀


maxdoesresearch retweeted

Max ⛅

@maxisawesome538

over 2 years ago

just saw (Marion et al., 2023) in a paper for the first time 🥲


Max Marion @maxdoesresearch

over 2 years ago

@hongjian_zou heya thanks! All models received the same number of training steps and used the same amount of compute regardless of the dataset pruning. If the dataset was pruned down to 50%, the model trained on that dataset saw each datapoint twice.
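The compute-matched setup in this reply is simple arithmetic. The training budget is fixed, so the number of passes over the retained data grows as the pruned dataset shrinks. The sketch assumes the budget equals one pass over the full, unpruned dataset, which is what the 50% → "twice" example implies. The function name and the token counts are hypothetical.

```python
def epochs_over_pruned_data(
    total_training_tokens: int,
    full_dataset_tokens: int,
    keep_fraction: float,
) -> float:
    """Passes over the retained data when the training budget is held fixed."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    pruned_tokens = full_dataset_tokens * keep_fraction
    return total_training_tokens / pruned_tokens


# Hypothetical budget: exactly one pass over a 1M-token unpruned dataset.
budget = 1_000_000
print(epochs_over_pruned_data(budget, 1_000_000, 0.5))   # 2.0
print(epochs_over_pruned_data(budget, 1_000_000, 0.25))  # 4.0
```

Pruning harder therefore means more repetition of each surviving datapoint rather than less compute.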


Max Marion @maxdoesresearch

over 2 years ago

NeurIPS was so much fun that I'm determined to come back with a paper next year 😤

Sara Hooker

@sarahookr

over 2 years ago

🔥🎉 @maxdoesresearch presents “when less is more: investigating data pruning for pretraining LLMs at scale” Attrib Workshop 2023 https://t.co/o9Bp5yyDWw


Max Marion @maxdoesresearch

over 2 years ago

@__femb0t that's right (check my header)


Max Marion @maxdoesresearch

over 2 years ago

@sarahookr wow it's @AlbalakAlon 😍


maxdoesresearch retweeted

Ksenia Se

@Kseniase_

over 2 years ago

LLMs improved using available data from the noisy Internet. @CohereForAI researchers achieved unexpected results by pruning data. Their research suggests removing most pretraining data while maintaining performance! https://t.co/rkknEUp1r2


maxdoesresearch retweeted

Cohere

@cohere

over 2 years ago

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale https://t.co/X9xddgG2fV @maxdoesresearch @ahmetustun89 @luizapzbn @W4ngatang @mziizm @sarahookr


maxdoesresearch retweeted

Cohere Labs

@Cohere_Labs

over 2 years ago

In 2022, we launched the Cohere For AI Scholars Program to help close the gap between research experience and opportunity. In our inaugural year, we welcomed 6 talented researchers - @luizapzbn, @lekeonilude, @maxdoesresearch, @aahmadian_, @tedzadouri and Meriem Boubdir. https://t.co/71SFbsWaML


Max Marion @maxdoesresearch

over 2 years ago

@code_star @CohereForAI I pinky promise bro


Max Marion @maxdoesresearch

over 2 years ago

📢New Pretraining Paper 📢 Delighted to share our new paper coming out of @forai_ml : "When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale" Paper: https://t.co/VwtiDGpRek w/ @ahmetustun89 @luizapzbn @W4ngatang @mziizm @sarahookr


maxdoesresearch retweeted

Sara Hooker

@sarahookr

over 2 years ago

Really proud of our work led by @maxdoesresearch w @ahmetustun89 @luizapzbn @W4ngatang @mziizm 🎉 LM datasets are huge. Is all text needed? How can we measure data quality in this setting? Enter data pruning: removing the least valuable subsets while preserving performance.


Max Marion @maxdoesresearch

over 2 years ago

Your intuitions on the easy/hard data are in line with what we found - very easy data was often user agreements or text that would appear all over the internet, like at the bottom of a webpage. The harder subset is more complicated - some of it was nonsense, but some text, like medical or scientific text, can have high perplexity and still be useful in certain contexts. Selecting a good validation set would, ironically, be an excellent extension of this line of work 😂
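The easy/hard distinction in this reply comes from ranking documents by reference-model perplexity. The sketch below sorts document indices by score and splits them into a low ("easy"), middle, and high ("hard") band. The equal-thirds cut, the function name, and the scores are hypothetical choices for illustration. The reply does not say exactly how the subsets were sized.

```python
from typing import Dict, List, Sequence


def split_by_perplexity(perplexities: Sequence[float]) -> Dict[str, List[int]]:
    """Rank document indices by perplexity and split them into thirds."""
    order = sorted(range(len(perplexities)), key=lambda i: perplexities[i])
    n = len(order)
    lo, hi = n // 3, 2 * n // 3
    return {
        "easy": order[:lo],      # lowest perplexity, e.g. repeated boilerplate
        "middle": order[lo:hi],
        "hard": order[hi:],      # highest perplexity: noise, but also specialist text
    }


scores = [1.2, 40.0, 8.5, 3.1, 95.0, 12.0]  # hypothetical per-document perplexities
print(split_by_perplexity(scores))
# {'easy': [0, 3], 'middle': [2, 5], 'hard': [1, 4]}
```

Because the "hard" band mixes noise with valuable specialist text, dropping it wholesale is not obviously safe. That tension is what the reply points at.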


Max Marion @maxdoesresearch

over 2 years ago

@EIFY @forai_ml @ahmetustun89 @luizapzbn @W4ngatang @mziizm @sarahookr ...we found that you do need some training in the reference model to get a usable pruning signal. I think it would be a great next step!


Max Marion @maxdoesresearch

over 2 years ago

@EIFY @forai_ml @ahmetustun89 @luizapzbn @W4ngatang @mziizm @sarahookr Our EL2N experiments are a version of this, in that we use the same parameter/architecture setup and use signals from those models as our pruning metric. The setup you mention is possible but was more complicated engineering-wise for us. You would need to do some gradient updates, as...
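EL2N, the pruning metric named in this reply, scores an example by the L2 norm of the difference between the model's predicted distribution and the one-hot target. As the follow-up reply notes, the scores come from a reference model that has had some training. The sketch below applies this per token and then averages over the sequence. Averaging is an assumption made here for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_el2n(logits: Sequence[float], target: int) -> float:
    """L2 norm of (predicted distribution - one-hot target) for one token."""
    probs = softmax(logits)
    return math.sqrt(
        sum((p - (1.0 if k == target else 0.0)) ** 2 for k, p in enumerate(probs))
    )


def sequence_el2n(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Assumed aggregation: mean of per-token EL2N over the sequence."""
    if not targets or len(step_logits) != len(targets):
        raise ValueError("need one target per position")
    return sum(token_el2n(l, t) for l, t in zip(step_logits, targets)) / len(targets)


# Toy 3-token vocabulary: a confident correct prediction vs. a uniform one.
confident = sequence_el2n([[10.0, 0.0, 0.0]], [0])
uncertain = sequence_el2n([[0.0, 0.0, 0.0]], [0])
print(confident < uncertain)  # True
```

Low scores mark examples the partially trained model already fits well. High scores mark examples it still gets wrong.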

