Gregory Yauney @gyauney - Twitter Profile

almost 2 years ago

@dallascard @ShayneRedford @emilyrreif @katherine1ee @dmimno @daphneipp @naaclmeeting Thanks--it was great to meet you the other day!

Gregory Yauney @gyauney

almost 2 years ago

Our Pretrainer's Guide won an ✨outstanding paper award✨ at #NAACL2024 today! Big congrats to all the coauthors, especially @ShayneRedford (who led this big project), @emilyrreif, @katherine1ee, @dmimno, and @daphneipp! Thanks @naaclmeeting!

Photo: https://t.co/0pcx8Hmpdm

Gregory Yauney @gyauney

almost 2 years ago

Come talk to us about pretraining data curation at #NAACL2024 at 2pm at poster session 2! We're presenting A Pretrainer's Guide to Training Data

Paper: https://t.co/YyMpMXuLIm

Photo: https://t.co/lsBam5E6zY

gyauney retweeted

Shayne Longpre

@ShayneRedford

almost 2 years ago

Super appreciative of the recognition from #NAACL2024 — our Pretrainer’s Guide won an 🌟Outstanding Paper Award🌟🏆 This was a year long analysis into pretraining age, quality & toxicity data filters. Gratitude to our team 🙏🏼 @gyauney @emilyrreif @katherine1ee @ada_rob @denny_zhou @barret_zoph @_jasonwei Kevin @dmimno @daphneipp https://t.co/sTy8QfJvMP

Gregory Yauney @gyauney

about 2 years ago

In the paper, we show that this max random baseline can be a better predictor of whether the best prompt will outperform random guessing on an unseen set. You can use this baseline right away on your own classification tasks! Code: https://t.co/WNLZhTkNA7

Gregory Yauney @gyauney

about 2 years ago

Evaluating many prompts on small few-shot datasets can make you think you’ve beaten random guessing when you haven’t! @dmimno and I study a simple drop-in replacement random baseline that protects against validation set reuse and small datasets: https://t.co/yHYYwWtOaJ

Photo: https://t.co/gTAoI5P7Bx

Gregory Yauney @gyauney

about 2 years ago

This problem goes away if you have a large validation set, but for the kind of fast-moving settings where in-context learning shines, that’s not always feasible. And there’s nothing wrong with trying lots of prompts! You just have to make sure you factor that into your baseline.
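One way to read "factor that into your baseline" is to compare the best prompt against the expected best score among equally many random guessers, rather than against plain chance. The sketch below is an illustration under that assumption, not the authors' released code (see the Code link in the thread for that). The function names and the parameters n (validation examples), p (chance of a correct random guess, e.g. 1/number of classes), and t (number of prompts tried) are hypothetical labels chosen for this example.

```python
from math import comb


def binomial_cdf(n, p):
    """Return [P(X <= k) for k in 0..n] for X ~ Binomial(n, p)."""
    cdf = []
    total = 0.0
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(min(total, 1.0))  # clamp floating-point overshoot
    return cdf


def expected_max_random_accuracy(n, p, t):
    """Expected best accuracy among t independent random guessers.

    Each guesser gets X ~ Binomial(n, p) examples right. For the maximum
    M of t such draws, P(M <= k) = F(k)**t, so
    E[M] = sum over k in 0..n-1 of (1 - F(k)**t).
    Assumption: guessers are independent and share the same n and p.
    """
    cdf = binomial_cdf(n, p)
    expected_correct = sum(1.0 - cdf[k] ** t for k in range(n))
    return expected_correct / n


if __name__ == "__main__":
    # Hypothetical setting: 20 validation examples, binary task.
    for t in (1, 10, 100):
        print(t, expected_max_random_accuracy(20, 0.5, t))
```

With t = 1 this reduces to ordinary chance accuracy p. As t grows or n shrinks, the value rises above p, which matches the thread's point that trying many prompts on a small validation set inflates the score a lucky guesser can reach.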

Gregory Yauney @gyauney

over 2 years ago

I'm postering this afternoon at #EMNLP2023! Stop by if you want to talk about how Data Similarity is Not Enough to Explain Language Model Performance: https://t.co/RIQnDRwCwM. Joint work with wonderful collaborators @emilyrreif and @dmimno

gyauney retweeted

Shayne Longpre

@ShayneRedford

about 3 years ago

#NewPaperAlert When and where does pretraining (PT) data matter?

We conduct the largest published PT data study, varying:
1⃣ Corpus age
2⃣ Quality/toxicity filters
3⃣ Domain composition

We have several recs for model creators…
📜: https://t.co/SH50o0ktHO

1/ 🧵

Photo: https://t.co/udsiDts8QY

gyauney retweeted

Emily Reif @emilyrreif

about 3 years ago

When and where does pretraining data matter? New paper on how varying the pretraining data of LLMs affects downstream performance: https://t.co/MQc0fuHEws

But first, what do we know about the data itself?

1/ 🧵

Photo: https://t.co/zFP53BBe8k

Gregory Yauney @gyauney

about 4 years ago

Full details in our EMNLP 2021 paper: https://t.co/HSXE9Wm7YC

Gregory Yauney @gyauney

about 4 years ago

Using fine-tuned language models makes a hard text classification task like MNLI easy, but why? (new work with @dmimno)

Gregory Yauney @gyauney

about 4 years ago

Read our blog post to find out more and get code to try it out on your own data! https://t.co/yUyFpLofko
