Alec Radford @AlecRad - Twitter Profile

Pinned Tweet

almost 8 years ago

What I've been working on for the past year! https://t.co/CAQMYS1rR7 Inspired by CoVE, ELMo, and ULMFiT we show that a single transformer language model can be finetuned to a wide variety of NLP tasks and performs very well with little tuning/tweaking.
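The tweet doesn't say how one pretrained model is reused across such different tasks. In the accompanying GPT-1 paper, structured inputs are flattened into a single token sequence: start and extract tokens wrap it and a delimiter separates the segments. Fine-tuning also keeps the language-modeling loss as an auxiliary term with weight λ = 0.5. The sketch below shows only those input transformations and the loss combination. The special-token strings are hypothetical placeholders; in the real model they are learned embeddings.

```python
# Hypothetical special-token names standing in for the model's learned
# start, delimiter, and extract embeddings.
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"


def entailment_input(premise: list[str], hypothesis: list[str]) -> list[str]:
    """Premise and hypothesis joined into one sequence for a single pass."""
    return [START, *premise, DELIM, *hypothesis, EXTRACT]


def similarity_inputs(a: list[str], b: list[str]) -> list[list[str]]:
    """Similarity has no inherent order, so both orderings are built;
    their final representations are combined before the output layer."""
    return [entailment_input(a, b), entailment_input(b, a)]


def multiple_choice_inputs(context: list[str], answers: list[list[str]]) -> list[list[str]]:
    """One sequence per candidate answer; each is scored independently
    and the scores are normalized with a softmax across candidates."""
    return [[START, *context, DELIM, *ans, EXTRACT] for ans in answers]


def combined_loss(task_loss: float, lm_loss: float, lam: float = 0.5) -> float:
    """Fine-tuning objective: task loss plus a weighted auxiliary LM loss."""
    return task_loss + lam * lm_loss
```

Because every task becomes "read one sequence, classify from the final token", the same transformer weights can be reused with only a small linear head added per task.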

AlecRad retweeted

Nick Levine

@status_effects

about 1 month ago

New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

AlecRad retweeted

David Duvenaud

@DavidDuvenaud

about 1 month ago

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

AlecRad retweeted

Grace Luo @graceluo_

4 months ago

We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt. More in thread 🧵

AlecRad retweeted

Neil Rathi

@neil_rathi

4 months ago

New paper, w/@AlecRad

Models acquire a lot of capabilities during pretraining.

We show that we can precisely shape what they learn simply by filtering their training data at the token level. https://t.co/g0bg78mliO
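The tweet doesn't describe the filtering mechanism. One way to filter "at the token level" is to keep every document but exclude flagged tokens from the training loss. A document-level filter, by contrast, would discard the whole document. The sketch below contrasts the two under that assumption. The `is_flagged` predicate is hypothetical and stands in for whatever classifier decides which tokens to remove. This is not the paper's implementation.

```python
def document_filter(docs: list[list[str]], is_flagged) -> list[list[str]]:
    """Document-level filtering: drop any document containing a flagged token."""
    return [d for d in docs if not any(is_flagged(t) for t in d)]


def token_mask(tokens: list[str], is_flagged) -> list[bool]:
    """Token-level filtering: keep the document, mark flagged tokens as excluded."""
    return [not is_flagged(t) for t in tokens]


def masked_mean_nll(token_logprobs: list[float], keep: list[bool]) -> float:
    """Mean negative log-likelihood over kept tokens only (assumed loss masking)."""
    kept = [-lp for lp, k in zip(token_logprobs, keep) if k]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical example: suppress learning from the token "secret".
flag = lambda t: t == "secret"
doc = ["the", "secret", "code"]
print(document_filter([doc], flag))                            # whole doc discarded
print(masked_mean_nll([-1.0, -5.0, -3.0], token_mask(doc, flag)))
```

The token-level version keeps the surrounding context ("the", "code") as training signal while removing only the targeted tokens.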

Alec Radford

@AlecRad

about 5 years ago

@skornblith @DGBassani It's the max width with 12 layers that could fit in memory on the dev box that trained GPT-1. Also worked out to a month to train which was edge of my patience. The prototypes went 6 layer 512 wide (og tformer paper "base") to 12 layer 512 wide to 12 layer 768 wide.
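For a sense of how much each prototype grew, a common approximation puts a transformer's non-embedding parameter count at about 12 · n_layer · d_model². That is 4·d² for the attention projections plus 8·d² for a feed-forward layer of width 4·d. The sketch below applies it to the three configurations in the tweet. It assumes the 4× feed-forward ratio and ignores biases, LayerNorm, and the embedding matrices, whose size depends on a vocabulary the tweet doesn't state.

```python
def approx_non_embedding_params(n_layer: int, d_model: int) -> int:
    """~12 * n_layer * d_model**2, assuming a 4x-wide feed-forward block."""
    return 12 * n_layer * d_model ** 2


# Prototype progression described in the tweet: (layers, width).
for n_layer, d_model in [(6, 512), (12, 512), (12, 768)]:
    params = approx_non_embedding_params(n_layer, d_model)
    print(f"{n_layer} layers x {d_model} wide: ~{params / 1e6:.1f}M")
# 6 layers x 512 wide: ~18.9M
# 12 layers x 512 wide: ~37.7M
# 12 layers x 768 wide: ~84.9M
```

Doubling depth doubles the count, while widening from 512 to 768 multiplies it by (768/512)² = 2.25. That quadratic growth is why width was the dimension that hit the memory limit first.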

Alec Radford

@AlecRad

almost 7 years ago

@NPCollapse The raw version used for gpt-2 is available at gs://gpt-2/data/lambada_development.jsonl and gs://gpt-2/data/lambada_test.jsonl
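LAMBADA asks a model to predict the final word of each passage from the preceding context. The sketch below turns JSONL lines into (context, target) pairs and computes accuracy. It assumes, hypothetically, that each line is a JSON object with a `"text"` field holding the full passage. Check the actual files before relying on that field name. `predict_last_word` is a hypothetical stand-in for the model, and details such as punctuation handling vary between evaluations.

```python
import json


def load_lambada(lines) -> list[tuple[str, str]]:
    """Split each passage into (context, final word).
    Assumes lines look like {"text": "..."}; the field name is a guess."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        text = json.loads(line)["text"]
        context, _, target = text.rpartition(" ")
        examples.append((context, target))
    return examples


def accuracy(predict_last_word, examples: list[tuple[str, str]]) -> float:
    """Fraction of passages whose final word is predicted exactly."""
    correct = sum(predict_last_word(ctx) == tgt for ctx, tgt in examples)
    return correct / len(examples)
```

Usage, assuming a local copy of one of the files above: `with open("lambada_test.jsonl") as f: examples = load_lambada(f)`.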

Alec Radford

@AlecRad

about 7 years ago

@chipro Dynamic eval improves an AWD-LSTM baseline by 0.11 nats. Can't be sure it'd have equal sized benefits for both architectures (though https://t.co/hkVohkVMd4 suggests it works fine) but if that gain carried over, the Transformer-XL model would be 48.6 test perplexity.
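The arithmetic behind the tweet's estimate: perplexity is exp(mean per-token loss in nats), so lowering the loss by Δ nats multiplies perplexity by exp(−Δ). A 0.11-nat gain is therefore about a 10.4% perplexity reduction. The tweet doesn't state the Transformer-XL number it started from. Inverting the relation from its own figures gives an implied starting point of roughly 54.3. This is pure arithmetic, not a reported result.

```python
import math


def perplexity_after_gain(ppl: float, gain_nats: float) -> float:
    """Lowering mean loss by gain_nats scales perplexity by exp(-gain_nats)."""
    return ppl * math.exp(-gain_nats)


def implied_baseline(ppl_after: float, gain_nats: float) -> float:
    """Invert the relation to recover the perplexity before the gain."""
    return ppl_after * math.exp(gain_nats)


print(round(implied_baseline(48.6, 0.11), 1))  # 54.3
print(f"{1 - math.exp(-0.11):.1%}")            # 10.4%
```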

Alec Radford

@AlecRad

about 7 years ago

This is a really fun live experiment with twitch chat predictably oscillating between love and hate based on the sample.

AlecRad retweeted

Christine McLeavey @mcleavey

about 7 years ago

Extremely excited to share work I've been doing at OpenAI the past few months: MuseNet, a neural net music generator. It's been a huge team effort pulling this all together!

AlecRad retweeted

rewon @rewonfc

about 7 years ago

Releasing some work today with @scottgray76 @AlecRad and @ilyasut. Contains some simple adaptations for Transformers that extend them to long sequences.
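The tweet doesn't name the adaptations. The release it presumably refers to, the Sparse Transformer, replaces dense attention with sparse patterns. One of them is strided attention: each position attends to the most recent `stride` positions plus every `stride`-th position before that. With stride ≈ √n, each query sees on the order of √n keys instead of n. The sketch below builds the union of those two patterns as a causal boolean mask. The paper also describes splitting the two patterns across separate heads or layers instead, so treat this "merged" version as one variant.

```python
def strided_mask(n: int, stride: int) -> list[list[bool]]:
    """mask[i][j] is True if query position i may attend to key position j."""
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i                   # no attending to future positions
            local = i - j < stride            # the last `stride` positions, including i
            strided = (i - j) % stride == 0   # every stride-th earlier position
            row.append(causal and (local or strided))
        mask.append(row)
    return mask


m = strided_mask(16, 4)
print(sum(m[15]), "of 16 positions attended by the last query")
```

For the last query in a 16-token sequence with stride 4, the mask keeps positions 12 to 15 (local) plus 3, 7, and 11 (strided), so 7 keys instead of 16.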

Alec Radford

@AlecRad

about 7 years ago

@jeremyphoward @RogerGrosse The graph shows lines for various initial values so I would guess those aren't learned but manually set.

Alec Radford

@AlecRad

about 7 years ago

@tallinzen @mcxfrank @emilymbender @yoavgo Don't know exact # since there is not a traditional word-level tokenization step. There are 9B tokens total and the ratio is probably around 1.1 tokens per word? You can probably just call those tokens words for the purpose of a # on a slide.
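The estimate in the tweet is a single division: 9B tokens at about 1.1 tokens per word comes to roughly 8.2B words. The ratio itself can be measured by tokenizing a text sample and dividing by a whitespace word count. In the sketch, `tokenize` is a hypothetical stand-in for the model's subword tokenizer.

```python
def approx_words(n_tokens: float, tokens_per_word: float = 1.1) -> float:
    """Convert a token count to an approximate word count."""
    return n_tokens / tokens_per_word


def tokens_per_word(texts: list[str], tokenize) -> float:
    """Measure the ratio on a sample: subword tokens over whitespace words."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / n_words


print(f"{approx_words(9e9) / 1e9:.1f}B words")  # 8.2B words
```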

AlecRad retweeted

Graham Neubig

@gneubig

over 7 years ago

One commonly cited argument about the difficulty of learning common-sense reasoning is that "no-one writes down common sense". A counter-argument is "well, the web is big": https://t.co/qPNmra86ES

Alec Radford

@AlecRad

over 7 years ago

@jacobandreas Okay cool - thanks for clarifying!

Alec Radford

@AlecRad

over 7 years ago

@jacobandreas Sorry - I interpreted: "if a paper had crossed my desk saying here are some hand-curated best-of-25 samples from our model + PPL comparisons with models trained on other datasets" as about the paper - especially since the second half of the statement is about the paper.

Alec Radford

@AlecRad

over 7 years ago

@jacobandreas The paper relegates samples to the appendix. The unicorn sample is on page 20 and used to make a qualitative point. Almost everything else in the paper is random samples.

Alec Radford

@AlecRad

over 7 years ago

@jacobandreas Those samples use a different technique than the ones shown in the blog. The samples you are looking at are temperature=1. We use top_k=40. Unconditional samples with that are here: https://t.co/OxQBnCc6mA It's also important to note that conditioning on "real" text helps too.
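The difference the tweet points to: temperature=1 sampling draws from the model's full next-token distribution. top_k=40 first discards everything except the 40 most likely tokens, then renormalizes and samples among those. This cuts off the long tail of unlikely tokens that tends to derail samples. A minimal, generic sketch over a list of logits follows. It is not the GPT-2 codebase, and the function name is hypothetical.

```python
import math
import random


def sample_top_k(logits: list[float], k: int = 40, temperature: float = 1.0, rng=random) -> int:
    """Sample a token id from the k highest-scoring logits only.
    With k == len(logits) this reduces to plain temperature sampling."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # softmax numerators, shifted for stability
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [10.0, 9.0, 0.0, -5.0]
print(sample_top_k(logits, k=2, rng=rng))  # always token 0 or 1
```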

AlecRad retweeted

Nando de Freitas

@NandoDF

over 7 years ago

First, reproducibility is not about rerunning code to get the same results. Science must be more robust, as naive copying has many flaws. Second, reproducibility should never be above public safety. We must publish responsibly, with hope and kindness in our minds.

AlecRad retweeted

Joshua Achiam

@jachiam0

over 7 years ago

I'd like to weigh in on the #GPT2 discussion. The decision not to release the trained model was carefully considered and important for norm-forming. Serving the public good requires us to draw lines on release somewhere: better long before catastrophe than after.
