Indico Data Labs @IndicoDataLabs - Twitter Profile

over 3 years ago

There are some really compelling results in this paper (some intuitive, some less so). The causal analysis shows some non-linearity worth investigating further, and a closer look at the effect of parameter count may be warranted if the true dynamic is sigmoidal.
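
To make the "sigmoidal" assumption concrete, here is a minimal sketch of a logistic curve over log10 parameter count. The function name and its parameters (`midpoint_log10`, `slope`, `floor`, `ceiling`) are hypothetical and are not fitted to the paper's numbers. The point is only that a sigmoid flattens at both ends, so a trend measured on one part of the curve can look very different from the regime further out.

```python
import math


def sigmoid_accuracy(n_params, midpoint_log10, slope, floor=0.0, ceiling=1.0):
    """Hypothetical logistic model of QA accuracy as a function of parameter count.

    Accuracy rises from `floor` to `ceiling` as log10(n_params) passes `midpoint_log10`;
    `slope` sets how sharp the transition is. Values are illustrative, not fitted.
    """
    z = slope * (math.log10(n_params) - midpoint_log10)
    return floor + (ceiling - floor) / (1.0 + math.exp(-z))


# Sweep model sizes over several orders of magnitude (illustrative values only).
for n in (1e8, 1e9, 1e10, 1e11, 1e12, 1e13):
    print(f"{n:.0e} params -> accuracy {sigmoid_accuracy(n, midpoint_log10=11, slope=1.5):.3f}")
```

Under this assumption, a straight line fitted in log-space through points near the midpoint would overshoot once the curve starts to saturate towards `ceiling`.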

Indico Data Labs @IndicoDataLabs

over 3 years ago

Large Language Models Struggle to Learn Long-Tail Knowledge by Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel https://t.co/4q5o1d1s5w

Indico Data Labs @IndicoDataLabs

over 3 years ago

A counterfactual analysis suggests a causal relationship between removing large numbers of relevant documents and QA performance.

Indico Data Labs @IndicoDataLabs

over 3 years ago

Overall, this work provides thorough and encouraging results for distilling pre-trained language models into recursive transformers. The idea of adding per-layer adaptors whilst re-using the MLP and attention weights is particularly interesting.

Indico Data Labs @IndicoDataLabs

over 3 years ago

MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers by Nouriborji et al. proposes a method for distilling BERT-style transformers into ALBERT-style recursive transformers. https://t.co/dT1LeVzTpN

Indico Data Labs @IndicoDataLabs

over 3 years ago

The authors find that distilling the model with adaptors that differ for each iteration of the recursive block improves performance across all tasks. The adaptors seem to help the shared layers better mimic the behaviour of the teacher's separate layers.
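
A minimal NumPy sketch of the recursive-block-plus-adaptor idea, under simplifying assumptions: the shared block is a stand-in residual MLP rather than full attention + MLP, the adaptors are assumed to be bottleneck adapters (down-projection, ReLU, up-projection, residual), and all class names and sizes are hypothetical rather than taken from the paper. It shows where the parameter savings come from: one set of block weights is applied at every iteration, and only the small adaptor differs per iteration.

```python
import numpy as np


class Adapter:
    """Hypothetical bottleneck adapter: x + relu(x @ down) @ up."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero-initialised up-projection: the adapter starts out as the identity map.
        self.up = np.zeros((d_bottleneck, d_model))

    def __call__(self, x):
        return x + np.maximum(x @ self.down, 0.0) @ self.up


class SharedBlock:
    """Stand-in for the shared attention + MLP block, simplified to a residual MLP."""

    def __init__(self, d_model, d_ff, rng):
        self.w1 = rng.normal(0.0, 0.02, (d_model, d_ff))
        self.w2 = rng.normal(0.0, 0.02, (d_ff, d_model))

    def __call__(self, x):
        return x + np.maximum(x @ self.w1, 0.0) @ self.w2


class RecursiveEncoder:
    """ALBERT-style recursion: one shared block, one adapter per iteration."""

    def __init__(self, d_model, d_ff, d_bottleneck, n_iterations, seed=0):
        rng = np.random.default_rng(seed)
        self.block = SharedBlock(d_model, d_ff, rng)  # weights reused at every iteration
        self.adapters = [Adapter(d_model, d_bottleneck, rng) for _ in range(n_iterations)]

    def __call__(self, x):
        for adapter in self.adapters:
            # Same block weights each time; only the adapter changes per iteration.
            x = adapter(self.block(x))
        return x

    def num_params(self):
        shared = self.block.w1.size + self.block.w2.size
        adapters = sum(a.down.size + a.up.size for a in self.adapters)
        return shared, adapters


encoder = RecursiveEncoder(d_model=64, d_ff=256, d_bottleneck=8, n_iterations=6)
hidden = encoder(np.ones((3, 64)))  # input shape: (tokens, d_model)
print(hidden.shape, encoder.num_params())  # (3, 64) (32768, 6144)
```

With these sizes the shared block holds 32,768 weights and the six adaptors 6,144 in total, versus 196,608 for six independent blocks of the same shape. Zero-initialising each up-projection is a common adapter convention, not necessarily the paper's choice.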

Indico Data Labs @IndicoDataLabs

over 3 years ago

Overall, this was a refreshing work from OpenAI that sheds light on often underappreciated aspects of ML: dataset curation and generalization behavior! Models and code are openly available at: https://t.co/tFojb5itlF

Indico Data Labs @IndicoDataLabs

over 3 years ago

This week we're highlighting the open-source Whisper speech recognition model outlined in "Robust Speech Recognition via Large-Scale Weak Supervision" by former Indico founder @AlecRad, @_jongwook_kim, @txhf, @gdb, @mcleavey and @ilyasut. https://t.co/uTUWWFqRvu

Indico Data Labs @IndicoDataLabs

over 3 years ago

Finally, because part of the training data consisted of non-English audio transcribed in its original language or non-English audio translated into English, the model can also be used in these settings. Scaling trends show clear improvements with model size, especially in multilingual settings.
