Sinong Wang @sinongwang - Twitter Profile

Sinong Wang @sinongwang

about 2 months ago

This beast mode is gradually rolling out. Everyone will have a chance to try!

Mostly Borrowed Ideas

@borrowed_ideas

about 2 months ago

Not usually a Meta AI user, but wanted to give them a shot after the latest model release (it's free anyway). So I installed the app on my desktop, and noticed "contemplating" mode (didn't see that on the mobile app btw). When I asked a question, 16 agents simultaneously started working on the question which looks pretty cool!

sinongwang retweeted

Alexandr Wang

@alexandr_wang

about 2 months ago

check out Contemplating mode for your most complex reasoning queries!

sinongwang retweeted

Hongyu Ren

@ren_hongyu

about 2 months ago

Check out Muse Spark, our first milestone in the quest for personal superintelligence! Scaling this with the team has been a total blast. Give it a spin and let us know what you think! 🥑

Sinong Wang @sinongwang

about 2 months ago

Excited to share Muse Spark, the first model release from Meta MSL. I’ve been on this project since day one and helped build it from scratch. Still early, but we’re excited to keep pushing on pretraining, RL, and test-time compute! https://t.co/GM222TEnL0

Sinong Wang @sinongwang

almost 2 years ago

Super excited to share that our paper won an outstanding paper award at NAACL 2024. Check out our paper: https://t.co/QQbedpNIfz

Chi Han @Glaciohound

almost 2 years ago

🎖 Excited to receive an outstanding paper award at NAACL2024 for LM-Infinite "Zero-Shot Extreme Length Generalization for Large Language Models" work! We extend to 200M length with no parameter updates, with downstream improvements https://t.co/T6MSXbtWpv https://t.co/9UHksOOwfp

Sinong Wang @sinongwang

about 2 years ago

Excited to share Llama3-preview (8B/70B), which achieves the best MMLU results among open-source models, along with preliminary results for a 405B model. Also super excited to share that we have integrated Llama3 into Meta AI, the world’s best AI assistant! https://t.co/puNxKuQkix

sinongwang retweeted

Yam Peleg

@Yampeleg

over 2 years ago

Meta just dropped a banger: LLaMA 2 Long.

- Continued pretraining LLaMA on long context and studied the effects of pretraining text lengths.
- Apparently, having abundant long texts in the pretraining dataset is not the key to achieving strong performance.
- They also ran a large set of experiments comparing different length-scaling techniques.
- Surpassed gpt-3.5-turbo-16k on multiple long-context tasks.
- They also studied the effect of instruction tuning with RL + SFT and all combinations of the two.

The model weights are not out yet. Hopefully soon! 🙏

Sinong Wang @sinongwang

over 2 years ago

Excited to share our latest work on a long-context LLM, the new foundation model behind 28 Meta AI agents. The new long-context model also outperforms ChatGPT-3.5-turbo-16k across various tasks.

AI at Meta

@AIatMeta

over 2 years ago

🆕 Effective Long-Context Scaling of Foundation Models ➡️ https://t.co/oMKlrtPB0s Another piece of research that helps us build engaging conversational experiences for our AIs and the Meta AI assistant.

Sinong Wang @sinongwang

almost 3 years ago

Excited to share our latest work on extending LLM context window length without fine-tuning!

AK

@_akhaliq

almost 3 years ago

LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

paper page: https://t.co/FUbIEu59vs

In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs are deployed for increasingly complex tasks, they often face the need to conduct longer reasoning processes or understand larger contexts. In these situations, the length generalization failure of LLMs on long sequences becomes more prominent. Most pre-training schemes truncate training sequences to a fixed length (such as 2048 for LLaMa). LLMs often struggle to generate fluent text, let alone carry out downstream tasks, after longer contexts, even with relative positional encoding, which is designed to cope with this problem. Common solutions such as finetuning on longer corpora often involve daunting hardware and time costs and require careful training process design. To more efficiently leverage the generation capacity of existing LLMs, we theoretically and empirically investigate the main out-of-distribution (OOD) factors contributing to this problem. Inspired by this diagnosis, we propose a simple yet effective solution for on-the-fly length generalization, LM-Infinite, which involves only a Lambda-shaped attention mask and a distance limit while requiring no parameter updates or learning. We find it applicable to a variety of LLMs using relative-position encoding methods. LM-Infinite is computationally efficient with O(n) time and space, and demonstrates consistent fluency and generation quality up to 32k tokens on the ArXiv and OpenWebText2 datasets, with a 2.72x decoding speedup. On downstream tasks such as passkey retrieval, it continues to work on inputs much longer than training lengths, where vanilla models fail immediately.
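
As a rough illustration of the two ingredients the abstract names, here is a minimal sketch of a Lambda-shaped attention mask plus a distance limit. It is not the authors' implementation: the function names and the parameters `n_global`, `n_local`, and `max_distance` are hypothetical, and the sketch assumes that the mask keeps a few starting tokens plus a window of recent tokens, and that the distance limit simply caps the relative distance handed to the positional encoding.

```python
def lambda_mask(seq_len, n_global, n_local):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumption: the Lambda shape is the union of a "global" branch
    (the first n_global tokens) and a "local" branch (the n_local most
    recent tokens, including the query itself), under causal masking.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                # never look at future tokens
            is_global = j < n_global       # starting tokens stay visible
            is_local = (i - j) < n_local   # recent tokens stay visible
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def clipped_distance(i, j, max_distance):
    """Relative distance between query i and key j, capped at max_distance.

    Assumption: the capped value is what a relative positional encoding
    would see, so distances never exceed the chosen limit.
    """
    return min(i - j, max_distance)


if __name__ == "__main__":
    # Hypothetical toy sizes, chosen only to make the shape visible.
    for row in lambda_mask(seq_len=8, n_global=2, n_local=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With fixed `n_global` and `n_local`, each query attends to at most `n_global + n_local` keys, so the total number of attended pairs grows linearly with sequence length in this sketch.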

sinongwang retweeted

Qinyuan Ye @qinyuan_ye

almost 4 years ago

Hi #NAACL2022! Last summer we had a crazy idea of distilling transformer models into shallow, sparse, and fast models. Curious about whether and to what extent this idea works? Please come to our presentation tomorrow! 📍 Session 1D @ Elwha A ⏰ Mon 11:30-11:45

sinongwang retweeted

Karthik A Sankararaman 🇮🇳🇺🇸 @karthikabinav

almost 4 years ago

We wondered what happens when dropout in transformers is aligned with its common Bayesian interpretation as a posterior over the weights. Turns out it largely reduces overfitting, with improvements across many apples-to-apples experiments. @sinongwang @Han_Fang_ @MetaAI

Sinong Wang @sinongwang

about 4 years ago

Prompt tuning can be instance-dependent. Thrilled to share our new work! "IDPG: An Instance-Dependent Prompt Generation Method". Check out our paper here: https://t.co/s5iWueSJqj

sinongwang retweeted

Xuezhe Ma (Max) @MaxMa1987

over 4 years ago

Thrilled to share our #NeurIPS2021 work! "Luna: Linear Unified Nested Attention". This new linear-time transformer architecture achieves competitive results across multiple benchmarks. Co-authors: @XiangKong4 @sinongwang @violet_zct @jonathanmay @gabema @LukeZettlemoyer

Sinong Wang @sinongwang

almost 5 years ago

Thrilled to share our new work! "Luna: Linear Unified Nested Attention". This new linear-time transformer architecture achieves competitive results across multiple benchmarks. Check out our paper here: https://t.co/BNtqdTAQqH The implementation: https://t.co/US9vTjTG7T.

Sinong Wang @sinongwang

about 5 years ago

You don't need a 175B GPT-3 for few-shot learning. All you need is entailment! Check out our new preprint: https://t.co/dknCCTUMoJ In short, we propose a new method that turns small LMs into better few-shot learners. @Han_Fang_ @MadianKhabsa @hanna_mao @gabema

Sinong Wang @sinongwang

almost 6 years ago

SOTA in NLP is typically achieved by LM pretraining followed by finetuning. Our recent ACL paper shows that pretraining yields diminishing returns as the number of training examples increases, and that an LSTM can come within 1 percent of BERT models. Link: https://t.co/9ZhqbUmCAF

sinongwang retweeted

Yannic Kilcher 🇸🇨

@ykilcher

almost 6 years ago

The Linformer projects self-attention into a lower-dimensional space and achieves linear rather than quadratic resource requirements. Independent of sequence length! 💪 Watch the video here: https://t.co/ZKw66C2idf @sinongwang @belindazli @MadianKhabsa @Han_Fang_ @facebookai

Sinong Wang @sinongwang

almost 6 years ago

Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers. Check it out here: https://t.co/yLATBD85lE

Sinong Wang @sinongwang

about 8 years ago

@icmlconf When there are slow machines in distributed sparse data computation, how can we mitigate these stragglers to reduce the final job completion time? Our work on Coded Sparse Matrix Multiplication has been accepted to @icmlconf. arXiv version: https://t.co/tKcFcNfgUb
