Michael Auli @MichaelAuli - Twitter Profile

8 months ago

David AI has raised a $50M Series B from Meritech and NVIDIA to establish the data layer for audio AI. Audio is the front-end interface for real-world AI. At David AI, we’re creating the data that powers the models bringing these use cases to life. https://t.co/mbRg1YwGng

20

146

19

66

171K

Michael Auli @MichaelAuli

almost 2 years ago

We are releasing MMS Zero-shot: a model to transcribe the speech of almost any language using only a small amount of unlabeled text in the new language. Paper: https://t.co/aRoLtZSkpP Demo: https://t.co/tFwhwTZpWO Code/model: https://t.co/IxC8b7A9Ah

6

197

31

149

95K

Michael Auli @MichaelAuli

over 2 years ago

@JeffDean for the Gemini FLEURS results, why didn't you compare to MMS, the SOTA on FLEURS? What are the 62 languages you are evaluating on? Not disclosing them prevents others from comparing to Gemini. Also, the reference for Multilingual Librispeech should be Pratap et al.

0

9

0

516

MichaelAuli retweeted

kyutai @kyutai_labs

over 2 years ago

Our founding team is covering many AI fields from vision, with Patrick Pérez and Hervé Jégou (@hjegou) to LLMs with Edouard Grave (@EXGRV), audio with Neil Zeghidour (@neilzegh) and Alexandre Défossez (@honualx) and infra with Laurent Mazaré (@lmazare).

8

218

34

17

96K

Michael Auli @MichaelAuli

over 2 years ago

@neilzegh Big congratulations!

0

1

0

118

Michael Auli @MichaelAuli

over 2 years ago

@hjegou Congratulations!

0

1

0

67

Michael Auli @MichaelAuli

over 2 years ago

@EXGRV Congratulations!

0

76

Michael Auli @MichaelAuli

over 2 years ago

Here is an exciting new brain computer interface challenge taking ASR to the next level! Consider participating!

Frank Willett @WillettNeuro

almost 3 years ago

We are publicly releasing all data and code, and are hosting a machine learning competition! Can you do better than us at translating neural activity into text? 2/3 https://t.co/aKrvvtVRv1

5

84

16

15

11K

0

7

0

1

2K

Michael Auli @MichaelAuli

over 2 years ago

The MMS TTS systems are now on HuggingFace! This is TTS for 1,107 languages.

Sanchit Gandhi @sanchitgandhi99

almost 3 years ago

🤗 Transformers just got 1100+ new TTS checkpoints 🚀 You can now run any of @MetaAI's MMS TTS checkpoints using the Transformers library in 3 lines of code⚡️ MMS is the largest democratizer of TTS globally to date 🌎 Try it in your language now: https://t.co/JcrylGZrKm

3

128

34

52

32K

0

12

3

0

2K

Michael Auli @MichaelAuli

almost 3 years ago

Excited to be at ICML 2023 where we will present our work on data2vec 2.0 on Wednesday (talk) and Thursday (poster): https://t.co/N0n2zDKLN7 @alexei_baevski @arunbabu1234 @mhnt1580

1

37

5

12

5K

Michael Auli @MichaelAuli

almost 3 years ago

The MMS models are now available on HuggingFace!

Vaibhav (VB) Srivastav

@reach_vb

almost 3 years ago

Meta AI's recently released "Massively Multilingual Speech" (MMS) model is a huge step forward towards democratizing Speech to every corner of the globe. In addition, it might also play a significant role in preserving global linguistic diversity. https://t.co/5lQZn2nLq6 👇

3

152

38

66

35K

0

13

1

1K

Michael Auli @MichaelAuli

almost 3 years ago

@_josh_meyer_ BibleTTS is great, but MMS was inspired by the CMU Wilderness project, and we did not build on BibleTTS.

1

0

182

Michael Auli @MichaelAuli

about 3 years ago

@bnjmn_marie This is Table 5. Our analysis does not make the claims you allege. See Section 5.4.

0

1

0

87

Michael Auli @MichaelAuli

about 3 years ago

@bnjmn_marie Whisper results in Table 5 are for completeness and we made it clear that the results are not comparable. But even if somebody ignores this: why would anyone think what you claim in the made-up quote below from your blog post? The Whisper error is lower than MMS' in our table.

2

1

0

324

Michael Auli @MichaelAuli

about 3 years ago

@bnjmn_marie For the MLS benchmark, the MMS paper makes no claims about MMS compared to Whisper. Since Whisper introduced this normalization, we have a separate comparison just for this in Table 3. There we apply the same normalization and make claims about performance differences.

1

0

140

Michael Auli @MichaelAuli

about 3 years ago

@lifeonmarsspace @ylecun Thanks for flagging - this should be fixed now.

0

53

Michael Auli @MichaelAuli

about 3 years ago

@varshul_cw @ylecun For speech synthesis we trained separate models, and for speech recognition and language ID we have one model each.

1

0

70

Michael Auli @MichaelAuli

about 3 years ago

@bnjmn_marie @ylecun Some papers apply additional normalization; we marked those in the results.

0

1

0

127

Michael Auli @MichaelAuli

about 3 years ago

We also find that scaling multilingual ASR to this many languages only results in a very small performance degradation of 0.4 character error rate while increasing the number of supported languages by 18x (61 -> 1,107)

2

11

0

1

2K

Michael Auli @MichaelAuli

about 3 years ago

New work! The Massively Multilingual Speech (MMS) project scales speech technology to 1,100-4,000 languages using self-supervised learning with wav2vec 2.0. Paper: https://t.co/C4Uhk4Q4m5 Blog: https://t.co/XXBQFcj086 Code/models: https://t.co/6mOhKPXy1X

14

444

120

148

182K

Michael Auli @MichaelAuli

about 3 years ago

Compared to OpenAI Whisper, the multilingual ASR model supports 11x more languages but has less than half the average error rate on 54 languages of FLEURS. The model is also trained on a fraction of the labeled data (45K vs. 680K hours).

1

21

1

3K
