Introducing Emilia-Large: 200K+ Hours of Open-Source Speech Data!
We're excited to release Emilia-Large, the largest TTS pretraining dataset, with 200K+ hours of fully open-source multilingual speech data. It is ready to use for #TTS and #SpeechLM.
✨ What's New?
- 2x Scale: Expanded the original Emilia dataset from 101K to 200K+ hours with the new Emilia-YODAS dataset.
- Low-Resource Boost: Enhanced support for languages like German, French, and Japanese.
- Commercial Use: Emilia-YODAS is released under CC-BY
MaskGCT!
In addition to the HuggingFace demo: https://t.co/2mCZA9GLzD
you can also join the discord space to play: https://t.co/22IxeaVRq5
Also pre-generated samples: https://t.co/LXiFEiz3Ax
@discord @discordbots @DiscordBotDevs @_akhaliq
Sorry to interrupt, but YES, MaskGCT TTS works for the French language!
I have not tested other Latin languages yet, but my guess is that it should work too.
I've added the MaskGCT TTS @gradio API to the Echo Mimic Space, so you can clone your voice directly before running portrait generation.
Try it: https://t.co/vSWUt0lbEL
🔥🔥🔥 MaskGCT is hot, putting Amphion on the GitHub Trending list again!
> SoTA TTS model
> Zero-shot cloning
> Emotional TTS
> Multilingual, now supporting English and Chinese
> Fully non-autoregressive and duration controllable
Try it on HF and Discord: https://t.co/FvmcJ5pm6z
Fuck yeah! MaskGCT - New open SoTA Text to Speech model! 🔥
> Zero-shot voice cloning
> Emotional TTS
> Trained on 100K hours of data
> Long form synthesis
> Variable speed synthesis
> Bilingual - Chinese & English
> Available on Hugging Face
Fully non-autoregressive architecture:
> Stage 1: Predicts semantic tokens from text, using tokens extracted from a speech self-supervised learning (SSL) model
> Stage 2: Predicts acoustic tokens conditioned on the semantic tokens.
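For readers who want a feel for what "fully non-autoregressive" with two stages could look like, here is a toy sketch. It is not the Amphion implementation: `toy_predictor` is a hypothetical stand-in for a trained model, the vocabulary sizes and the linear unmasking schedule are assumptions, and real acoustic tokens are more structured than a flat list. It only illustrates the general mask-and-predict idea: start fully masked, guess all positions in parallel, and commit the most confident guesses over a fixed number of steps. Because the output length is chosen up front, the sketch also hints at how duration could be controlled.

```python
import random

MASK = -1  # placeholder id for a position that has not been filled yet
SEMANTIC_VOCAB = 1024  # assumed size, for illustration only
ACOUSTIC_VOCAB = 1024  # assumed size, for illustration only


def toy_predictor(condition, tokens, vocab_size, rng):
    """Hypothetical stand-in for a trained model.

    Returns one (token, confidence) guess per position. This toy ignores
    `condition` and the current tokens and simply guesses at random.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def mask_predict(condition, length, vocab_size, steps, rng):
    """Fill `length` masked positions in parallel over `steps` iterations."""
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_predictor(condition, tokens, vocab_size, rng)
        # Linear schedule (an assumption): after step k, k/steps of the
        # sequence should be filled, so the last step fills everything.
        target_filled = (length * step) // steps
        n_commit = max(target_filled - (length - len(masked)), 0)
        # Commit the most confident guesses among still-masked positions.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = guesses[i][0]
    return tokens


def synthesize_tokens(text, target_len, steps=8, seed=0):
    """Stage 1: text -> semantic tokens. Stage 2: semantic -> acoustic tokens."""
    rng = random.Random(seed)
    semantic = mask_predict(text, target_len, SEMANTIC_VOCAB, steps, rng)
    acoustic = mask_predict(semantic, target_len, ACOUSTIC_VOCAB, steps, rng)
    return semantic, acoustic


if __name__ == "__main__":
    semantic, acoustic = synthesize_tokens("hello world", target_len=20)
    print(len(semantic), len(acoustic))
```

Changing `target_len` changes how many tokens are produced for the same text, which is the knob a length-conditioned model would use for faster or slower speech.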
Synthesised: "Would you guys personally like to have a fake fireplace, an electric one, in your house? Or would you rather have a real fireplace? Let me know down below. Okay everybody, that's all for today's video and I hope you guys learned a bunch of furniture vocabulary!"
TTS scene keeps getting lit!
MaskGCT (Masked Generative Codec Transformer), a zero-shot TTS model, is now open-sourced in Amphion. Trained on Emilia. Only needs 5 seconds of speech to clone a voice.
Paper: https://t.co/OdoQ3niCeY
HF: https://t.co/2mCZA9GLzD
Discord: https://t.co/FvmcJ5pm6z
Watch the demo by MaskGCT