Qwen3-ASR-Enhanced v0.1
An early checkpoint; consider it a beta release. A better version with more stable nonverbal tag support is coming soon.
Huge thanks to @huggingface for sponsoring the compute used to train this model!
Thanks to LAION for some of the data.
We just open-sourced KaniTTS2 — a 400M param text-to-speech model that runs in 3GB VRAM with voice cloning support.
And we’re releasing the full pretrain code so you can train your own TTS from scratch for any language.
https://t.co/jfLxIRyOlZ, https://t.co/wcv3L74pbK
Apache 2.0
Meet kani-tts-370m: a multilingual text-to-speech model that's quietly becoming a community favorite. It turns written text into natural-sounding speech across multiple languages. Perfect for devs building accessible apps!
What if you could build your own TTS model that speaks your language - your accent - for around $200? Then host it locally or in the cloud for a tiny fraction of what any proprietary voice AI service charges. Sound interesting?
@ysu_ChatData The main tradeoff is that you cannot put a lot of languages into one model. Better to build separate models for each language, or for a group of languages. We will ship streaming and batching examples soon.
@dnl23 @rodrimora Actually we wanted to try Greek, and even found a dataset on HF: https://t.co/1SZBhEMVv9. It definitely needs a separate model. Just haven't gotten around to it yet.
@peregil We have an example of how to make a dataset. Use a mix of real-world speech recordings, 1-40 sec each. You can go up to 1 min too. https://t.co/j3DxP8WTo7
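
A minimal sketch of the clip-length rule from the reply above, assuming each clip's duration in seconds is already known. The function name `filter_clips_by_duration` and the file names are hypothetical and not part of the linked example; only the 1 sec and 40 sec bounds (and the optional 1 min ceiling) come from the post.

```python
from typing import Iterable, List, Tuple

Clip = Tuple[str, float]  # (file name, duration in seconds)


def filter_clips_by_duration(
    clips: Iterable[Clip],
    min_sec: float = 1.0,   # lower bound mentioned in the post
    max_sec: float = 40.0,  # upper bound mentioned in the post; 60.0 is also allowed
) -> List[Clip]:
    """Keep only clips whose duration lies in [min_sec, max_sec]."""
    return [(name, dur) for name, dur in clips if min_sec <= dur <= max_sec]


# Hypothetical clips, for illustration only.
clips = [("a.wav", 0.4), ("b.wav", 12.5), ("c.wav", 55.0)]

kept = filter_clips_by_duration(clips)                     # default 1-40 sec window
kept_long = filter_clips_by_duration(clips, max_sec=60.0)  # relaxed to 1 min
```

Raising `max_sec` to 60 keeps the longer recordings the post says are also acceptable.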