Ulan Abdurazakov

Verified account

@defoemark

Building the fastest, cheapest realtime voice for AI

Oakland, CA

Joined April 2013

122 Following

370 Followers

506 Posts

defoemark retweeted

@realmrfakename

about 1 month ago

Qwen3-ASR-Enhanced v0.1

An early checkpoint; consider it a beta release. Soon I will release a better version with more stable nonverbal tag support.

Huge thanks to @huggingface for sponsoring the compute used to train this model!

Thanks to LAION for some of the data. https://t.co/fKiY9PK0hQ

Ulan Abdurazakov

3 months ago

@Pendrokar Thanks 👍

Ulan Abdurazakov

4 months ago

We just open-sourced KaniTTS2: a 400M-parameter text-to-speech model that runs in 3 GB of VRAM and supports voice cloning. We're also releasing the full pretraining code, so you can train your own TTS model from scratch for any language. Apache 2.0. https://t.co/jfLxIRyOlZ, https://t.co/wcv3L74pbK

Ulan Abdurazakov

4 months ago

@ldenoue Got it!

Ulan Abdurazakov

4 months ago

For those who use Voice AI, how important is the voice cloning feature to you?

Ulan Abdurazakov

4 months ago

@fahdmirza Yeah, we need to work on voice cloning! And on that “fuggedaboutit” thing

Ulan Abdurazakov

4 months ago

We have released the 2nd version recently.

HuggingModels

4 months ago

Meet kani-tts-370m: a multilingual text-to-speech model that's quietly becoming a community favorite. It turns written text into natural-sounding speech across multiple languages. Perfect for devs building accessible apps! https://t.co/qG1DnNwERv

Ulan Abdurazakov

4 months ago

@Prince_Canuma Ya, good point; updated the license. Thanks 👍

Ulan Abdurazakov

4 months ago

Seems like we’ve made the most realistic Voice AI assistant. And completely useless. But he is funny

Ulan Abdurazakov

4 months ago

@FabioAngela79 Start with 200 hrs of speech, then scale up to 1,000 hrs. It should start speaking Italian after 1 epoch on 200 hrs.

Ulan Abdurazakov

4 months ago

What if you could build your own TTS model that speaks your language, with your accent, for around $200? Then host it locally or in the cloud for a tiny fraction of what any proprietary voice AI service charges. Sound interesting?

Ulan Abdurazakov

4 months ago

@gabrielstuff France is number one in voice AI

Ulan Abdurazakov

4 months ago

@ysu_ChatData The main tradeoff is that you cannot put a lot of languages into one model. It's better to build separate models for each language or group of languages. We will ship streaming and batching examples soon.

Ulan Abdurazakov

4 months ago

@kgrchz Yeah, a mix of synthetic and real-world data is the best scenario imho

Ulan Abdurazakov

4 months ago

@kgrchz Yes. And it will sound exactly like them. Those voices sometimes lack human-likeness but are generally more stable.

Ulan Abdurazakov

4 months ago

@dnl23 @rodrimora Actually, we wanted to try Greek and even found a dataset on HF: https://t.co/1SZBhEMVv9. It definitely needs a separate model. We just haven't gotten around to it yet.

Ulan Abdurazakov

4 months ago

@kgrchz The main issue is finding enough data: decent-quality speech recordings.

Ulan Abdurazakov

4 months ago

@TyagiShailesh Thanks!

Ulan Abdurazakov

4 months ago

@peregil We have an example of how to make a dataset: a mix of real-world speech recordings of 1-40 sec each. You can go up to 1 min too. https://t.co/j3DxP8WTo7

Ulan Abdurazakov

4 months ago

@untalpablodz Stay tuned. It will be at least 20x cheaper, with about the same quality.

Ulan Abdurazakov

4 months ago

@prateekhh Drop me a line. We only have Hessian rn; would love to find more.
