Today, we’re excited to introduce Miso One, the most emotive voice model in the world.
Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.
We’ve open-sourced the model weights, with API access coming soon.
Hear how Miso One sounds in the thread below.
I made this little game to test one thing: how many of Catalonia's 947 municipalities can you name from memory? 🗺️
Type as many as you can against the clock; they get colored in on the map as you go, and there's a leaderboard! 🏆
👉 https://t.co/gqYxB0WCTi