Today, we’re excited to introduce Miso One, the most emotive voice model in the world.
Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.
We’ve open-sourced the model weights, with API access coming soon.
Hear how Miso One sounds in the thread below.
Introducing Ideogram 4.0: the best open image model in the world.
Think it. Make it. Own it.
Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.
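@Xxi5olc Self-distillation is still a bit different. If the base model can learn certain specialized knowledge, that shows this knowledge is easier to distill into the base model. If you force-distill synthetic data from a third-party model, the base model won't necessarily learn it; the material may be beyond its reach. This is more like a way to raise all capabilities evenly: have the base model learn a, b, and c separately to make sure the base ends up knowing a, b, and c together, rather than taking a shortcut by distilling a third-party model.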