🚨 Microsoft just open-sourced a voice AI that was too dangerous to keep live.
They took it down. Added watermarks and safety controls. Then re-released it. For free.
It's called VibeVoice.
Microsoft's frontier open-source voice AI.
Clone any voice from 10 seconds of audio. Generate 90 minutes of multi-speaker conversation. Real-time streaming. All running locally on your machine.
No ElevenLabs. No $99/month subscription. No per-minute pricing.
Here's what this thing does:
→ Text-to-speech that is hard to tell apart from a real human
→ Generate up to 90 minutes of audio in a single pass
→ 4 distinct speakers in one conversation with natural turn-taking
→ Clone any voice from just 10 seconds of audio
→ Real-time streaming TTS. First audio in ~200 milliseconds.
→ Speech-to-text that processes 60 minutes of audio in one pass
→ Identifies who said what and when. Speaker labels + timestamps.
→ Supports 50+ languages for transcription
→ Custom hotwords for names, technical terms, domain-specific accuracy
Here's the wildest part:
Give it a podcast script. It generates a full multi-speaker conversation that sounds like two real humans talking. Natural pauses. Emotional nuance. Turn-taking. 90 minutes. One command.
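A rough sketch of what "give it a podcast script" could look like on the input side. This is not VibeVoice's own code. It assumes each turn starts with a label like `Speaker 1:` (check the repo for the real input format), and `parse_script` and `MAX_SPEAKERS` are made-up names for illustration. It only splits the script into turns and checks the 4-speaker limit before you hand it to the model:

```python
import re

# Assumption: the post's stated limit of 4 distinct speakers per conversation.
MAX_SPEAKERS = 4

# Assumption: each turn starts with a label such as "Speaker 1: Hello there".
TURN_RE = re.compile(r"^\s*(Speaker\s+\d+)\s*:\s*(.+?)\s*$")


def parse_script(script):
    """Split a script into (speaker, text) turns.

    Unlabeled lines are treated as a continuation of the previous turn.
    """
    turns = []
    for line in script.splitlines():
        if not line.strip():
            continue  # skip blank lines between turns
        match = TURN_RE.match(line)
        if match:
            speaker = " ".join(match.group(1).split())  # normalize spacing
            turns.append((speaker, match.group(2)))
        elif turns:
            speaker, text = turns[-1]
            turns[-1] = (speaker, text + " " + line.strip())
        else:
            raise ValueError("script must start with a speaker label")

    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"{len(speakers)} speakers found, limit is {MAX_SPEAKERS}"
        )
    return turns


if __name__ == "__main__":
    demo = "Speaker 1: Welcome to the show.\nSpeaker 2: Glad to be here."
    for speaker, text in parse_script(demo):
        print(f"{speaker} -> {text}")
```

Catching a missing label or a fifth speaker up front is cheaper than finding out partway through a long generation run.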
Microsoft had to take this repo down once because people were misusing it for deepfakes and disinformation. They brought it back with embedded watermarks, audio disclaimers, and safety controls.
That's how powerful this is. A $3 trillion company built it. Released it. Pulled it. Fixed it. And gave it back to the world.
ElevenLabs: $99/month.
https://t.co/fOJ1qDfCPb: $39/month.
Amazon Polly: pay per character.
This: Free. Local. MIT License.
23.5K GitHub stars. 2.6K forks. Backed by Microsoft Research.
100% Open Source.