Everyone keeps trying to force LLMs to do voice understanding.
it's like printing out an email to read it.
technically works. completely misses the point.
this startup took a different bet entirely.
@modulate_ai spent years training Velma on 550M hours of raw audio - no transcript step, no LLM in the middle.
just layers of raw signal.
hesitation, sarcasm, vocal stress, emotion, intent. 150+ behaviors detected in real time.
it's already the engine running inside Call of Duty and GTA Online.
and it just quietly ended up #1 on the conversation understanding benchmark.
above GPT-5.
above @Gemini.
above @Grok.
at 10x lower cost.
a small team just rewrote the rules of voice AI. barely anyone's noticed yet.
the API just opened. and they're giving away 1000 free credits: https://t.co/nE6T9cBYLA
@OpenAI Exciting announcement. However, real-time transcription at $0.017 per minute is not competitive. @modulate_ai offers real-time streaming transcription at $0.06 per hour, or 17x lower cost than OpenAI.
@nkerzman @VaibhavSisinty @modulate_ai Yes, 100%. @modulate_ai's speech-to-text has all the technical requirements of every other transcription API out there, but it's much cheaper. Then, once you have the transcribed text, you'd use a TTS API
@_thinx @VaibhavSisinty @modulate_ai Hi @_thinx, we'll have the ability to support 5k+ concurrent connections for STT in ~1-2 weeks. DM me and I'll send you more details! We have a few customers running at 1k concurrent connections now, and we're scaling to 5k+ very soon!
@xai didn't kill the whole voice AI industry - just kicked the corpses of the old guard. @modulate_ai prices STT at $0.03/hr (way cheaper!), has full support for emotion + deepfake detection and 50+ languages, and leads on WER for real-world audio. Validate it for yourself with >300hrs free - https://t.co/tcpkKVMz2G