Low-latency real-time speech-to-text, text-to-speech and translation APIs.
X video bot:
"@soniox_ai transcribe this"
"@soniox_ai translate this to /language/"
Big moment for Soniox: Today we’re launching Soniox Text-to-Speech.
This is a major step forward for us. Soniox started with speech-to-text. Now, with both STT and TTS, we are becoming the voice platform for every language.
Soniox TTS is built for the hardest parts of speech generation:
- Native-speaker-quality speech in 60+ languages
- Hallucination-free speech generation
- Alphanumerics such as numbers, IDs, and addresses spoken correctly
- Correct pronunciation for names and foreign words
- Ultra-low-latency streaming for real-time voice applications
And the pricing is simple: $0.70 per hour of generated speech.
What excites us most is the bigger picture. Developers and companies can now work with one provider for the core voice stack: speech-to-text, text-to-speech, multilingual voice, real-time infrastructure, regional deployments, and compliance.
This is a big step in our transition from an STT provider to the voice platform for every language.
Voice is becoming a core interface for software. But to work globally, it has to be fast, accurate, robust, and affordable across every language. That is what we are building at Soniox.
Read the blog post: https://t.co/XeFNOat3IP
@astrange1234 Timestamps are on the roadmap. Multi-speaker is a first for us, but noted. Thanks for using Soniox TTS. More updates are coming in the next few weeks.
Happy to see @telnyx added Soniox to their stack powering global communications. 🌎
We handle the hard parts of real-time STT: code-switching, account numbers, names in noisy calls. Try us out with Telnyx voice AI agents.
Voice AI teams now have another STT option on Telnyx.
We just added @soniox_ai STT for real-time transcription workflows.
This matters because STT is one of those pieces you only notice when it gets things wrong.
If the caller switches languages, says a product name, gives an account number, or talks over background noise, the rest of the agent is only as good as the transcript it receives.
With Soniox now available on Telnyx, teams building voice agents get another model to test alongside the rest of their voice stack.
This is useful for multilingual agents, mixed-language calls, and workflows where names, codes, and domain terms matter.
You can read more about it here:
https://t.co/8sAHPjth6G
@LucasVHoutven Hi, that's a cool project. You are most likely referring to "dictation mode". We don't provide that at this point, but it is on our long-term roadmap.
Building a voice AI app? One of the hardest parts is knowing exactly when a person has finished speaking.
Wait too long and your assistant feels sluggish. React too early and you cut people off mid-thought.
This is what endpoint detection solves.
How it works: ⬇
Endpoint detection is a small flag that makes a big difference in how natural your voice AI feels.
Full docs and examples here: https://t.co/Ldq6teRgcQ
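To make the idea concrete, here is a minimal sketch of what an endpoint signal gives downstream code: it cuts a stream of tokens into utterances at the `<end>` marker mentioned in these posts. The token shape (a dict with a `text` key) and the function name `split_utterances` are assumptions for illustration, not the documented Soniox response format; check the linked docs for the real fields.

```python
from typing import Iterable, Iterator

END_MARKER = "<end>"  # marker text from the posts; the token shape is assumed


def split_utterances(tokens: Iterable[dict]) -> Iterator[str]:
    """Yield one string per finished utterance, cutting at each end marker."""
    parts: list[str] = []
    for token in tokens:
        if token["text"] == END_MARKER:
            if parts:
                yield "".join(parts).strip()
            parts = []
        else:
            parts.append(token["text"])
    # Tokens after the last marker belong to an utterance still in progress,
    # so they are deliberately not yielded.


# Hypothetical token stream: two finished utterances and one unfinished.
stream = [
    {"text": "Book "}, {"text": "a "}, {"text": "table"}, {"text": "<end>"},
    {"text": "for "}, {"text": "two"}, {"text": "<end>"},
    {"text": "at "},
]
utterances = list(split_utterances(stream))
print(utterances)
```

The trailing "at " is held back because no endpoint has been signalled for it yet, which is the behavior that keeps an assistant from cutting people off mid-thought.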
A clean pattern for voice apps:
Show non-final tokens instantly for live captions. Rerender with final tokens once <end> arrives. Trigger your actions after.
Your UI will feel instant while your logic stays accurate.
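The pattern above could be sketched roughly as follows. This assumes each token is a dict with `text` and `is_final` keys and that `<end>` arrives as its own token; `CaptionBuffer` and `on_utterance` are hypothetical names, not part of any Soniox SDK. Non-final text is replaced on every update, final text accumulates, and the callback fires only once `<end>` arrives.

```python
from typing import Callable


class CaptionBuffer:
    """Keeps confirmed text separate from provisional text (assumed token shape)."""

    def __init__(self, on_utterance: Callable[[str], None]) -> None:
        self.final_text = ""      # confirmed text for the current utterance
        self.non_final_text = ""  # provisional text, replaced on every update
        self.on_utterance = on_utterance

    def update(self, tokens: list[dict]) -> str:
        """Apply one batch of tokens and return the live caption to display."""
        self.non_final_text = ""
        for token in tokens:
            if token["text"] == "<end>":
                # Endpoint reached: hand over confirmed text only, then reset.
                self.on_utterance(self.final_text.strip())
                self.final_text = ""
            elif token["is_final"]:
                self.final_text += token["text"]
            else:
                self.non_final_text += token["text"]
        return self.final_text + self.non_final_text


actions: list[str] = []
buf = CaptionBuffer(on_utterance=actions.append)

# Provisional guess, shown immediately even though it may change.
caption_1 = buf.update([{"text": "Turn of", "is_final": False}])
# Part of the text is confirmed; the rest is still provisional.
caption_2 = buf.update([
    {"text": "Turn ", "is_final": True},
    {"text": "off the lights", "is_final": False},
])
# Everything is confirmed and the endpoint arrives: the callback fires.
caption_3 = buf.update([
    {"text": "off the lights", "is_final": True},
    {"text": "<end>", "is_final": True},
])
```

The callback is where the final re-render and any actions would go, so application logic never runs on a provisional guess such as "Turn of".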
Here is the video transcript you requested:
Speaker 1: [English] Of course, I got to show you the most beautiful part, which is video games. It is—it's also the closest to our heart. This is Forza. This is 007, by the way. The new 007 game. I'm looking forward to playing it. I look a little bit like him, ladies and gentlemen, Nvidia's RTX Spark laptops. Now. Thank you. I have too many things in my pocket.[Chinese] 太多东西了。[English] Okay, all right. This is the most amazing chip the world has ever built. This is the N1X that we built in partnership with MediaTek. I think I saw—I saw Rick earlier. This is N1X. This is a beautiful chip. This is—this is a a chip that, frankly, would take 33 years to build. And the reason for that is because 100% of Nvidia's software stack runs here. If you...
Read full transcript here:
https://t.co/AnvuAGXujD
We can't check things that are closed to the public. Our advice is to evaluate the providers yourself and pick the one that works best for what you are trying to build. Benchmarks are, and will continue to be, gamed or biased. What works in real-world scenarios is a different story.
Thanks for using our models, and make sure to follow closely for the next release. It is gonna be a hit.
We are pleased to announce our strategic partnership with @TencentRTC, bringing the world's most accurate speech-to-text natively into @tencentcloud.
Starting today, developers can integrate the Soniox STT API directly within the Tencent RTC console. Build intelligent customer service, voice assistants, real-time translation, and meeting transcription across more than 60 languages, without leaving the Tencent Cloud environment.
We're thrilled to partner with @soniox_ai to elevate enterprise voice AI! 🤝
By combining their advanced Automatic Speech Recognition (ASR) with @TencentRTC, we empower global developers with high-accuracy, low-latency deployments across 200+ countries.
🔥 Native accuracy in 60+ languages
🌍 3,200+ global network nodes
⚡ Sub-300 ms worldwide latency
Dive in: https://t.co/WUukn23DT0