Extremely proud of the team @cartesia for launching Sonic 3.5, which sets a new state of the art for TTS
I personally led the technical direction of this model; we built it from the ground up, from first principles, and it contains multiple non-trivial ideas that differ substantially from anything we’ve seen in the literature. It’s been very gratifying to see research bets play out and the strong research team at Cartesia continue to grow!
btw i wasn't kidding last week
we're hiring core researchers to work on foundational problems in memory, continual learning, real-time multimodal models, and more @cartesia. come work with us!
BREAKING: AI just killed slow TTS models.
Forget seconds of delay, robotic tone, and broken accents.
Sub-100 ms latency → natural conversation that responds like a human.
Here’s how Cartesia’s Sonic-3 hit real-time speed with emotion: 👇
A Stanford researcher raised $100M from NVIDIA and built voice AI that impressed Elon Musk.
Cartesia Sonic 3:
-5x faster than OpenAI
-sub-190ms response
-42 languages with native accents
Here's the demo + 100K free credits👇
Elon Musk was impressed by this voice AI.
After testing it myself, I get why.
It responds in 40ms, laughs naturally, and sounds so human it pranked someone for 5 minutes.
Cartesia's Sonic 3 just made ElevenLabs look outdated. Here's how:👇
I tested every major voice AI on the market.
Only one could nail native accents, laugh naturally, AND respond fast enough for real-time calls.
That's Cartesia's Sonic 3: backed by $100M from NVIDIA.
Here's what makes it different (100K free credits):👇
Nothing beats a team of 2 Indians and 2 Chinese founders.
Just like Indo-Chinese food, they took the best parts and made something better.
Cartesia's Sonic 3:
-5x faster than OpenAI (<100ms)
-42 languages with native accents
-Backed by $100M from NVIDIA
Watch it in action:👇
A 50-person startup just built voice AI 3-5x faster than OpenAI with 1/10th the team size.
Cartesia's Sonic 3 can replace customer support calls at enterprise scale.
ServiceNow already switched.
Here's what you need to know (100K free credits):👇
> be Karan Goel
> spawn in Delhi, India
> see the entire family hustle
> decide to build something new
> graduate with a dual degree from IIT
> rank top 0.1% of India
> realize it's not enough
> head to Carnegie Mellon to build AI
2017 - 2018
> win $35K Siebel Scholarship
> grind like crazy at CMU
> move to Stanford for PhD
> join Stanford AI Lab under Prof. Chris Ré
> dive deep into the algorithms
> start building in AI from scratch
> research at Salesforce AI Research
> get Greylock X Fellowship
> learn AI infrastructure from every angle
2020
> help pioneer structured State Space Models (SSMs) for deep learning
> publish "It's Raw! Audio Generation with State-Space Models" at ICML 2022
> cook while everyone watches
2023
> finish PhD thesis
> could join any lab, any company
> instead go "we're starting a company"
> start building Cartesia to disrupt Voice AI
> raise $22M seed from top VCs and angels
2024
> launch Sonic: world's fastest TTS model
> 10,000+ customers sign up
> Quora, Cresta, Rasa, etc start using it
2025
> raise $64M Series A led by Kleiner Perkins
> go from founding to Series A in 18 months
> launch Sonic 2.0
> cut latency in half: 90ms → 45ms
> scale to 50,000+ customers
> become the fastest voice AI in the world
August 2025
> launch Line
> voice agent development platform
> make it easy for anyone to build voice AI
> state space models go mainstream
Oct 2025
> launch Sonic-3 with 42 languages
> raise $100M to accelerate
Karan Goel for you, ladies and gentlemen.
He is living proof that you are not born a legend, but you can become one.
What's stopping you?
Holy Sh*t...
AI voices just hit a new level.
Cartesia’s new Sonic-3 doesn’t sound robotic - it laughs, switches languages, and even changes tone when you say “talk slower.”
This is what the future of voice AI sounds like: 🧵
🚨 BREAKING: Voice AI that's 3-5x better than OpenAI just launched and sounds actually human.
Cartesia Sonic-3 talks, laughs, and reacts just like a real person.
It speaks 42 languages and replies instantly, faster than you can notice.
Here’s what it does: