Voice AI has evolved rapidly over the last couple of years, and it’s exciting to see so much energy around building voice agents.
In this short course, Voice for AI Agents and Applications, we focus on something very practical: how to integrate voice deeply into real products and give voice to your apps and agents.
For decades, we’ve imagined a future where humans interact with technology more naturally through voice. That future is starting to feel much closer, and my hope is that this course helps more builders take the next step from demos to useful, production-ready experiences.
On a personal note, it was a real honor to record this course with @AndrewYNg. Almost 15 years ago, Andrew’s ML course set me on this path. As it did for millions of others, it inspired me to pursue AI as a career and made me believe deeply in what machine learning could make possible.
A big thank you as well to @scottcjohnston for joining me in the course for a great conversation on what it takes to ship voice in production.
And thank you to the entire @DeepLearningAI team for all the effort, care, and craft that went into making this course a reality.
I’m also grateful to the many thoughtful people who helped shape and bring this course to life, knowing this list is far from comprehensive: Eli Chen, Rakesh Utekar, Esmaeil Gargari, Brendan Brown, Hung-Chieh Huang, Karena Cai, Ray Banks, Alicia Cho, Ariana Faustini, Timothy Carmody, Joe Chen, Nicholas Lewis, Natalia Lizarazo Farfán, and @AI_Fund.
Huge thanks and congratulations to Jitesh Gupta, Aditi Dhar, and the @VocalBridge team!
New course: Add voice to your AI agents and applications, built with @VocalBridge (disclosure: an AI Fund portfolio company) and taught by its CEO @_ashwyn.
Voice applications historically required making a hard tradeoff: using fast voice-to-voice models that sacrifice reliability, or accurate speech-to-text pipelines that add latency. This course teaches you how to build voice agents that are both reliable and fast.
You'll build three types of voice-enabled applications: a voice-interactive game where voice commands and mouse clicks work together over a single channel, an agent that gains a voice in about 10 lines of code without touching its prompts or tools, and an agent that places outbound phone calls using a make_phone_call function.
Skills you'll gain:
- Add a voice layer to an existing agent without rewriting your prompts, RAG pipeline, or tools
- Give an agent the ability to place outbound calls and stream transcripts back live
- Set up voice evaluation to score calls, catch regressions, and improve quality before deployment
Join and add voice to your agents without overhauling your architecture:
https://t.co/gBO4nmaU9u
@_ashwyn is one of the most passionate Voice AI researchers I know. He understands the voice stack end-to-end, has the expertise to push the boundaries of world-class Voice AI research, and has poured that same passion into building this product.
Give it a try.
Day 1 at AI Dev 26 was incredible 🚀
A big highlight was @AndrewYNg launching Codream — the future of learning and education powered by voice: https://t.co/wVm5axdgLe
Huge kudos to the @DeepLearningAI team for shipping Codream 👏
Join us tomorrow at Stage 2 at 1:50 PM as I present @VocalBridge and show how you can give voice to your apps and agents in minutes.
See you there!
The dual-agent architecture was one of the harder problems we have tackled at @VocalBridge, but the moment it clicked was unlike anything else. Getting to crack that together with @AndrewYNg and the AI Fund team made it even more meaningful. Voice UI is the future, and we are just getting started. Here’s to more developers building with voice 🚀
I'm excited about voice as a UI layer for existing visual applications — where speech and screen update together. This goes well beyond voice-only use cases like call center automation.
The barrier has been a hard technical tradeoff: low-latency voice models lack reliability, while agentic pipelines (speech-to-text → LLM → text-to-speech) are intelligent but too slow for conversation. Ashwyn Sharma and team at Vocal Bridge (an AI Fund portfolio company) address this with a dual-agent architecture: a foreground agent for real-time conversation, a background agent for reasoning, guardrails, and tool calls.
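As a rough illustration of the dual-agent split described above (not Vocal Bridge's actual API or implementation), the idea can be sketched in Python with asyncio. Every name here, `foreground_agent`, `background_agent`, and `run_turn`, is hypothetical, the sleep is a stand-in for model and tool latency, and the guardrail is a toy keyword check.

```python
import asyncio


async def background_agent(user_text: str) -> str:
    """Slow path (hypothetical): stands in for reasoning, guardrails, tool calls."""
    await asyncio.sleep(0.05)  # simulated LLM / tool latency, an assumption
    if "password" in user_text.lower():  # toy guardrail for illustration only
        return "Sorry, I can't help with that."
    return f"Here is the detailed answer to: {user_text}"


async def foreground_agent(user_text: str, spoken: list[str]) -> None:
    """Fast path (hypothetical): reply right away, then relay the slow result."""
    # Start the slow work first so it runs while the quick reply is spoken.
    task = asyncio.create_task(background_agent(user_text))
    spoken.append("Let me check that for you.")  # immediate acknowledgement
    spoken.append(await task)  # speak the background result once it is ready


def run_turn(user_text: str) -> list[str]:
    """Run one conversational turn and return what would be spoken, in order."""
    spoken: list[str] = []
    asyncio.run(foreground_agent(user_text, spoken))
    return spoken


if __name__ == "__main__":
    for line in run_turn("What is 7 times 8?"):
        print(line)
```

The point of the split in this sketch is that the user hears something immediately, while the slower reasoning and safety checks run concurrently instead of blocking the conversation.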
I used Vocal Bridge to add voice to a math-quiz app I'd built for my daughter; this took less than an hour with Claude Code. She speaks her answers, the app responds verbally and updates the questions and animations on screen.
Only a tiny fraction of developers have ever built a voice app. If you'd like to try building one, check out Vocal Bridge for free: https://t.co/nGrFznAMLh