I'm obsessed right now with figuring out the right patterns for LLM subagents.
I think that all software we write is going to have a lot of inference loops running all the time!
A big driver for this is that human-in-the-loop processes must be fast and non-blocking. (Think about voice interfaces, for example.) These human-in-the-loop LLM ... um, loops ... also need to start, stop, steer, and share context with longer-running tasks that require external resources, need to use bigger/slower models, etc.
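Here's a rough sketch of that start/stop/steer shape using plain asyncio. To be clear, this is not the Pipecat subagents API: the `Subagent` class, its methods, and the "scan sector" goals are all made up for illustration, and the `asyncio.sleep` call stands in for a slow inference or tool call.

```python
import asyncio


class Subagent:
    """Hypothetical long-running worker that a fast foreground loop can control."""

    def __init__(self, goal):
        self.goal = goal
        self.steering = asyncio.Queue()  # instructions from the foreground loop
        self.log = []                    # shared context the foreground can read
        self._task = None

    def start(self):
        # Schedule the worker and return immediately (non-blocking).
        self._task = asyncio.create_task(self._run())

    def steer(self, instruction):
        # Hand the worker a new goal without waiting for it.
        self.steering.put_nowait(instruction)

    async def stop(self):
        # Cancel the worker and wait for the cancellation to land.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        self.log.append("stopped")

    async def _run(self):
        step = 0
        while True:
            # Apply any steering that arrived since the last step.
            while not self.steering.empty():
                self.goal = self.steering.get_nowait()
                self.log.append(f"steered: {self.goal}")
            await asyncio.sleep(0.01)  # stand-in for a slow model or tool call
            step += 1
            self.log.append(f"step {step}: {self.goal}")


async def demo():
    agent = Subagent("scan sector 1")
    agent.start()
    await asyncio.sleep(0.05)  # the foreground loop stays free to talk to the user
    agent.steer("scan sector 2")
    await asyncio.sleep(0.05)
    await agent.stop()
    return agent.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The design choice worth noticing: the foreground loop never awaits the worker's result directly. It only starts it, drops messages in a queue, reads the shared log, and cancels it.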
We've been hacking on a bunch of subagent abstraction experiments in Pipecat.
One thing we need is good ways to run subagents both locally and remotely. Enter @vercel sandboxes!
Gradient Bang is our big, open source, multi-player LLM game. It's a canvas for experimenting with subagents, very long contexts over many sessions, dynamic user interfaces, and voice.
@jonptaylor just added "bring your own subagents" support to Gradient Bang, built on Vercel sandboxes.
Here's his video walk-through.
Landing today in Gradient Bang: further combat updates for observed / indirect encounters. Useful when you prefer having corp ships as muscle (keeping your personal ship safe and sound in fed space!)
Updating the map and ship status on the UI was a much-needed feature.
✨ Voice AI, open models, and next-generation evals hackathon at @ycombinator in SF on May 30th. ✨
We're co-hosting with @cekuraAi, and we've pulled in our friends at @NVIDIAAIDev, @AWS, and @twilio for expertise and mentoring.
We'll help you build state of the art voice agents using:
- NVIDIA Nemotron models
- AWS SageMaker and Bedrock inference
- Twilio telephony
- Cekura evaluation tooling
- Pipecat orchestration and Pipecat Cloud agent hosting
Up for grabs:
- A guaranteed YC interview
- Special judges' prizes from NVIDIA, AWS, and Twilio for the most impactful and technically impressive projects
Join us to learn from engineers who built all the tools you're using, compare notes with other voice AI developers, and show off your ideas!
Space is limited. Apply below.
Combat Strategies! Spicing up combat in Gradient Bang to give players more control over their space drama.
You can roll with a predefined doctrine, or provide custom prompts for the agent to follow. Perfect for instructing my fleet to always be on the lookout for opportunities to take down @kwindla
➡️ Play: https://t.co/6MMIKKWGyq
➡️ Combat sim: https://t.co/gseWcI8haI
Gradient Bang is entirely open source, code here: https://t.co/iS71wFFKIL
It often does that when a port’s prices or stock shift and the route is no longer profitable. It will go off and try to find a new opportunity, keeping its shame quiet until it has money to show for itself first 😅
Lots of prompting and UI work to do to make the market moves visible.
You can sign up and try Gradient Bang today! https://t.co/6MMIKKXenY
Pipecat Subagents (also out today) made building this not so scary. Agent handovers are my new go-to for historically difficult problems. Lots more to show on that in upcoming features I’m excited about!
Sharing devlogs soon, but for now, code on GH here: https://t.co/iS71wFGiyj
The team at @langchain built voice AI support into their agent debugging and monitoring tool, LangSmith.
LangSmith is built around the concept of "tracing." If you've used OpenTelemetry for application logging, you're already familiar with tracing. If you haven't, think about it like this: a trace is a record of an operation that an application performs.
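If "a record of an operation" feels abstract, here's a toy version in plain Python. This is not the LangSmith or OpenTelemetry API: `Tracer`, `Span`, and the stage names are hypothetical, and the `time.sleep` calls stand in for real model calls. It just shows the idea of nested, timed records for one conversation turn.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed operation, possibly containing child operations."""
    name: str
    start: float
    end: float = 0.0
    children: list = field(default_factory=list)

    @property
    def duration_ms(self):
        return (self.end - self.start) * 1000.0


class Tracer:
    """Hypothetical minimal tracer: records spans as a tree."""

    def __init__(self):
        self.roots = []
        self._stack = []

    @contextmanager
    def span(self, name):
        s = Span(name=name, start=time.perf_counter())
        # Attach to the currently open span, or make it a top-level record.
        parent = self._stack[-1] if self._stack else None
        (parent.children if parent else self.roots).append(s)
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self._stack.pop()


def trace_one_turn():
    tracer = Tracer()
    with tracer.span("turn"):
        with tracer.span("speech-to-text"):
            time.sleep(0.01)  # stand-in for transcription
        with tracer.span("llm"):
            time.sleep(0.02)  # stand-in for LLM inference
        with tracer.span("text-to-speech"):
            time.sleep(0.01)  # stand-in for speech synthesis
    return tracer


if __name__ == "__main__":
    turn = trace_one_turn().roots[0]
    for child in turn.children:
        print(f"{child.name}: {child.duration_ms:.1f} ms")
```

Each child span answers "what was the latency for this step?", and the parent span ties the steps together as one turn.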
Here's a very nice video from @_tanushreeeee that walks you through building and debugging a voice agent with full conversation tracing.
Using the LangSmith interface you can find a specific agent session, then dig into what happened during each turn of the conversation. What did the user say and how was that processed by each model you're using in your voice agent? What was the latency for each inference operation? What audio and text was actually sent back to the user?
Today's production voice agents are complex, multi-model, multi-modal, multi-turn systems! Tracing gives you leverage to understand what your agents are doing. This saves time during development. And it's critical in production.
Tanushree shows using a local (on-device) model for transcription, then switching to using the OpenAI speech-to-text model running in the cloud. You can see the difference in accuracy. (Using Pipecat, switching between different models is a single-line code change.)
Also, the video is fun! It's a French tutor. Which is a voice agent I definitely need.
If you're in London on Thursday, head over to AI After Hours and hang out with our friends at @encord_team. Pipecat core contributor @JonPTaylor is on the panel, so there will be lots of good discussion about latency, voice AI, and enterprise agent workflows!
Love me a good client UI but oh lawd how I do so love a TUI.
Thanks @aconchillo for rustling up this terminal dashboard for Pipecat. Now I can stare at the web inspector a little bit less each day.
https://t.co/rbl9RkGQTc
Join us for a hackathon at @ycombinator on October 11th.
Gemini x Pipecat realtime AI fun and games!
Build an application using Gemini and Pipecat. See some new APIs. Show off interesting things you're doing in your startup or side project. Hang out with engineers from Google DeepMind and Google Cloud, the AI Tinkerers community, and YC companies Daily, Boundary, Coval, Langfuse, and Tavus. Eat Outta Sight pizza.
Limited space ... apply below.
Heading to Paris next week for the Voice AI Meetup 🇫🇷 with @gladia_io and @trydaily.
I’ll be talking all things @pipecat_ai — from the unexpected origin story of how the framework came to be, to lessons, tips, and tricks for scaling Voice AI agents in production.
Join the Paris realtime AI community on April 2nd for networking, demos, a panel discussion, food, and drinks at Gladia HQ.
Grab a spot: https://t.co/abJ7mAk3Dx
Voice AI Meetup in Paris next week!
🇫🇷 @gladia_io and @JiliJeanlouis are hosting.
🇫🇷 @JonPTaylor, who works on @pipecat_ai, will be there.
Join the Paris realtime AI community on April 2nd for networking, demos, a panel discussion, food, and drinks at Gladia HQ.
Introducing Pipecat Cloud, infrastructure for open source voice AI agents.
If you're building voice AI agents with @pipecat_ai, you have lots of options for hosting your agents: anywhere you can run a Python process and terminate WebSocket or WebRTC connections.
But managing agents in production, on rock solid infrastructure, with observability, autoscaling, blue-green deployments, and everything else needed for real usage at scale is not trivial. Devops is a big category of questions and conversation threads in the Pipecat Discord.
So we built a platform specifically for voice AI.
I've been describing Pipecat Cloud as
➡️ a "Kubernetes wrapper," or
➡️ "Heroku for voice agents," or
➡️ "you just push us a Docker container and we do the rest."
If you're building voice agents with Pipecat, take a look at Pipecat Cloud and tell us what you think.
Open source, native audio turn detection 🎉🎉🎉
Most voice agents today do turn detection by waiting for speech pauses of a specific, short length. That's not how humans do turn detection when we talk to each other!
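For reference, the pause-based approach looks roughly like this. It's a minimal sketch, not code from Pipecat or from any particular agent: the function name, the 20 ms frame size, and the pause thresholds are assumptions, and the per-frame speech flags would come from a voice activity detector in a real system.

```python
def end_of_turn_frame(is_speech, frame_ms=20, pause_ms=800):
    """Return the index of the frame where the turn is declared over, or None.

    is_speech: iterable of booleans, one per audio frame (hypothetical VAD output).
    The turn ends once the trailing silence after some speech reaches pause_ms.
    """
    silence_ms = 0
    heard_speech = False
    for i, speaking in enumerate(is_speech):
        if speaking:
            heard_speech = True
            silence_ms = 0  # any speech resets the pause timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= pause_ms:
                return i
    return None


# Leading silence, speech, a 200 ms mid-sentence pause, more speech, then silence.
frames = [False] * 5 + [True] * 10 + [False] * 10 + [True] * 10 + [False] * 50

if __name__ == "__main__":
    print(end_of_turn_frame(frames, pause_ms=800))  # waits for the long final pause
    print(end_of_turn_frame(frames, pause_ms=200))  # cuts in at the mid-sentence pause
```

The tradeoff is visible in the two calls: a long threshold adds latency to every response, and a short one interrupts people who pause to think.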
I've been working with some friends on a new turn detection model. If you're interested in this problem or in learning more about ML engineering, come hack on a small model with us!
More details below.