Pipecat AI

about 2 months ago

Thank you to our community, and all the Pipecat developers out there. You guys are amazing 🫡

about 2 months ago

Big day today. Pipecat version 1.0. Two years in the making. The most widely used framework for voice agents, but not just voice agents. Pipecat is a framework for realtime, multi-modal, multi-model AI applications. Contributions from NVIDIA, all the foundation labs, AWS, GCP, and Azure. Used by thousands of startups, scale-ups, and enterprises. Pipecat Subagents v0.1.0. A new library for sub-agent orchestration. Which is just a fancy way of saying running lots of inference loops in parallel, with partially shared context. The basic architecture of Pipecat Subagents is an event bus that works locally, and over the network. And Gradient Bang. The side project that broke containment. Built with Pipecat and Pipecat Subagents. Gradient Bang was actually the proving ground for the early Subagents work. But ... it's also a really fun game.

kwindla's tweet photo. Big day today. Pipecat version 1.0. Two years in the making. The most widely used framework for voice agents, but not just voice agents. Pipecat is a framework for realtime, multi-modal, multi-model AI applications. Contributions from NVIDIA, all the foundation labs, AWS, GCP, and Azure. Used by thousands of startups, scale-ups, and enterprises.

Pipecat Subagents v0.1.0. A new library for sub-agent orchestration. Which is just a fancy way of saying running lots of inference loops in parallel, with partially shared context. The basic architecture of Pipecat Subagents is an event bus that works locally, and over the network.

And Gradient Bang. The side project that broke containment. Built with Pipecat and Pipecat Subagents. Gradient Bang was actually the proving ground for the early Subagents work. But ... it's also a really fun game.

14

265

22

217

26K

1

11

3

4

2K

pipecat_ai retweeted

3 days ago

Microsoft announced a bunch of interesting new AI models and tools this week. Model launches alway get lots of attention. But don't sleep on the new ASSERT evals framework that launched today. I'm on record as arguing that 2026 is the year of evals. Evals are the glue for all the "jobs to be done" at every level of AI: model training; testing and deciding on what models to use and how to use them; and testing and improving AI agents in production. Evals unify our work on those different layers of the stack. These days, when we talk about evals, observability, and testing, we're talking about overlapping parts of a large set of tools we're still early on in figuring out. As the AI engineering ecosystem matures, diversifies, and increases massively in scale, we really, really need good evaluation (observability, monitoring, testing, data management) frameworks. I got a chance to test the new Microsoft ASSERT evals framework before it was released, and it has some very nice core ideas. 1) ASSERT is open in two important ways. First, the team is serious about broad support for models, frameworks, and use cases. Microsoft spent time understanding voice agent use cases and building Pipecat support, for example. Second, the code is completely open source, released under an open MIT license. 2) We're all working in and with agentic coding tools today. That means we are planning in natural language, and all of our software development and ops tools have to evolve for these new, natural language, workflows. ASSERT takes descriptions of desired agent behavior and generates specifications for the ASSERT suite of tools to run against. In a world where "English is the programming language," how we actually make natural language "code" precise enough and repeatable enough is perhaps the big unsolved tooling problem that all of us are working towards in different ways. This is true whether we work on coding agents, AI opps tooling, orchestration frameworks, or vertical applications. 3) Microsoft describes ASSERT as a policy-driven framework. Rather than eval against generic performance metrics, ASSERT aims to generate stable but adaptable evaluation criteria for specific agents. "Policy-driven" also implies a full loop design. Policy (generated from specific requirements) -> evaluation -> optimization -> monitoring in production -> improving the policy description -> evaluation -> ... 4) Enterprise agents need to be evaluated along many dimensions: task completion, individual conversation turn behavior, latency, mode-specific metrics like audio disfluencies, and safety/security. Microsoft designed ASSERT to be used together with a new safety governance toolkit called Agent Control Specification. 5) Finally, ASSERT is integrated into the Microsoft Foundry ecosystem. Today, AI engineering tools have to be open source and vendor neutral to get attention from developers and gain widespread adoption. *And* it's equally important to give enterprise customers tools that work as a coherent stack. This is hard to do well. There are real tensions between open source development versus engineering a great full stack developer experience. However, if you sweat the details on both ends, you benefit from a full spectrum of feedback about real-world development pain points. It's more work, but it's worth it! Kudos to Microsoft for embracing this and committing to an open, community oriented approach, plus doing the extra work to build the full stack for enterprise customers.

kwindla's tweet photo. Microsoft announced a bunch of interesting new AI models and tools this week. Model launches alway get lots of attention. But don't sleep on the new ASSERT evals framework that launched today.

I'm on record as arguing that 2026 is the year of evals.

Evals are the glue for all the "jobs to be done" at every level of AI: model training; testing and deciding on what models to use and how to use them; and testing and improving AI agents in production.

Evals unify our work on those different layers of the stack.

These days, when we talk about evals, observability, and testing, we're talking about overlapping parts of a large set of tools we're still early on in figuring out.

As the AI engineering ecosystem matures, diversifies, and increases massively in scale, we really, really need good evaluation (observability, monitoring, testing, data management) frameworks.

I got a chance to test the new Microsoft ASSERT evals framework before it was released, and it has some very nice core ideas.

1) ASSERT is open in two important ways. First, the team is serious about broad support for models, frameworks, and use cases. Microsoft spent time understanding voice agent use cases and building Pipecat support, for example. Second, the code is completely open source, released under an open MIT license.

2) We're all working in and with agentic coding tools today. That means we are planning in natural language, and all of our software development and ops tools have to evolve for these new, natural language, workflows. ASSERT takes descriptions of desired agent behavior and generates specifications for the ASSERT suite of tools to run against.

In a world where "English is the programming language," how we actually make natural language "code" precise enough and repeatable enough is perhaps the big unsolved tooling problem that all of us are working towards in different ways. This is true whether we work on coding agents, AI opps tooling, orchestration frameworks, or vertical applications.

3) Microsoft describes ASSERT as a policy-driven framework. Rather than eval against generic performance metrics, ASSERT aims to generate stable but adaptable evaluation criteria for specific agents.

"Policy-driven" also implies a full loop design. Policy (generated from specific requirements) -> evaluation -> optimization -> monitoring in production -> improving the policy description -> evaluation -> ...

4) Enterprise agents need to be evaluated along many dimensions: task completion, individual conversation turn behavior, latency, mode-specific metrics like audio disfluencies, and safety/security. Microsoft designed ASSERT to be used together with a new safety governance toolkit called Agent Control Specification.

5) Finally, ASSERT is integrated into the Microsoft Foundry ecosystem. Today, AI engineering tools have to be open source and vendor neutral to get attention from developers and gain widespread adoption. *And* it's equally important to give enterprise customers tools that work as a coherent stack.

This is hard to do well. There are real tensions between open source development versus engineering a great full stack developer experience. However, if you sweat the details on both ends, you benefit from a full spectrum of feedback about real-world development pain points. It's more work, but it's worth it!

Kudos to Microsoft for embracing this and committing to an open, community oriented approach, plus doing the extra work to build the full stack for enterprise customers.

10

52

10

31

4K

pipecat_ai retweeted

30 days ago

SF Voice AI Meetup livestream. Presentations and panels start at 7:15 Pacific. https://t.co/FSn5CX6i0k

1

36

6

16

4K

pipecat_ai retweeted

about 1 month ago

Local native-audio voice agent running on an RTX 5090. - @NVIDIAAI Nemotron 3 Nano - audio|text ➡️ text - patched vLLM to implement complete turn prefix caching - ~125ms TTFT - @kyutai_labs Pocket TTS - text ➡️ audio - Nemotron Speech ASR - streaming audio ➡️ text - @pipecat_ai Smart Turn end-of-utterance - ~500ms total voice-to-voice latency - runs bash via tool calls If you're interested in voice and realtime multi-modal AI, come join us at the SF Voice AI Meetup on Thursday May 7th. Talk to engineers from NVIDIA, Kyutai, and Pipecat about what you're building! Links to meetup registration, code, and models on @huggingface below ...

9

133

19

117

8K

pipecat_ai retweeted

about 1 month ago

Voice AI Meetup, Thursday May 7th. This one's a special crossover event. T-Bot, who hosts the global Voice AI Spaces meetups, is visiting San Francisco and will MC! - NVIDIA researchers will present some of their really cool recent work on speech models. - We'll have demos and two fireside chats, featuring new developments in models and evals, with @GradiumAI, @ArtificialAnlys, @ServiceNow, and @pipecat_ai. - And, of course, 🍕 and great conversation. - Thanks to the @trychroma team for hosting in their wonderful office/event space. Registration link below. Come hang out with 150 old and new friends!

kwindla's tweet photo. Voice AI Meetup, Thursday May 7th. This one's a special crossover event. T-Bot, who hosts the global Voice AI Spaces meetups, is visiting San Francisco and will MC!

- NVIDIA researchers will present some of their really cool recent work on speech models.
- We'll have demos and two fireside chats, featuring new developments in models and evals, with @GradiumAI, @ArtificialAnlys, @ServiceNow, and @pipecat_ai.
- And, of course, 🍕 and great conversation.
- Thanks to the @trychroma team for hosting in their wonderful office/event space.

Registration link below. Come hang out with 150 old and new friends!

2

39

7

12

7K

about 1 month ago

Excited to support the new @DeepgramAI Flux Multilingual model

Deepgram

@DeepgramAI

about 1 month ago

Flux Multilingual is live. Real-time conversational speech-to-text for voice agents in 10 languages, with monolingual-grade accuracy, turn detection, and code-switching. Deploy once and launch globally. Learn more → https://t.co/lhhUZ2maTo

3

40

13

31

15K

0

11

3

8

1K

pipecat_ai retweeted

smallest.ai

@smallest_AI

about 1 month ago

Smallest AI is now natively supported in @pipecat_ai Lightning TTS + Pulse STT can now plug directly into your Pipecat voice agent pipeline. Docs below ⬇️

8

100

20

36

377K

pipecat_ai retweeted

Tarush Agarwal

@tarush_agarwal_

about 1 month ago

We just made Pipecat testing a lot easier. With @cekuraAi + @pipecat_ai , you can now get: • full traces • every tool call with inputs + outputs • complete transcripts with timestamps • mock tools so agents don’t hit live APIs • chat + WebRTC testing, all in one place Everything in one place for both test runs and production debugging. Docs below 👇

3

20

5

21

6K

about 2 months ago

🫡 🔥 🔥

Adam \[._.]/

@antic

about 2 months ago

I did it. I beat gradient-bang. First person to discover the full map. The only record that can’t be broken. Thanks for all the fun @pipecat_ai @kwindla @chadbailey59 etc. I wasn’t sold on voice agents until I found this game and the experience is actually really great.

antic's tweet photo. I did it. I beat gradient-bang. First person to discover the full map. The only record that can’t be broken. Thanks for all the fun @pipecat_ai @kwindla @chadbailey59 etc. I wasn’t sold on voice agents until I found this game and the experience is actually really great. https://t.co/ANG77pkyWC

2

11

0

1

3K

0

2

1

0

895

pipecat_ai retweeted

about 2 months ago

Thank you to our community, and all the Pipecat developers out there. You guys are amazing 🫡

1

11

3

4

2K

pipecat_ai retweeted

about 2 months ago

Sub-agents in (latent) space! We’ve been working on a side project. As far as I know, this is the first massively multiplayer, completely LLM-driven game. Come play Gradient Bang with us. See if you can catch me on the leaderboard. This whole thing started because I wanted to explore a bunch of things I’m currently obsessed with, in an application of non-trivial size, that felt both new and old at the same time. So … a retro-style space trading game built entirely around interacting with and managing multiple LLMs. Factorio, but instead of clicking, you cajole your ship AI into tasking other AIs to do things for you. Some of the things we’ve been thinking about as we hack on Gradient Bang: - Sub-agent orchestration - Partial context sharing between multiple LLM inference loops - Managing very long contexts, and episodic memory across user sessions - World events and large volumes of structured data input as part of human/agent conversations - Dynamic user interfaces, driven/created on the fly by LLMs - And, of course, voice as primary input If you’ve been building coding harnesses, or writing Open Claw agents, or doing pretty much anything that pushes the boundaries of AI-native development these days, you’re probably thinking about these things too! This is all built with @pipecat_ai, the back end is @supabase, the React front end is deployed to @vercel, and all the code is open source.

139

3K

264

3K

453K

pipecat_ai retweeted

3 months ago

GTC! Head on over to AWS Booth 921, Kiosk 3. Check out building realtime AI with @NVIDIAAI Nemotron, @awscloud and @pipecat_ai

0

2

0

2K

3 months ago

@marcuswquinn @ShaneMac 🔥

0

1

0

21

pipecat_ai retweeted

3 months ago

Today's @NVIDIA Nemotron 3 Super launch is an exciting development for voice AI developers. We’re proud to be a launch partner, with day-0 @pipecat_ai support. Developers now have a meaningful open stack for realtime voice, with @NVIDIAAI — Nemotron 3 Nano, Nemotron Speech ASR, Nemotron 3 Super. Open models, open training data. Review how Nemotron 3 Super matches proprietary models in our long-conversation voice agent benchmarks. Happy building, with open source!!

trydaily's tweet photo. Today's @NVIDIA Nemotron 3 Super launch is an exciting development for voice AI developers.

We’re proud to be a launch partner, with day-0 @pipecat_ai support. Developers now have a meaningful open stack for realtime voice, with @NVIDIAAI — Nemotron 3 Nano, Nemotron Speech ASR, Nemotron 3 Super. Open models, open training data.

Review how Nemotron 3 Super matches proprietary models in our long-conversation voice agent benchmarks. Happy building, with open source!!

2

21

2

13

2K

pipecat_ai retweeted

3 months ago

NVIDIA Nemotron 3 Super launches today! We've been building voice agents with Super's pre-release checkpoints and running all our various tests and benchmarks. Nemotron 3 Super matches both GPT-5.4 and GPT-4.1 in tool calling and instruction following performance on our realtime conversation, long context, real-world benchmarks. GPT-4.1 is the most widely used LLM today for production voice agents. So an open model that performs as well as GPT-4.1 on hard, voice-specific benchmarks is a big deal. (Side note: we don't think a benchmark "tells the story" about a model's voice agent performance unless it tests model correctness across at least 20 human/agent conversation turns.) The Nemotron models are *fully* open: weights, data sets, training code, inference code. Nemotron 3 Super is 120B params, with a hybrid Mamba-Transformer MoE architecture for efficient inference. You can run it on NVIDIA data center hardware or on a DGX Spark mini-desktop machine. 1M token context. Blog post with full benchmarks, thinking budget notes, inference setup on @Modal, and where we think this goes next. 👇

kwindla's tweet photo. NVIDIA Nemotron 3 Super launches today! We've been building voice agents with Super's pre-release checkpoints and running all our various tests and benchmarks.

Nemotron 3 Super matches both GPT-5.4 and GPT-4.1 in tool calling and instruction following performance on our realtime conversation, long context, real-world benchmarks. GPT-4.1 is the most widely used LLM today for production voice agents. So an open model that performs as well as GPT-4.1 on hard, voice-specific benchmarks is a big deal.

(Side note: we don't think a benchmark "tells the story" about a model's voice agent performance unless it tests model correctness across at least 20 human/agent conversation turns.)

The Nemotron models are *fully* open: weights, data sets, training code, inference code.

Nemotron 3 Super is 120B params, with a hybrid Mamba-Transformer MoE architecture for efficient inference. You can run it on NVIDIA data center hardware or on a DGX Spark mini-desktop machine.

1M token context.

Blog post with full benchmarks, thinking budget notes, inference setup on @Modal, and where we think this goes next. 👇

14

236

34

108

20K

3 months ago

https://t.co/VDVdnindfC

3 months ago

11a PT - @NVIDIAAI Nemotron Labs livestream starts in a couple minutes! How to build voice agents with @nvidia Nemotron open models, using @modal @pipecat_ai @trydaily @kwindla

1

2

0

1K

0

1

834

3 months ago

🔥 Streaming now, @NVIDIAAI Nemotron Labs session. How to build voice agents on @nvidia open modals, using @modal @trydaily @pipecat_ai cc @kwindla

1

6

1

2

842

3 months ago

New STT realtime release: @AssemblyAI Universal-3-Pro, supported in Pipecat

AssemblyAI

@AssemblyAI

3 months ago

Real-time transcription just got a significant upgrade. Universal-3-Pro is now available for streaming — bringing AssemblyAI's most accurate speech model to live audio for the first time. Developers building voice agents, live captioning tools, and real-time analytics pipelines now get three things they've been asking for: 🔹 Best-in-class word error and entity detection across streaming ASR benchmarks 🔹 Real-time speaker labels — know who said what, as it happens 🔹 Superior entity detection for names, places, orgs, and specialized terminology in real-time 🔹 Code-switching and global language coverage built-in

8

59

7

37

15K

0

23

4

15

3K

pipecat_ai retweeted

3 months ago

Join @NVIDIAAI for a Nemotron Labs livestream on building voice agents with open models. Tuesday, 3/3, link in thread, @modal @pipecat_ai @nvidia @kwindla Ben Shababo

1

7

2

1

1K

3 months ago

Register, https://t.co/JM5Lqlaxu0

2

0

510