1/6 Paid $200 for @claudeai Max (20x). Went off-grid for a holiday/trail run race. Came back Monday with 95% of my weekly quota unused. Reset was Wed 11pm. So I tried to actually maximize what I paid for. Bad move.
@claudeai @AnthropicAI 5/6 I get protecting against abuse. Punishing a paying customer for using unused quota in the remaining days before reset? That's not moderation. That's a scam.
I moved to 800ms. Matches Pipecat's default VAD stop_secs. Bug went away. Is 800ms "right"? It's a barge-in tradeoff. Shorter = more responsive, longer = safer. Pick from acoustic reality, not intuition.
Two more production details: disarm the gate when new TTS arrives so barge-in on the next turn isn't blocked. And on iOS 18+, call setVoiceProcessingEnabled(false) BEFORE engine.stop() or you can crash in AURemoteIO teardown.
My first gate was 300ms. It leaked. The post-playback echo window is hardware latency (10-60ms) + room reverb tail (100-300ms per WebRTC AEC3) + VPIO convergence. 300ms sits inside that range and leaves no margin at the long end.
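To make the budget concrete, here is a small Swift sketch that adds up the figures quoted above. VPIO convergence time is not quantified in the thread, so it is left out and the totals should be read as a lower bound. The variable and function names are illustrative only.

```swift
// Figures quoted in the thread, in milliseconds. VPIO convergence is not
// quantified there, so it is omitted and these sums are a lower bound.
let hardwareLatencyMs = 10...60
let reverbTailMs = 100...300

// Shortest and longest echo window implied by those two ranges.
let bestCaseMs = hardwareLatencyMs.lowerBound + reverbTailMs.lowerBound
let worstCaseMs = hardwareLatencyMs.upperBound + reverbTailMs.upperBound

// Margin a given gate length leaves over the worst case (negative = leaks).
func marginMs(gateMs: Int) -> Int {
    return gateMs - worstCaseMs
}

print("echo window: \(bestCaseMs)-\(worstCaseMs) ms before VPIO convergence")
print("300 ms gate margin: \(marginMs(gateMs: 300)) ms")
print("800 ms gate margin: \(marginMs(gateMs: 800)) ms")
```

The two known terms already sum to 110-360ms, so a 300ms gate falls 60ms short of the worst case before VPIO convergence is even counted, while 800ms leaves 440ms of headroom.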
Voice-agent devs on iOS — if you turned on setVoiceProcessingEnabled(true) and your agent still hears its own TTS, you are not alone. I just spent two weeks in that hole building Uttero. Here's the field report.
Even with all three fixed, you will still hear residual TTS tail in quiet rooms. Fix: application-layer mic gate. When the playback buffer empties, suppress the mic for a grace window so the speaker flush + VPIO residual can't re-enter as "user speech."
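A minimal sketch of such a gate, in plain Swift with no audio dependencies. The type and method names (MicGate, playbackDidDrain, newTTSDidArrive) are hypothetical, not from Uttero or any library. The 0.8s grace window is the value the thread settles on, and the disarm hook reflects the thread's note about not blocking barge-in on the next turn.

```swift
import Foundation

/// Hypothetical application-layer mic gate; all names are illustrative.
struct MicGate {
    /// How long to keep the mic suppressed after playback drains, in seconds.
    let graceWindow: TimeInterval
    /// End of the current suppression window, or nil when disarmed.
    private var suppressUntil: TimeInterval?

    init(graceWindow: TimeInterval) {
        self.graceWindow = graceWindow
        self.suppressUntil = nil
    }

    /// Arm the gate when the playback buffer empties.
    mutating func playbackDidDrain(at now: TimeInterval) {
        suppressUntil = now + graceWindow
    }

    /// Disarm when new TTS arrives so barge-in on the next turn is not blocked.
    mutating func newTTSDidArrive() {
        suppressUntil = nil
    }

    /// True while captured mic audio should be dropped rather than sent to VAD/STT.
    func shouldSuppressMic(at now: TimeInterval) -> Bool {
        guard let until = suppressUntil else { return false }
        return now < until
    }
}

// Usage with explicit timestamps (seconds) so the behaviour is easy to follow.
var gate = MicGate(graceWindow: 0.8)
gate.playbackDidDrain(at: 10.0)
let duringGrace = gate.shouldSuppressMic(at: 10.5)  // inside the grace window
let afterGrace = gate.shouldSuppressMic(at: 11.0)   // window has elapsed
gate.playbackDidDrain(at: 20.0)
gate.newTTSDidArrive()
let afterDisarm = gate.shouldSuppressMic(at: 20.1)  // disarmed by new TTS
```

Passing the timestamp in, instead of reading a clock inside the gate, keeps the logic testable. In an app the caller would supply a monotonic time from whatever clock drives the audio callbacks.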
Killer #3: manual rendering mode. setVoiceProcessingEnabled silently no-ops. If you need custom mixing, drop to CoreAudio + kAudioUnitSubType_VoiceProcessingIO directly. Twilio's ExampleAVAudioEngineDevice.m is the canonical reference.
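One way to turn that silent no-op into a loud failure is to check the engine's rendering mode before enabling voice processing, and to read the flag back afterwards. This is a sketch under assumptions: the function and error names are hypothetical, and it only uses AVAudioEngine.isInManualRenderingMode and the input node's isVoiceProcessingEnabled property.

```swift
import AVFoundation

/// Hypothetical error type for this sketch.
enum VoiceProcessingSetupError: Error {
    case manualRenderingActive
    case notEnabledAfterCall
}

/// Enables VPIO on the input node, refusing to proceed in manual rendering mode.
func enableVoiceProcessing(on engine: AVAudioEngine) throws {
    // Voice processing only applies when the engine renders to the audio device.
    guard !engine.isInManualRenderingMode else {
        throw VoiceProcessingSetupError.manualRenderingActive
    }

    try engine.inputNode.setVoiceProcessingEnabled(true)

    // Read the flag back so a no-op is caught instead of assumed away.
    guard engine.inputNode.isVoiceProcessingEnabled else {
        throw VoiceProcessingSetupError.notEnabledAfterCall
    }
}
```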
Killer #2: session mode. .default + .defaultToSpeaker feels intuitive for hands-free UX, but it disables AEC. Use .voiceChat. This is the mode VPIO was tuned for.
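A minimal sketch of the session setup this implies, assuming a .playAndRecord category (the thread does not name the category, so that part is an assumption). The function name is hypothetical. Speaker routing is deliberately left out, since the thread does not say how it handled hands-free output once .defaultToSpeaker was dropped.

```swift
import AVFoundation

/// Configures the shared audio session for voice chat.
/// Assumption: .playAndRecord is the category in use.
func configureVoiceChatSession() throws {
    let session = AVAudioSession.sharedInstance()

    // .voiceChat is the mode the thread recommends for VPIO.
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
    try session.setActive(true)
}
```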
Killer #1: init order. Attach your playback graph BEFORE calling setVoiceProcessingEnabled(true). Not after. VPIO's two-bus architecture needs the output bus connected to establish the AEC reference. Reverse order silently breaks it.
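The ordering can be sketched like this with AVAudioEngine. The function name is hypothetical, the graph is the simplest possible one (a single player node into the main mixer), and the ordering itself is the thread's claim, not something this sketch verifies.

```swift
import AVFoundation

/// Builds a playback graph first, then enables voice processing, then starts.
func makeVoiceEngine() throws -> (engine: AVAudioEngine, player: AVAudioPlayerNode) {
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()

    // 1. Attach and connect playback so the output bus is wired up.
    engine.attach(player)
    engine.connect(player, to: engine.mainMixerNode, format: nil)

    // 2. Only now enable voice processing on the input node (throws on failure).
    try engine.inputNode.setVoiceProcessingEnabled(true)

    // 3. Start once both the graph and VPIO are configured.
    engine.prepare()
    try engine.start()

    return (engine, player)
}
```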
Before you even get to residual echo, there are three silent killers — config mistakes where VPIO looks active but is doing nothing. Most iOS voice tutorials fall into at least one.
In a noisy room, subtraction works fine. In a quiet room, the residual at the end of a TTS utterance is clearly audible. Your agent hears it, transcribes it, responds to itself. Two turns later you are in a feedback loop.
Apple's VoiceProcessingIO is not traditional AEC. Per an Apple DTS engineer on Dev Forums thread 97679: VPIO is output *subtraction*, not adaptive cancellation. No delay parameters. No tuning surface. No way to inspect the reference path.
#Earthquake (#gempa) possibly felt 23 sec ago in #Indonesia. Felt it? Tell us via:
📱https://t.co/QMSpuj6Z2H
🌐https://t.co/AXvOM7I4Th
🖥https://t.co/wPtMW5ND1t
⚠ Automatic crowdsourced detection, not seismically verified yet. More info soon!