Speechless (insert vocal isolation joke). Thank you @TIME for recognizing AudioShake's stem separation technology as one of the Best Inventions of 2023. More on why, in a year of AI everywhere, our work stood out: https://t.co/jlflNq4Ae3
1. Music detection.
2. Music removal (yes, removing music, including songs with lyrics, from a mixed media file)
3. Music rights identification
4. Cue sheet creation
A great, practical example of source separation applied to a real-world problem!
So much media & so many archives are blocked from streaming & social platforms because of copyright compliance. Distribution demand has multiplied, and the compliance workflows underneath it have not.
Today we're launching the system that closes that gap. https://t.co/ThQzEriMf1
CEO @themoko on the distinction that keeps coming up (@AudioShakeAI isn't generative AI) and on why that matters more for media + entertainment workflows in 2026 than it did even a year ago.
Today we're sharing our work with ESPN: what we've been building together, and how it involves T-Rex races, competitive sign spinning, and... Phil Simms! 1/7
This work started through the @DisneyAccelerator, where we met the ESPN team. In the name of hot dog eating contests, slippery stairs races, and cornhole championships, onwards! 7/7
"I'm going to Disney World" is almost synonymous with winning the @SuperBowl. @ESPN wanted the original Phil Simms clip for their 2026 ad — but the music was baked in. We pulled his voice from the mix. Watch: https://t.co/4TYHrG6M77
Dialogue RT runs at 11ms end-to-end — the first AI dialogue isolation model to meet live broadcast latency requirements. Built for environments where the audio arrives messy and the pipeline can't wait!
@AudioShakeAI has just launched Dialogue RT — our real-time dialogue isolation model for live broadcast. Audio separation has always been about quality. This is about quality AND speed. A thread on why that matters and what we built. 🧵
Suppression attenuates unwanted sound within a mixed signal. Isolation extracts speech as its own signal, separate from everything else. The result is a cleaner stem and independent control over dialogue & the rest of the mix (crowd, PA, ambient sound, etc.). No guessing. No bleed.
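To make the suppression vs. isolation distinction concrete, here is a toy sketch on short synthetic sample lists. It is not AudioShake's method, which the posts don't describe: `suppress`, `isolate`, and `remix` are hypothetical helpers, and the dialogue "estimate" is simply the known clean track, standing in for what a separation model would output.

```python
from typing import List, Tuple

Signal = List[float]


def suppress(mix: Signal, mask: Signal) -> Signal:
    """Toy suppression: apply a per-sample gain in [0, 1] to the mixture.

    The output is still ONE signal. Wherever speech and noise overlap,
    a gain of 1 keeps the noise (bleed) and a gain < 1 also cuts the speech.
    """
    return [g * x for g, x in zip(mask, mix)]


def isolate(mix: Signal, dialogue_est: Signal) -> Tuple[Signal, Signal]:
    """Toy isolation: return the dialogue stem plus everything else as a
    separate residual stem. `dialogue_est` stands in for a model's output."""
    rest = [m - d for m, d in zip(mix, dialogue_est)]
    return dialogue_est, rest


def remix(dialogue: Signal, rest: Signal, dialogue_gain: float, rest_gain: float) -> Signal:
    """With separate stems, dialogue and background get independent gains."""
    return [dialogue_gain * d + rest_gain * r for d, r in zip(dialogue, rest)]


# Synthetic example: a dialogue track plus crowd noise.
dialogue = [0.5, -0.2, 0.8, 0.0]
crowd = [0.1, 0.3, -0.1, 0.2]
mix = [d + c for d, c in zip(dialogue, crowd)]

# Suppression: one processed signal. Crowd under speech (samples 0 and 2)
# survives, and the 0.3 gain on sample 1 cuts speech along with crowd.
suppressed = suppress(mix, [1.0, 0.3, 1.0, 0.0])

# Isolation (oracle estimate, for illustration): two stems, remixed freely.
speech, rest = isolate(mix, dialogue)
broadcast = remix(speech, rest, dialogue_gain=1.0, rest_gain=0.25)  # duck the crowd
```

The subtraction in `isolate` is the trivial part; producing a clean dialogue estimate from the mix is the hard part a separation model handles. Once the stems exist, though, any balance between dialogue and background can be chosen downstream, which a single suppressed signal cannot offer.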
@Meta's new SAM Audio model is awesome, and it's great to see this level of momentum around audio after years of it being treated as a second-class modality.
That said, the benchmarks tell a more nuanced story than the headlines.
SAM is a genuinely exciting step forward, especially for creative exploration and research.
At the same time, the benchmarks reinforce what we see every day in production: when accuracy, determinism, and real-world deployment matter, targeted models still set the standard.