MICROSOFT OPEN SOURCED A 7B PARAMETER MODEL THAT TRANSCRIBES 60 MINUTES OF AUDIO IN A SINGLE PASS
and it's completely free
VIBEVOICE ASR: no chunking, no context loss, full speaker diarization baked in
not just speech to text..not a basic wrapper
who spoke, when they spoke, exactly what they said..all in one shot
and it handles the hard stuff too..50+ languages, custom hotwords, long form audio that breaks every other tool
the model doesn't know what "context window" means apparently
Available on macOS and Windows right now.
Free to use. Free to fine tune. Free to build on.
it would be so good to benchmark how much better/worse models become with system prompts like that.
I would guess this makes very little difference, but being unable to prove it is so annoying
for the past few days I’ve been using the Cluely CLI with this workflow for auto meeting ingestion.
I will create a blog post soon teaching folks here how to use it
Cluely was known for two things: great marketing and terrible engineering.
the second part was true.
i'm one of the engineers who fixed it. here's everything that was broken and how we rebuilt it:
https://t.co/Jh0AdV5pd0
My dear software engineers, I am excited to present my latest achievement in code search, which I've been trying to tackle for the last few months:
ACTUALLY WORKING real-time approximate typo-resistant code search. What does it mean?
you: can search any code with any typos
your agent: every search for UserController that returns 0 results automatically suggests UserAuthController at no additional cost
It's already live at https://t.co/5X6nOmdf5r and you can try it right now as an MCP for file search
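
To make the "0 results, suggest the closest name" idea concrete, here is a minimal sketch using Python's standard difflib. This is not how the tool above works internally (the post doesn't say); the function name search_symbols, the symbol list, and the 0.6 similarity cutoff are all assumptions made up for illustration.

```python
import difflib

def search_symbols(query, symbols, max_suggestions=3, cutoff=0.6):
    """Return exact substring hits, or close-match suggestions if there are none.

    Hypothetical helper for illustration only, not the API of any real tool.
    """
    # Step 1: plain case-insensitive substring search.
    hits = [s for s in symbols if query.lower() in s.lower()]
    if hits:
        return {"hits": hits, "suggestions": []}
    # Step 2: no hits, so fall back to approximate matching.
    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops anything scoring below the cutoff.
    suggestions = difflib.get_close_matches(
        query, symbols, n=max_suggestions, cutoff=cutoff
    )
    return {"hits": [], "suggestions": suggestions}

# Made-up symbol index for the example.
symbols = ["UserAuthController", "OrderService", "PaymentGateway"]

# A name that doesn't exist, and a name with a typo.
missing = search_symbols("UserController", symbols)
typo = search_symbols("UserAuthContrller", symbols)
```

Both queries return zero hits, and the fallback is what would surface UserAuthController as a suggestion. A real index would likely need something faster than pairwise comparison against every symbol, but the shape of the behavior is the same.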