NVIDIA's Nemotron 3.5 ASR Streaming Multilingual is now available through FluidAudio, optimized for Apple Silicon, so apps can run ~40-language real-time ASR entirely on device, no cloud required.
Apps shipping with it today include
@SpokenlyApp, @ALTIC_DEV, @SnaplyAI, and others.
FluidAudio:
https://t.co/VWaAPfLRh5
Original model:
https://t.co/87wRTlO5dm
@NVIDIAAIDev
Today's a big day for Nemotron models.
Along with Ultra, we also shipped Nemotron Speech 3.5, which now supports 40 languages and is insanely fast with ultra-low latency!
I collaborated with @Alex_tra_memory, @fluidinference and @ALTIC_DEV to port the model to CoreML and bring the latest Nemotron model to any MacBook using FluidVoice!
Give it a try and lmk what you think!
Link below ⬇️
Audivize now supports NVIDIA Nemotron 3.5 ASR Multilingual via @fluidinference, adding support for 40 language-locales, all on-device.
Demo: https://t.co/RwWiEsPBe6
Model: https://t.co/aBzlrWGWkO
@NVIDIAAIDev @NVIDIAAI #NemotronSpeech #VoiceAI
Nemotron ASR Multilingual running on an iPhone 17 Pro in CoreML.
Many thanks to @fluidinference for the CoreML model and to @NVIDIAAI @NVIDIAAIDev for the model itself.
Supertonic3 running on an iPhone 17 Pro using ANE on CoreML. It’s blazing fast with low RAM consumption and background capable. 2 mins worth of audio generated in 3 secs.
Many thanks to @fluidinference for the port.
rewrote VoiceScribe to use tca/swiftui, now focusing solely on local transcription/llm cleanup.
it now provides local whisper and parakeet (thanks to @fluidinference, coreml) transcription, coupled with optional local llm cleanup (mlx).
there are a lot of wrappers around these great transcription models, but i wanted to learn how to package and distribute local model inference for this use-case myself. it's been a lot of fun.
https://t.co/rqr5CRzX0L
https://t.co/azbE0KNZ3E
Built a YouTube transcriber skill for Claude Code. Paste a URL, get a local transcript. Playlists, auto-chunking for long videos. Also downloads full video with --keep-video.
Powered by @FluidInference — install together with the fluidaudio-skill:
https://t.co/ilHFPmWjuK
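For anyone curious what "auto-chunking for long videos" can look like, here is a minimal sketch. The post doesn't describe how the skill actually does it, so this is an assumption: fixed-length windows with a small overlap so words at a boundary aren't cut off. `chunk_spans`, the 30 s window, and the 1 s overlap are all made-up names and defaults for illustration.

```python
def chunk_spans(total_seconds, chunk_seconds=30.0, overlap_seconds=1.0):
    """Return (start, end) windows covering [0, total_seconds].

    Hypothetical helper: consecutive windows overlap by `overlap_seconds`
    so a word straddling a boundary appears whole in at least one chunk.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must exceed overlap_seconds")

    spans = []
    start = 0.0
    step = chunk_seconds - overlap_seconds  # how far each window advances
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break  # the last window reached the end of the audio
        start += step
    return spans


if __name__ == "__main__":
    # Each span would be cut from the audio and transcribed separately,
    # then the texts joined (de-duplicating the overlap is left out here).
    for start, end in chunk_spans(70, chunk_seconds=30, overlap_seconds=1):
        print(f"{start:.0f}s -> {end:.0f}s")
```

The overlap trades a little duplicated work for safer boundaries. A real tool might split on silence instead.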
FluidAudio now has a Mintlify docs site to help developers integrate local ASR, diarization, and TTS.
Big thanks to @mintlify's OSS program for giving open source projects free access to premium features.
https://t.co/WyiEGYesS9
we now support kyutai-lab's pocket tts in FluidAudio. PocketTTS is a streaming TTS that handles long text with seamless chunk transitions and is a more permissive model for commercial use.
Get started here:
https://t.co/JK8nROUa60
credits to @sach1n for helping us test this & the video demo
i just finally got qwen-asr converted to coreml. what was surprising was that it uses an encoder -> embedder -> llm transcription pipeline. the architecture is quite alien compared to traditional nvidia transformer transducer pipelines.
this is something i am unfamiliar with, but it does mean claude code is able to better grasp the architecture, given it still follows a standard LLM template, vs requiring a lot more guidance for typical audio models.
i have linked the PR if anyone wants to take a look
https://t.co/iXMQ7cYwNA
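a rough sketch of how an encoder -> embedder -> llm flow could be wired up, for readers who haven't seen this shape before. this is my reading of the description above, not code from the PR: every name here (`transcribe`, `encoder`, `embedder`, `llm_step`) is hypothetical, and the three stages are passed in as plain callables, so nothing below is a real qwen-asr or CoreML API.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def transcribe(
    audio: Sequence[float],
    encoder: Callable[[Sequence[float]], List[Vector]],
    embedder: Callable[[List[Vector]], List[Vector]],
    llm_step: Callable[[List[Vector], List[int]], int],
    prompt_ids: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 64,
) -> List[int]:
    """Greedy decode loop over three hypothetical stages."""
    features = encoder(audio)          # audio samples -> frame-level features
    audio_embeds = embedder(features)  # features -> vectors the LLM can consume

    generated: List[int] = []
    for _ in range(max_new_tokens):
        # The LLM sees the audio embeddings plus all text tokens so far
        # and proposes the next token id, like ordinary text generation.
        next_id = llm_step(audio_embeds, list(prompt_ids) + generated)
        if next_id == eos_id:
            break
        generated.append(next_id)
    return generated


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; they do no real ASR.
    toy_encoder = lambda audio: [[x] for x in audio]
    toy_embedder = lambda feats: [[f[0] * 2.0] for f in feats]
    toy_llm = lambda embeds, toks: len(toks) if len(toks) < 4 else 0
    print(transcribe([0.1, 0.2], toy_encoder, toy_embedder, toy_llm, [9], eos_id=0))
```

the decode loop is just autoregressive text generation conditioned on audio embeddings, which may be why tooling that already knows LLM templates handles it more easily than a transducer.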