We somehow got put in the spotlight over the last few days! First, we'd like to thank the organizers of the AI show for that; we can't get enough of this stuff. I'll say a few things about where we are and what we do.
Introducing Voxtral WebGPU: Real-time speech transcription entirely in your browser.
This demo runs Voxtral-Mini-4B, a powerful streaming ASR model from @MistralAI, locally on WebGPU. The model supports 13 languages and is capable of <500 ms latency.
Fully private. Zero cost.
What if AI could see the world the way we do?
That’s the idea we bet our weekend on at the Mistral Worldwide Hackathon.
With @haaspierre_ and Arman Artola-Zanganeh, we built 𝗣𝗼𝗿𝘁:𝗪𝗼𝗿𝗹𝗱🌍, an open-source framework that lets anyone connect their Meta glasses to any AI system.
Let me take you back to Saturday morning.
Before we even knew it could work, we needed the hardware.
So I ran to Rue de Rivoli and bought €500 Meta glasses on the spot.
If that’s not commitment, I don’t know what is (a true bet).
We then built non-stop for 36 hours to make it usable. End-to-end.
The glasses stream what you see → the AI makes sense of it → it answers back through the glasses’ speaker.
And suddenly, once we understood that it was going to work, the question changed.
It was no longer “𝗜𝘀 𝘁𝗵𝗶𝘀 𝗱𝗼𝗮𝗯𝗹𝗲?”
It became “𝗪𝗵𝗮𝘁 𝗰𝗮𝗻 𝗽𝗲𝗼𝗽𝗹𝗲 𝗯𝘂𝗶𝗹𝗱 𝘄𝗶𝘁𝗵 𝘁𝗵𝗶𝘀?”
- A plumber getting live assistance while repairing something.
- A technician repairing industrial machinery.
- A traveler exploring a new country.
- A visually impaired person navigating space.
At first, we were looking for the “right” use case.
Then we realized something more interesting.
If AI can share your perspective, continuously, the use cases are not ours to decide.
That’s why 𝗣𝗼𝗿𝘁:𝗪𝗼𝗿𝗹𝗱🌍 is fully open source.
If you want to connect your Meta glasses, plug in your own models, customize with your own prompts, your own MCP, your Openclaw… you can.
Link to the open source repo (you can contribute and give it a little star ❤️): https://t.co/UueLnkMZpM
Link to the demo video: https://t.co/qcTDqKGvax
Huge thanks to the organizing team of the hackathon, it was truly great. @Jthmas404
Great app: Voxtral-Subtitles to transcribe any video with word-level subtitles, speaker diarization & multilingual translation.
⬇️ Try it now on Hugging Face Spaces
Voxtral can now directly stream audio input into text output. Perfect for:
- Live subtitles
- Language learning apps
- Note-taking tools
- And more!
Made a demo for you to try directly on Hugging Face!
Automate or simplify your workflow for subtitle creation with Voxtral Transcribe 2.0! From word-level timestamps to diarization and context biasing, take it further by leveraging our other general-purpose models to translate subtitles.
@drivelinekyle @simonw Sorry it didn’t work! The demo was set up for ~50 concurrent users (backed by a vLLM endpoint), but we got thousands. We've switched to our API for now and will add a local option soon, once the model is supported in transformers.js! Give it another try!
NEW: @MistralAI releases Mistral 3, a family of multimodal models, including three state-of-the-art dense models (3B, 8B, and 14B) and Mistral Large 3 (675B, 41B active). All Apache 2.0! 🤗
Surprisingly, the 3B is small enough to run 100% locally in your browser on WebGPU! 🤯
Magistral from @MistralAI is a model trained to do reasoning on images.
Here is a fun challenge:
Can you beat Magistral 1.2 at GeoGuessr? 🌎❓
To test that, I recreated an MVP on @huggingface
🔗 below
Mistral AI is sponsoring the @huggingface 𝗚𝗿𝗮𝗱𝗶𝗼 𝗔𝗴𝗲𝗻𝘁𝘀 & 𝗠𝗖𝗣 𝗛𝗮𝗰𝗸𝗮𝘁𝗵𝗼𝗻 𝟮𝟬𝟮𝟱
Take this opportunity to explore our new 𝗠𝗖𝗣 𝗰𝗼𝗺𝗽𝗮𝘁𝗶𝗯𝗹𝗲 𝗔𝗴𝗲𝗻𝘁𝘀 𝗔𝗣𝗜.
Join me for the 𝗸𝗶𝗰𝗸𝗼𝗳𝗳 𝗹𝗶𝘃𝗲 tomorrow (June 3rd)!
Details in thread