Introducing the newest Coral board, for efficient, on-device AI!
Check out the demos in the video:
- On-board speech translation
- Natural language controlling hardware
- Vision & sound generating music
For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.
We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal.
This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents.
Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold.
Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.
1/ Audio is now first-class on OpenRouter.
Two new endpoints live today:
📢 /api/v1/audio/speech — text-to-speech (TTS)
🎤 /api/v1/audio/transcriptions — speech-to-text (STT)
Same routing, billing, and keys you already use for text, image, and video.
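A rough sketch of what calling the speech endpoint might look like, using only the Python standard library. The endpoint paths come from the post; the JSON field names (`model`, `input`, `voice`), the bearer-token header, and the placeholder model/voice strings are assumptions for illustration, not the documented schema. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://openrouter.ai"
SPEECH_PATH = "/api/v1/audio/speech"                   # path as given in the post
TRANSCRIPTIONS_PATH = "/api/v1/audio/transcriptions"   # path as given in the post


def build_speech_request(api_key: str, text: str, model: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request.

    ASSUMPTION: the body fields "model", "input", and "voice" are
    hypothetical; check the provider docs for the real schema.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # ASSUMPTION: same bearer-key auth as the text endpoints.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical model and voice names, for illustration only.
req = build_speech_request("sk-example", "Hello there", "example/tts-model", "example-voice")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` and writing the response bytes to a file, assuming the endpoint returns raw audio.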
Big announcement for speech AI
Benchmarks get gamed. So we added a repellent.
The Open ASR Leaderboard now includes private evaluation data from Appen and DataoceanAI, making speech recognition benchmarks more robust against test-set contamination and “benchmaxxing.”
Better signal. Less overfitting. More real-world ASR.
@acarroll_ATG Great to see AI systems being used to make meaningful improvements! Do you plan on writing up more of the details? What worked/failed etc?
Just fyi in the post: "but did not fully leverage all available signals. not fully leverage all available signal."
27x faster Attention Residuals!!! 🚀
We implemented Block AttnRes as a pip-installable package.
!pip install flash-attn-res
No annoying kernel nonsense.
No compile/autograd plumbing.
Call it like a regular PyTorch op.
It just works.
Methodology:
🔹 fused triton kernels
🔹 batched attention over residual blocks
🔹 online-softmax merge
🔹 flash attention-style split-KV reduction
Thanks @LLMenjoyer and @cartesia for the support and guidance✌️
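For readers unfamiliar with the "online-softmax merge" and "split-KV reduction" items in the methodology list, here is a minimal pure-Python sketch of the general idea for a single query. It is not the flash-attn-res implementation or its API; all function names are hypothetical, and the real package does this in fused Triton kernels.

```python
import math
from typing import List, Tuple

# Partial attention state for one query over one chunk of keys/values:
# (max score, sum of exp(score - max), exp-weighted sum of value vectors)
State = Tuple[float, float, List[float]]


def partial_attention(scores: List[float], values: List[List[float]]) -> State:
    """Unnormalized softmax-weighted sum over a single KV chunk."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def merge(a: State, b: State) -> State:
    """Online-softmax merge: rescale both partial states to a shared max."""
    m = max(a[0], b[0])
    scale_a, scale_b = math.exp(a[0] - m), math.exp(b[0] - m)
    denom = a[1] * scale_a + b[1] * scale_b
    acc = [x * scale_a + y * scale_b for x, y in zip(a[2], b[2])]
    return m, denom, acc


def split_kv_attention(scores: List[float], values: List[List[float]], chunk: int) -> List[float]:
    """Attend over KV chunks independently, then reduce the partial states."""
    states = [
        partial_attention(scores[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(scores), chunk)
    ]
    merged = states[0]
    for state in states[1:]:
        merged = merge(merged, state)
    _, denom, acc = merged
    return [x / denom for x in acc]


scores = [0.5, 2.0, -1.0, 3.0, 0.1]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
one_pass = split_kv_attention(scores, values, chunk=len(scores))
chunked = split_kv_attention(scores, values, chunk=2)
assert all(abs(x - y) < 1e-9 for x, y in zip(one_pass, chunked))
```

Because each chunk carries its own max and denominator, the chunks can be processed in any order or in parallel and still merge to the same result as a single softmax over all keys.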
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM
1/n
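To put the total/active parameter counts above in perspective, a quick back-of-the-envelope calculation. It assumes "active" means the parameters used per token, as is typical for mixture-of-experts models; the post itself does not spell this out.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that are active at once."""
    return active_params / total_params


# (total, active) parameter counts as quoted in the post.
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}

for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active")
# DeepSeek-V4-Pro: 3.1% of parameters active
# DeepSeek-V4-Flash: 4.6% of parameters active
```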
1. We believe in iterative deployment; although GPT-5.5 is already a smart model, we expect rapid improvements. Iterative deployment is a big part of our safety strategy; we believe the world will be best equipped to win at the team sport of AI resilience this way.
2. We believe in democratization. We want people to be able to use lots of AI; we aim to have the most efficient models, the most efficient inference stack, and the most compute. We want our users to have access to the best technology and for everyone to have equal opportunity. We have been tracking cybersecurity as a preparedness category for a long time, and have built mitigations we believe in that enable us to make capable models broadly available.
3. We love you and we want you to win. We want to be a platform for every company, scientist, entrepreneur, and person. (My whole career has largely been about the magic of startups, and I think we are about to see that magic at hyperscale.)
Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found.
All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵