NVIDIA Cosmos 3 is live on DeepInfra.
The first open world foundation model for physical AI that reasons before it generates. Built for robots, AVs, simulation, and synthetic data generation.
NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), it is the largest Nemotron 3 model to date and the most intelligent US open weights model.
We partnered with @nvidia to evaluate this model for intelligence and speed. These figures use the model's BF16 weights, but as with Nemotron 3 Super, the model will also be made available in NVFP4 quantization for higher inference performance.
➤ New leader for US open weights intelligence: Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This is well ahead of the next strongest US open weights models, Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33), but behind the Chinese-led open weights frontier (Kimi K2.6 at 54).
➤ Leading speed for its intelligence: on a pre-release @DeepInfra endpoint, Nemotron 3 Ultra served over 300 tokens per second. Peer models in its size class from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at 50-100 tokens per second in the market today. gpt-oss-120b is served at speeds similar to Nemotron 3 Ultra's, but with significantly lower intelligence.
➤ Largest Nemotron 3 model so far: at approximately 550 billion total parameters and 90% sparsity, Nemotron 3 Ultra is significantly larger than its siblings and is the largest recent US open weights model release.
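The 90% sparsity figure follows from the parameter counts quoted in the announcement: roughly 55B of ~550B parameters are active for any given token. A minimal sketch of that arithmetic, assuming the rounded figures above (the `sparsity` helper is illustrative only, not part of any NVIDIA or DeepInfra API):

```python
def sparsity(total_params: float, active_params: float) -> float:
    """Fraction of parameters that are NOT active for a given token."""
    return 1.0 - active_params / total_params

# Rounded figures from the announcement: ~550B total, ~55B active.
TOTAL = 550e9
ACTIVE = 55e9

print(f"active fraction: {ACTIVE / TOTAL:.0%}")    # 10%
print(f"sparsity: {sparsity(TOTAL, ACTIVE):.0%}")  # 90%
```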
We’ll be sharing additional analysis and full benchmarks at release.
The right question, and one too few enterprises are asking. Thanks @realmtbman and @palebluenexus for having our co-founder @nikolaborisof on.
Full episode: https://t.co/AZMuaTllzq
Enterprises ask "is your AI compliant?"
The better question: who actually runs the inference?
Nikola Borisov, co-founder of @DeepInfra ($107M Series B, with NVIDIA among the investors), on @palebluenexus:
"You want to make sure you're not giving it to someone that will give it to someone that will give it to someone. And maybe the final inference happens in China."
"I wasn't sure what we'd build. I just wanted to work with my co-founders. We ended up deciding to do AI infrastructure. It was a great choice."
Our CEO @nikolaborisof on the Scaling Without Breaking podcast: why the team came before the idea.
https://t.co/uGCMuPavaf
Check it out on more platforms👇
Introducing Realtime TTS-2, a new generation of voice model built for realtime conversation.
It is the first voice model that hears the conversation, takes natural-language voice direction, holds one voice identity across more than 100 languages, and speaks like a person who is paying attention.
The result is voice AI that feels as good as it sounds.
Try it out: https://t.co/80xL7AJveV
Learn More: https://t.co/PLUiAEFizP
New on DeepInfra: Realtime TTS 2.0 from @inworld_ai
• Prompt emotion + tone in plain English
• Cross-lingual voices
• Built for realtime apps
$35 / 1M characters
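At $35 per 1M characters, cost scales linearly with the length of the text being spoken. A rough estimator, assuming billing is purely per input character with no minimums or rounding (an assumption; the actual billing rules, including how characters are counted, may differ). `estimate_tts_cost` is a hypothetical helper, not a DeepInfra API:

```python
PRICE_PER_MILLION_CHARS = 35.0  # USD, list price quoted above

def estimate_tts_cost(text: str, price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Estimate USD cost, assuming billing is linear in character count.

    Note: len() counts Unicode code points, which may not match how
    the provider counts billable characters.
    """
    return len(text) / 1_000_000 * price_per_million

# A 10,000-character script: 10_000 / 1_000_000 * 35 = $0.35
script = "a" * 10_000
print(f"${estimate_tts_cost(script):.2f}")  # $0.35
```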
DeepInfra has raised $107M in Series B funding 🚀
AI is moving from training to production-scale deployment, and inference is becoming the system constraint.
DeepInfra was built for this shift: scaling high-throughput inference for open-source and agent-driven workloads. Grateful to our investors and partners; the round was co-led by @500GlobalVC and @gharik.