tonight. 6PM. denver. come say hi and tell us your favorite part of CVPR so far!
if you're on the list, you've got the address. if you're not, shoot us a message and we'll sort it.
https://t.co/AkELseZsOp
"Real-time" in marketing materials usually means "we batched it and it returns in 8 seconds."
Real-time means under 300ms. That's the threshold UI designers use to decide when to show a loading spinner.
There's a big difference. We build for the second one at @urunml.
Most providers route every request to a different accelerator. Mostly fine. Until you need a stateful loop. Drift across long workflows is real. At @urunml you're pinned to the same GPU on the same machine for the whole session. Stateful by design.
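For the curious, here is a minimal sketch of what session pinning can look like at the routing layer. It is an illustration only, not uRun's actual implementation: `SessionRouter`, `route`, `end`, and the worker names are all hypothetical, and the round-robin choice for a new session is an assumption.

```python
import itertools


class SessionRouter:
    """Hypothetical sticky router: one session stays on one worker."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("need at least one worker")
        self._workers = list(workers)
        # Assumption: new sessions are spread across workers round-robin.
        self._next_worker = itertools.cycle(self._workers)
        self._pinned = {}  # session_id -> worker

    def route(self, session_id):
        # The first request of a session picks a worker; every later
        # request in that session is sent to the same worker.
        if session_id not in self._pinned:
            self._pinned[session_id] = next(self._next_worker)
        return self._pinned[session_id]

    def end(self, session_id):
        # Release the pin when the session is over.
        self._pinned.pop(session_id, None)


router = SessionRouter(["gpu-0", "gpu-1"])
first = router.route("session-a")
# Repeated requests in the same session land on the same worker.
same = all(router.route("session-a") == first for _ in range(5))
```

A stateless router would pick a worker per request instead, so any model state built up on one accelerator would have to be moved or rebuilt on the next.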
heading to @MLSysConf 2026 next week in Bellevue
if you're working on inference, real-time systems, or ML infra, come say hi. always up to trade notes on what "production" actually looks like for stateful, interactive AI workloads.
Every modality in AI follows the same arc: from expensive single-shot generations to cheap, multi-turn interactive loops.
Text and image already went through it. Video is next.
Two years ago, the open problem was getting an AI video model to produce a coherent 5-second clip. Recent techniques like Long Live and self-forcing solved that piece. The new bottleneck is serving it interactively. Labs are chasing the next model. The infra layer underneath is wide open.
Real-time interactive video is the hardest workload there is. Every frame has to land inside the 300ms human-perception bar. That's why we're starting there with @urunml. The rest is downhill.
Who we most want building on uRun: creative tooling companies and the studios behind tomorrow's video games. They'll go places we can't imagine → https://t.co/kbL1maA4T8
#AIvideo #GameDev #VFX
some snapshots of our launch party @ Joey the Cat in SF last week.
skee-ball, open bar, and real-time AI video on every screen.
thank you to everyone who came out and pushed the demos somewhere great and weird.
#WhatCanuRun → https://t.co/kbL1maA4T8
Introducing the founding team with three unique angles on the same problem.
Keegan ran inference at Luma during the Dream Machine launch. Sean wrote the O'Reilly book on Docker and has our GPU orchestration dialed in. Matt was running low-latency edge inference at AWS in 2017 (back when "real-time AI" meant the cameras at Amazon Go).
We built uRun for the infrastructure bottleneck no one else is solving.
https://t.co/LHSjGRD8Un
#AIvideo #FounderStory #realtimeAI #VideoInfra #GenerativeAI
https://t.co/kbL1maA4T8 launch party - Wednesday, April 29 · 6PM:
🕹️ Arcade games
🍹 Open bar
💻 Live demos
🥽 Meta Quest Giveaway
Spots are limited - click the link to grab your invite.
👉 https://t.co/Hx5owyyoMX
The model moat is shrinking fast.
Kimi K2.6 just beat GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro.
But the story isn't the benchmarks - it's the execution layer:
→ 300 parallel agents
→ 13 hours autonomous coding
→ 4,000+ tool calls in one run
It's no longer intelligence per token. It's tokens per second.
Source: https://t.co/DbrvNIHSMy
#claude #moonshot #OpenSource
Imagine exploring this in real time as it generates. That is the infrastructure problem we have been solving.
Check us out -> https://t.co/qJrJZjhh5k
Today, we released Lyra 2.0, a framework for generating persistent, explorable 3D worlds at scale, from NVIDIA Research.
Generating large-scale, complex environments is difficult for AI models. Current models often “forget” what spaces look like and lose track of movement over time, causing objects to shift, blur, or appear inconsistent. This prevents them from creating the reliable 3D environments required for downstream simulations. Lyra 2.0 solves these issues by:
✅ Maintaining per-frame 3D geometry to retrieve past frames and establish spatial correspondences
✅ Using self-augmented training to correct its own temporal drifting.
Lyra 2.0 turns an image into a 3D world you can walk through, look back, and drop a robot into for real-time rendering, simulation, and immersive applications.
➡️ Learn more: https://t.co/ROR7miJeCU
📄 Read the paper: https://t.co/1osU9EGjGD