🤖 At #CVPR2026 @CVPR this week? Join the 1st Interactive Physical AI Workshop (IPA 2026): human-robot interaction, embodied social agents, full-duplex multimodal conversation, lifelike robots and avatars, and more!
🗓️ Wed June 3 · 8:25 AM–1:00 PM MDT
📍 Rooms 210/212
Keynotes:
• Alexander Richard (Meta) — Embodied Social Agents in XR
• Maja Matarić (USC / Google DeepMind) — Human-Centered AI & Robotics
• Agon Serifi (Disney Research) — From Human Motion to Robot Behavior
🔗 https://t.co/47gIzm6ZHK
@NVIDIAAI @swookpark @shalinidemello @mct1224 @amritamaz @lmathur_
Badge pickup, breakfast, but what then? Come to the Photorealistic 3D Head Avatars workshop in Room 107!
@CVPR
We have no coffee but our thrilling avatars will energize you nonetheless!
Join us at the 2nd Workshop on Photorealistic 3D Head Avatars (P3HA) at #CVPR2026!
We’ll discuss the latest advances in avatar creation, photorealistic rendering, and 2D vs. 3D avatars.
📍 June 3, 2026
🕣 8:50 AM – 12:30 PM MDT
📌 Room 107, Colorado Convention Center, Denver
Excited to release VideoFDB: the first benchmark for full-duplex audio-visual-to-audio-visual (AV2AV) conversational agents. VideoFDB tests whether models can act as active interlocutors in 1:1 conversation, not just answer questions about video.
This is THE moment of Physical AI!
We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀
- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.
- It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies—enabling models to truly begin to “touch the world.”
- Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks.
Huge thanks to every teammate who fought side by side on this journey—from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate.
The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act”—and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community.
Welcome to the era of Physical AI.
HuggingFace: https://t.co/QW5h5pIWWM
Project Website: https://t.co/Jppa0gkn16
Code: https://t.co/aJgaLm5BaG
Introducing Cosmos 3: Our latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation.
Today we’re releasing Super (32B) and Nano (8B) variants.
The workshop hosts a fantastic lineup of diverse speakers from both industry and academia. Additionally, we will learn about the two winning methods of the Single-view 3D Face Reconstruction and Monocular 3D Avatar Creation challenge.
The Photorealistic 3D Head Avatars workshop takes place at @CVPR on:
🗓️June 3rd
🕘 8:50am - 12:30pm
📍Room 107
Join us for the latest trends in avatar creation and photorealistic rendering, and for discussions on 2D vs. 3D avatars.
Workshop website: https://t.co/kzX1WSthVM
🤖 At #CVPR2026 next week? Join the 1st Interactive Physical AI Workshop (IPA 2026) — embodied social agents, full-duplex multimodal conversation & lifelike robots and avatars and more!
🗓️ Wed June 3 · 8:25 AM–1:00 PM MDT
📍 Rooms 210/212
Keynotes:
• Alexander Richard (Meta) — Embodied Social Agents in XR
• Maja Matarić (USC / Google DeepMind) — Human-Centered AI & Robotics
• Agon Serifi (Disney Research) — From Human Motion to Robot Behavior
🔗 https://t.co/47gIzm6ZHK
@NVIDIAAI @swookpark @shalinidemello @mct1224 @amritamaz @lmathur_
🚀Announcing NeRSemble 3D Head Avatar Benchmark v2
Version 2 of the NeRSemble 3D Head Avatar Benchmark systematically evaluates several aspects of 3D head avatar creation. Our goal is to drive progress toward more realistic, robust, and generalizable avatar methods.
🔬Benchmark Tasks
The NeRSemble Benchmark v2 features three core challenges:
- Dynamic Novel View Synthesis
- Monocular FLAME-driven Avatar Creation (updated)
- Single-view 3D Face Reconstruction (new)
👉Explore the online leaderboard and submission system: https://t.co/dUdsFWzELp
🆕What's new?
1. New Task: Single-view 3D Face Reconstruction
Given a single portrait image, reconstruct an accurate 3D mesh either showing the input expression or a fully neutral one. Unlike prior benchmarks, the NeRSemble benchmark emphasizes diverse and challenging facial expressions, better reflecting real scenarios. For technical details, see the Pixel3DMM paper.
2. Updated task: Monocular FLAME-driven Avatar Creation
We have improved the FLAME tracking that is used for both avatar creation from the monocular videos and avatar driving on the hidden test sequences. The updated benchmark task has:
- more stable torso tracking
- more expressive lip closures during speech
- improved mouth tracking for challenging facial expressions
We hope that these improvements to the benchmark help drive the field forward.
🏆 CVPR 2026 Workshop & Prizes
The NeRSemble benchmark will be featured at the CVPR 2026 Workshop on Photorealistic 3D Head Avatars.
Participants in the new and updated tasks have the opportunity to win:
- 🎁RTX 5080 GPUs (sponsored by NVIDIA)
- 🎤15-minute oral presentation at the workshop
⏰ Submission Deadline
- May 26, 2026
Reach out to the amazing @TobiasKirschst1 and @SGiebenhain for more details :)
Submit your method to the NeRSemble 3D Head Avatar benchmark!
The winners of the Single-view 3D Face Reconstruction and Monocular FLAME Avatar tracks will win a GPU prize and get an oral presentation slot at our @CVPR workshop.
Submissions open until May 26th.
#CVPRWorkshop
📢IPA 2026 (Interactive Physical AI) @CVPR 2026.
Announcing the 1st Interactive Physical AI workshop at CVPR 2026! This half-day workshop will bring together researchers to discuss AI systems that perceive humans and scenes, communicate via multimodal signals, and act safely in our shared physical world. 🧵
#CVPR2026