Now you can use GR00T N1.7 and SONIC together to enable tasks that require TRUE whole-body coordination, including simultaneous precise hand and foot placement, like opening a trash can with the foot pedal and throwing an object inside!
Try it yourself, it is so fun!
Open-sourcing the whole package here!
The last pieces of our SONIC open-source release (data collection, GR00T VLA post-training, and inference) just hit the repo!
Train your autonomous whole-body policies on the G1 with SONIC and GR00T N1.7!
🧑‍💻 Code: https://t.co/7u3SBxzXU9
📑 Docs: https://t.co/NhwlZtRqUu
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistakes.
Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with the control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did was give Codex an API to the world of atoms, and the rest is emergence.
ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improve significantly faster than fewer robots do.
A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning.
/goal: we all take a holiday and Jensen wouldn't even notice ;)
We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:
NitroGen just won CVPR Best Paper Honorable Mention!! We are making strides towards general-purpose embodied agents that master not only real-world physics, but also all possible physics across a multiverse of simulations.
It’s been 4 years since MineDojo, our first embodied agent in Minecraft, won NeurIPS Best Paper. Congrats to everyone on the team!!
Exciting news on GR00T:
NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration.
R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.
I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;)
And stay till the end, more easter eggs and predictions for your polymarket!
00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection and the FSD equivalent to physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.
Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.
🚀 @Solo__Tech breakthrough: NVIDIA Sonic goes beyond the G1. 🌏
We’ve achieved a global first: migrating NVIDIA Sonic to a completely different humanoid morphology, the AGIBOT X2 🥇
This is a massive leap for transferable humanoid intelligence. We are moving away from single-robot controllers toward architectures that generalize across diverse embodiments.
The Specs:
Hardware: AGIBOT X2 Ultra (31 DoF)
Precision: 14-DoF Dexterous Hands
Capabilities:
- End-to-end whole-body locomotion with manipulation
- Single leg balancing and stylized motion
- Upper body gestures
Big respect to the team fueling this innovation all the way!
@meetsitaram @zeeshaan_7788 @Samarth_1506 @DevSodhi @flyingtaxiguy @build @frontiertower @nvidia @AGIBOTofficial @nebiusai @UFBots @vitl2907 @XeniaBulatov @NVIDIARobotics
To learn more about Solo Tech at the frontier of Physical AI: https://t.co/BRZejo6e9c
#PhysicalAI #Robotics #Humanoids #NVIDIA #AGIBOT #SoloTech #EmbodiedAI #SoloSeven
GR00T-VisualSim2Real is now open source!
VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids.
Repo: https://t.co/vgRsCeRG8w
What is missing to bring real-time motion research into AAA games and real-world robotics?
We present MotionBricks, a step toward bridging this gap with two key components:
- a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability.
- a unified smart primitive interface for locomotion and object/scene interaction, with fine-grained control over generated behaviors.
Webpage: https://t.co/aJE5skUuWD
Code: https://t.co/r56D3TJ8CW
Paper: https://t.co/CtOHXnHZMv (ACM TOG / SIGGRAPH 2026)
Want to experience a humanoid behavior foundation model in your browser? The SONIC web demo is now live at https://t.co/LutwTt1SGl
Featuring:
- Motion tracking
- Text to motion using Kimodo https://t.co/jFwqDdiikv
Try it out!
SONIC training code + Finetuning checkpoint + VLA data collection scripts are open-sourced.
Little easter egg on the GEAR-SONIC website too :)
https://t.co/2BgYj3tLYO
What can half of GPT-1 do? We trained a 42M transformer called SONIC to control the body of a humanoid robot. It takes a remarkable amount of subconscious processing for us humans to squat, turn, crawl, sprint. SONIC captures this "System 1" - the fast, reactive whole-body intelligence - in a single model that translates any motion command into stable, natural motor signals. And it's all open-source!!
The key insight: motion tracking is the one, true scalable task for whole body control. Instead of hand-engineering rewards for every new skill, we use dense, frame-by-frame supervision from human mocap data. The data itself encodes the reward function: "configure your limbs in any human-like position while maintaining balance".
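To make "the data itself encodes the reward function" concrete, here is a minimal sketch of a per-frame tracking reward. It is a generic exponential form of the kind often used in mocap-imitation RL, not SONIC's actual reward, which this post does not spell out. The function names, the flat joint-angle pose format, and the `sigma` parameter are all assumptions made for illustration.

```python
import math

def tracking_reward(robot_pose, reference_pose, sigma=0.5):
    """Per-frame reward in (0, 1]: 1.0 for a perfect match with the mocap frame.

    Hypothetical form: exp(-squared_error / sigma^2). Poses are flat lists of
    joint angles of equal length (an assumption, not SONIC's real state format).
    """
    sq_error = sum((q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose))
    return math.exp(-sq_error / sigma ** 2)

def episode_rewards(robot_traj, reference_traj, sigma=0.5):
    """Dense supervision: one reward for every frame of the reference clip."""
    return [
        tracking_reward(q, q_ref, sigma)
        for q, q_ref in zip(robot_traj, reference_traj)
    ]

# Two-frame toy clip: the robot matches frame 0 exactly and drifts on frame 1.
reference = [[0.0, 0.0], [0.2, -0.1]]
robot = [[0.0, 0.0], [0.3, -0.1]]
rewards = episode_rewards(robot, reference)
```

Every new mocap clip then comes with its own reward signal for free: no per-skill reward engineering, just a new `reference` trajectory.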
We scaled humanoid motion RL to an unprecedented level: 100M+ mocap frames and 500,000+ parallel robots across 128 GPUs. NVIDIA Isaac Lab lets us run physics at a 10,000x faster tick rate, giving robots many years of virtual experience in only hours of wall-clock time. After 3 days of training, the neural net transfers zero-shot to the real G1 robot with no finetuning, with a 100% success rate across 50 diverse real-world motion sequences.
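A quick back-of-the-envelope check on "many years in only hours". This sketch assumes the 10,000x figure is an aggregate speedup over real time, which is one reading of the post rather than something it states. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

def simulated_years(wall_clock_hours, speedup=10_000):
    """Years of simulated experience gained in the given wall-clock time.

    Assumes `speedup` is the total ratio of simulated time to real time.
    """
    return wall_clock_hours * speedup / HOURS_PER_YEAR

one_hour = simulated_years(1)        # a bit over 1 simulated year
three_days = simulated_years(3 * 24)  # roughly 82 simulated years
```

Under that assumption, a single hour of wall-clock time already covers more than a year of experience, and the 3-day run covers about eight decades.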
One SONIC policy supports all of the following:
- VR whole-body teleoperation
- Human video. Just point a webcam to live stream motions.
- Text prompts. "Walk sideways", "dance like a monkey", "kick your left foot", etc.
- Music audio. The robot dances to the beat, adapting to tempo and rhythm.
- VLA foundation models. We plugged in GR00T N1.5 and achieved 95% success on mobile tasks.
We open-source the code and model checkpoints!! Deep dive in thread:
Human data as the most scalable data source for robotics!
Seeing first-hand the generalization capabilities unlocked by human video data was mind-blowing to me. Check out our paper!
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop.
Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.
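For readers who want to see what a log-linear fit with an R² score looks like, here is a small sketch. It assumes the law has the form loss = a + b · ln(hours); the data points below are made-up placeholders chosen to lie exactly on such a line, NOT the paper's measurements, and `fit_log_linear` is a hypothetical helper.

```python
import math

def fit_log_linear(hours, losses):
    """Least-squares fit of loss = a + b * ln(hours); returns (a, b, r_squared)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, losses))
    ss_tot = sum((y - mean_y) ** 2 for y in losses)
    return a, b, 1.0 - ss_res / ss_tot

# Placeholder data (illustrative only): loss drops by 0.1 per e-fold of hours.
hours = [1_000, 2_000, 5_000, 10_000, 20_000]
losses = [1.0 - 0.1 * math.log(h / 1_000) for h in hours]

a, b, r2 = fit_log_linear(hours, losses)
```

A negative slope `b` means more human video lowers the loss, and an R² close to 1 means a straight line in log-hours explains almost all of the variation.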
Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.
Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone.
The scalable path to robot dexterity was never more robots. It was always us.
Deep dives in thread:
Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread 🧵 on what we learned. 👇
SONIC is now open-source!
Generalist whole-body teleoperation for EVERYONE!
Our team has long been building comprehensive pipelines for whole-body control, kinematic planner, and teleoperation, and they will all be shared.
This will be a continuous update; inference code + model are already there, training code and GR00T integration coming soon!
Code: https://t.co/7u3SBxzXU9
Docs: https://t.co/HpDLkTCSMF
Site: https://t.co/D3i4KlnLLr
We have seen rapid progress in humanoid control: specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to real work.
To progress toward this goal, we developed SONIC (https://t.co/zOZVraFuDV), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation.
Today, we’re open-sourcing SONIC on GitHub. We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale.
🌐 Paper: https://t.co/DGBP7LAvuT
📃 Code: https://t.co/WAZ1P13072