Associate Professor @UTCompSci | Director @NVIDIAAI Co-Leading GEAR | CS PhD @Stanford | Building generalist robot autonomy in the wild | Opinions are my own
Excited to share T-Rex: Tactile-Reactive Dexterous Manipulation 🦖🤖
Touch is fundamental to human dexterity, yet most Vision-Language-Action (VLA) models either ignore tactile feedback or lack the ability to react to high-frequency contact signals.
In this work, we tackle both the data and architectural challenges of tactile-reactive dexterous manipulation.
🦖 A 100-hour tactile-synchronized dexterous manipulation dataset with 7,700+ trajectories, 22 motor primitives, and 200+ everyday objects.
🦖 A tactile-reactive MoT architecture with spatial-temporal tactile encoding and asynchronous high-frequency tactile refinement.
🦖 A scalable training recipe combining pretraining on 22,889 hours of human egocentric data with tactile-grounded robot mid-training.
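The "asynchronous high-frequency tactile refinement" idea can be pictured as two loops running at different rates. The sketch below is not T-Rex's implementation: `run_two_rate_loop`, `slow_policy`, and `back_off_grip` are hypothetical names. The asynchrony is only simulated by querying the slow policy once every `ratio` fast ticks while a cheap correction runs on every tactile reading.

```python
from typing import Callable, List, Sequence

Action = List[float]


def run_two_rate_loop(
    slow_policy: Callable[[int], List[Action]],
    refine: Callable[[Action, float], Action],
    tactile_stream: Sequence[float],
    ratio: int,
) -> List[Action]:
    """Hypothetical two-rate loop: slow action chunks, fast tactile corrections.

    slow_policy(t) returns `ratio` planned actions starting at fast tick t.
    refine(action, tactile) corrects one planned action with the latest reading.
    """
    executed: List[Action] = []
    chunk: List[Action] = []
    for t, tactile in enumerate(tactile_stream):
        if t % ratio == 0:
            # Slow path: request a fresh chunk (stand-in for the VLA backbone).
            chunk = slow_policy(t)
            if len(chunk) != ratio:
                raise ValueError("chunk length must equal the rate ratio")
        # Fast path: every tick, adjust the planned action using current touch.
        executed.append(refine(chunk[t % ratio], tactile))
    return executed


def back_off_grip(action: Action, pressure: float, limit: float = 0.5) -> Action:
    """Toy refinement: lower grip commands by however much pressure exceeds `limit`."""
    excess = max(0.0, pressure - limit)
    return [a - excess for a in action]
```

With `ratio=4`, the slow policy is called at ticks 0 and 4 of an 8-tick tactile stream, while `back_off_grip` reacts to every reading. A real system would run the two paths concurrently (separate threads or processes); the sketch only reproduces their rate relationship.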
Across 12 real-world contact-rich manipulation tasks, T-Rex achieves over 30% higher average success rate than the strongest baseline.
We are fully open-sourcing the dataset, models, teleoperation stack, training code, and inference pipeline.
🌐 Project: https://t.co/AiHKRR8YXU
📄 Paper: https://t.co/mXY2UNLlqc
💻 Code: https://t.co/7skCxUtwKC
🤗 Dataset: https://t.co/uNwW8dcRZL
🧵 Thread ↓
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, and don't waste precious compute. Make no mistake.
Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with the control stack, read papers online, debate, reflect, get stuck, and try again, directly on the hardware. All we did was give Codex an API to the world of atoms; the rest is emergence.
ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improve significantly faster than fewer robots do.
Part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning.
/goal: we all take a holiday and Jensen wouldn't even notice ;)
We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:
This is actually sick! 🤯
A motion generator for robotics and gaming.
This is MotionBricks, cooked up by @NVIDIAAI: real-time motion generation for robots and games at 15,000 FPS.
MotionBricks shipped to SIGGRAPH 2026 with code that integrates directly into NVIDIA's GR00T Whole-Body Control stack. 15,000 FPS, 2ms latency, 350,000 motion skills from a single neural backbone.
Okay, let's look at how it works: first, train one generative model on 350k production-grade mocap clips (the BONES-SEED dataset from Bones Studio).
Then add "smart primitives" on top, a unified interface where you specify navigation targets, object interaction keyframes, and style prompts. The network generates everything else in real-time.
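To make the "unified interface" concrete, here is a hypothetical request structure combining the three inputs named above (navigation target, interaction keyframes, style prompt). It is not MotionBricks' actual API: `PrimitiveRequest`, `Keyframe`, and their fields are invented for illustration, and the network that fills in the remaining frames is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Keyframe:
    """Hypothetical interaction constraint: a body part reaches a target at a time."""
    time_s: float
    body_part: str
    target_xyz: Tuple[float, float, float]


@dataclass
class PrimitiveRequest:
    """Hypothetical smart-primitive request; the generator fills in every other frame."""
    nav_target_xy: Optional[Tuple[float, float]] = None
    keyframes: List[Keyframe] = field(default_factory=list)
    style: str = "neutral"

    def validate(self) -> None:
        times = [k.time_s for k in self.keyframes]
        if any(t < 0 for t in times):
            raise ValueError("keyframe times must be non-negative")
        if times != sorted(times):
            raise ValueError("keyframes must be in chronological order")
        if self.nav_target_xy is None and not self.keyframes:
            raise ValueError("give a navigation target or at least one keyframe")


# Example: walk toward (3.0, 1.5) and grab an object with the right hand at t = 2 s.
request = PrimitiveRequest(
    nav_target_xy=(3.0, 1.5),
    keyframes=[Keyframe(2.0, "right_hand", (3.2, 1.5, 0.9))],
    style="injured",
)
request.validate()
```

The point of such a design is that callers state only sparse goals and constraints; everything between keyframes is left to the generative model rather than to an animation graph.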
There's no animation graph and no per-task fine-tuning.
Their demo character navigates, picks up a sword, vaults a bench, sits down, and switches between zombie/injured/skipping styles. Every frame is generated by the network in real time.
I think this matters because MotionBricks is now core to GR00T Whole-Body Control, the same stack powering humanoids widely used in research across the globe.
Btw, the code ships with an interactive G1 demo; a full robotics-integrated release is coming in ~1 month.
The motion stack for humanoid robots is getting bigger! 🔥
Check it out here: https://t.co/eiCcRTzTFc
cc: @NVIDIARobotics
~~
♻️ Join the weekly robotics newsletter, and never miss any news → https://t.co/GoA3ZuwWF9
Exciting news on GR00T:
NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration.
R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.
NVIDIA announces the first open humanoid robot reference design built for robotics research.
The NVIDIA Isaac GR00T Reference Humanoid Robot combines the @UnitreeRobotics H2 humanoid robot, @SharpaRobotics Wave five-fingered hands for dexterous manipulation, Jetson Thor onboard compute, and Isaac GR00T open software and models, giving researchers a full-stack platform from data capture to model deployment.
Read the #NVIDIAGTC Taipei announcement: https://t.co/ZsT3qQKucb
I will be in Vienna in two weeks to give a keynote at #ICRA2026. I'll share our recent progress on building generalist humanoid robots and show some of the latest results.
Check out my talk on June 3: https://t.co/DTovfYLb6v
Now you can use GR00T N1.7 and SONIC together to enable tasks that require TRUE whole-body coordination!! This includes simultaneous, precise hand and foot placement, like opening a trash can with the foot pedal and throwing an object inside!
Try it yourself, it is so fun!
GR00T-VisualSim2Real is now open source!
VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids.
Repo: https://t.co/vgRsCeRG8w
What is missing to bring real-time motion research into AAA games and real-world robotics?
We present MotionBricks, a step toward bridging this gap with two key components:
- a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability.
- a unified smart primitive interface for locomotion and object/scene interaction, with fine-grained control over generated behaviors.
Webpage: https://t.co/aJE5skUuWD
Code: https://t.co/r56D3TJ8CW
Paper: https://t.co/CtOHXnHZMv (ACM TOG / SIGGRAPH 2026)
🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]).
But why does it work? How can we improve it further?
Taking sim-and-real imitation learning with diffusion/flow-based models as the testbed, we performed a rigorous mechanistic analysis drawing on theoretical insights and multi-layered experiments.
😮Key insight: it’s all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation
⚖️Both are necessary: it's better to have more aligned representations, but the model must still be able to discern the domains. We term this structured representation alignment.
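One way to build intuition for "aligned but discernible" is with two toy proxies on feature vectors. These are not the paper's metrics. Here alignment is taken as the distance between the sim and real feature centroids, and discernibility as the accuracy of a nearest-centroid domain classifier. Both functions are hypothetical illustrations.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def centroid(points: List[Vec]) -> List[float]:
    """Per-dimension mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def alignment_gap(sim: List[Vec], real: List[Vec]) -> float:
    """Toy alignment proxy: distance between domain centroids (smaller = more aligned)."""
    return math.dist(centroid(sim), centroid(real))


def discernibility(sim: List[Vec], real: List[Vec]) -> float:
    """Toy discernibility proxy: accuracy of a nearest-centroid domain classifier."""
    c_sim, c_real = centroid(sim), centroid(real)
    correct = sum(math.dist(x, c_sim) < math.dist(x, c_real) for x in sim)
    correct += sum(math.dist(x, c_real) < math.dist(x, c_sim) for x in real)
    return correct / (len(sim) + len(real))
```

For sim features `[(0, 0), (0, 1)]` and real features `[(0.5, 0), (0.5, 1)]`, the domains share their spread but differ by a shift along one feature: the gap is 0.5 and the classifier separates them perfectly. Feeding the same set in for both domains gives a gap of 0 but discernibility of 0, which is the "aligned but indiscernible" failure the post warns about.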
⬇️Let’s take a deep dive into that:
Paper: https://t.co/RWCAxdBC0j
Website: https://t.co/BwgbwCkevA
We've just open-sourced the SONIC training code, the training data, the algorithms used to generate the data, and more to come. You now have the full recipe for building SONIC whole-body controllers for your own humanoids. Enjoy!
Code: https://t.co/WAZ1P12shu
Web Demo: https://t.co/sc3yxIKpuJ
SONIC is now open-source!
Generalist whole-body teleoperation for EVERYONE!
Our team has long been building comprehensive pipelines for whole-body control, kinematic planning, and teleoperation, and they will all be shared.
This will be continuously updated: inference code + model are already there; training code and GR00T integration are coming soon!
Code: https://t.co/7u3SBxzXU9
Docs: https://t.co/HpDLkTCSMF
Site: https://t.co/D3i4KlnLLr
Robotics: coding agents’ next frontier.
So how good are they?
We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability.
From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab
https://t.co/MVcc6XWQhY
🧵
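The write, execute, observe, improve cycle described for CaP-X can be sketched as a generic retry loop. This is not CaP-X's interface: `refine_loop`, `agent` (standing in for a coding-agent call), and `check` (standing in for observing the outcome on a sim or real robot) are hypothetical names.

```python
import traceback
from typing import Callable, List, Optional, Tuple


def refine_loop(
    agent: Callable[[str], str],
    check: Callable[[dict], bool],
    max_iters: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Ask the agent for code, run it, feed the outcome back, and retry."""
    feedback = ""
    history: List[str] = []
    for _ in range(max_iters):
        code = agent(feedback)
        namespace: dict = {}
        try:
            exec(code, namespace)  # only ever execute trusted or sandboxed code
            ok = check(namespace)
            feedback = "success" if ok else "check failed"
        except Exception:
            ok = False
            feedback = traceback.format_exc(limit=1)  # the error text becomes feedback
        history.append(feedback)
        if ok:
            return code, history
    return None, history
```

In the usage sketched here, a first attempt that divides by zero produces a `ZeroDivisionError` traceback as feedback, and a corrected second attempt passes `check`. On real hardware, `check` would be replaced by perception of the task outcome, which is slower and noisier than an assertion.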
Catastrophic forgetting has long been a challenge in continual learning.
However, our new study found that pretrained Vision-Language-Action (VLA) models are surprisingly resistant to forgetting!
Zero forgetting, or even positive backward transfer, is possible with simple experience replay.
https://t.co/tIXenn0vSa
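"Simple experience replay" generally means mixing stored samples from earlier tasks into each new-task batch. The sketch below shows one common variant (a reservoir-sampled buffer plus a fixed replay fraction). The study's exact buffer size, sampling scheme, and mixing ratio are not stated here, so `ReservoirBuffer`, `mixed_batch`, and `replay_frac` are illustrative assumptions.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Fixed-size buffer that keeps a uniform sample of everything added so far."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Reservoir sampling: keep the new item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def mixed_batch(current: Sequence[T], buffer: ReservoirBuffer[T], batch_size: int,
                replay_frac: float, rng: random.Random) -> List[T]:
    """Combine new-task samples with replayed samples from earlier tasks."""
    n_replay = min(round(batch_size * replay_frac), len(buffer.items))
    n_current = min(batch_size - n_replay, len(current))
    batch = rng.sample(list(current), n_current) + rng.sample(buffer.items, n_replay)
    rng.shuffle(batch)
    return batch
```

With `batch_size=8` and `replay_frac=0.25`, each batch holds 6 new-task samples and 2 replayed ones. Reservoir sampling keeps the buffer an unbiased sample of all past data without storing it all.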
Today, we publicly released RoboCasa365, a large-scale simulation benchmark for training and systematically evaluating generalist robot models. Built upon our original RoboCasa framework, it offers:
• 2,500 realistic kitchen environments;
• 365 everyday tasks (basic skills + long-horizon mobile manipulation);
• Over 3,200 objects with many articulated fixtures/appliances.
All are designed for fully controlled, reproducible benchmarking of robotic policies.
Progress in robotic foundation models is real. But it’s still hard to answer basic questions like: How close are we to general-purpose autonomy? What factors drive generalization? What are the model/data scaling curves like? Real-world eval is slow and noisy, and existing sims (like LIBERO, which we built 3 years ago) often lack sufficient task and scene diversity.
This benchmark comes with 2,200+ hours of demonstrations and 500K+ trajectories to support studies of multi-task training, pretraining, and continual learning at scale.
Check it out at https://t.co/0EV3tPmTVy
CoRL is coming to Austin, TX this November!
As General Chair, I'm thrilled to welcome the robot learning community. 2026 feels like a pivotal year as AI-powered robotic systems begin deploying at scale for real-world tasks. This year, I hope CoRL will be the forum that connects cutting-edge research with industrial practice.
Please submit your best work and join us in Austin. DM me what you'd love to see CoRL do better!
https://t.co/iCJI5xLz7N
Calling all researchers! 🤖The CoRL 2026 website is officially live at https://t.co/IievbzR8xd with key dates for your submissions:
🗓 May 25: Abstract Submission
🗓 May 28: Full Paper Submission
🗓 Nov 9-12: Conference in Austin, TX
Send us your coolest work!
#RobotLearning
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop.
Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.
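A log-linear scaling law means loss ≈ a + b·ln(hours) with b < 0, so every doubling of human video lowers the loss by the same fixed amount, |b|·ln 2. The post does not report the fitted coefficients. The snippet only shows how such a fit and its R² are computed with ordinary least squares, and the test inputs are synthetic rather than EgoScale measurements.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(hours: Sequence[float], loss: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for loss ≈ a + b * ln(hours); returns (a, b, R²)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(loss) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, loss))
    b = sxy / sxx                      # slope per unit of ln(hours)
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, loss))
    ss_tot = sum((y - mean_y) ** 2 for y in loss)
    return a, b, 1.0 - ss_res / ss_tot
```

An R² of 0.998 means the straight line in log-hours explains 99.8% of the variance in loss across the measured data volumes. The log base only rescales b and does not change R².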
Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.
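The unified action space above (relative wrist motion plus retargeted 22-DoF finger actions) can be sketched as a per-frame concatenation. Assumptions, not taken from the post: wrist motion is shown as translation relative to the first frame only (orientation is omitted for brevity), and "simple retargeting" is modeled as a hypothetical 1:1 joint copy clamped to the robot hand's limits, which presumes matching joint ordering.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
HAND_DOF = 22  # 22-DoF dexterous hand, as in the post


def relative_wrist(wrist_xyz: Sequence[Vec3]) -> List[Vec3]:
    """Wrist positions relative to the first frame (translation only, for brevity)."""
    x0, y0, z0 = wrist_xyz[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in wrist_xyz]


def retarget_fingers(human_angles: Sequence[float],
                     limits: Sequence[Tuple[float, float]]) -> List[float]:
    """Hypothetical 1:1 joint retargeting, clamped to the robot hand's joint limits."""
    if len(human_angles) != HAND_DOF or len(limits) != HAND_DOF:
        raise ValueError(f"expected {HAND_DOF} finger joints")
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(human_angles, limits)]


def unified_actions(wrist_xyz: Sequence[Vec3],
                    finger_angles: Sequence[Sequence[float]],
                    limits: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """One 25-D action per frame: 3 relative wrist coords + 22 finger joints."""
    rel = relative_wrist(wrist_xyz)
    return [list(w) + retarget_fingers(f, limits) for w, f in zip(rel, finger_angles)]
```

Because both human video and robot teleop can be expressed in this same vector layout, the same action head can be used from pre-training through robot execution, which is the property the post emphasizes.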
Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone.
The scalable path to robot dexterity was never more robots. It was always us.
Deep dives in thread:
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill.
Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible.
Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream.
We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry—across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains—at a scale no robot fleet could match. The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached.
As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first.
Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading new character and scene assets into Unreal Engine, but done through gradient descent, and it generalizes far beyond the post-training dataset.
A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities:
- Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on Unitree G1 with a PICO headset and one RTX 5090.
- Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor.
- Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. This yields +17% real-world success out of the box on a fruit packing task.
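The "sample → simulate → pick the best" recipe is essentially random-shooting planning. The post does not detail DreamDojo's planner, so this is a generic sketch: `world_model` stands in for a DreamDojo rollout, and `score` (for example, a learned success estimator on the predicted frames) is a hypothetical placeholder.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")  # world state / predicted future (e.g. frames from a world model)


def plan_by_sampling(
    state: S,
    sample_actions: Callable[[random.Random], List[float]],
    world_model: Callable[[S, List[float]], S],
    score: Callable[[S], float],
    num_proposals: int = 16,
    seed: int = 0,
) -> List[float]:
    """Random-shooting planner: sample proposals, imagine each future, keep the best."""
    rng = random.Random(seed)
    proposals = [sample_actions(rng) for _ in range(num_proposals)]
    # Independent rollouts; a real system would batch these on the GPU.
    futures = [world_model(state, a) for a in proposals]
    best = max(range(num_proposals), key=lambda i: score(futures[i]))
    return proposals[best]
```

With a toy `world_model` of `s + a[0]` and a score of `-abs(future - 0.5)`, the planner returns whichever sampled proposal lands closest to 0.5. Because the rollouts are independent, they parallelize cleanly, which is why real-time world-model inference makes this kind of test-time planning practical.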
We open-source everything!! Weights, code, post-training dataset, eval set, and whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too.
2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling!
Links in thread:
We have seen rapid progress in humanoid control: specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to real work.
To progress toward this goal, we developed SONIC (https://t.co/zOZVraFuDV), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation.
Today, we’re open-sourcing SONIC on GitHub. We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale.
🌐 Paper: https://t.co/DGBP7LAvuT
📃 Code: https://t.co/WAZ1P13072