Children learn from play. Can robots do the same?
We propose 𝐏𝐥𝐚𝐲𝐟𝐮𝐥 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐑𝐨𝐛𝐨𝐭 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠, a paradigm that gives embodied coding agents a play stage before downstream tasks arrive, and instantiate it with 𝐑𝐀𝐓𝐬 (Robotics Agent Teams), where robots discover reusable skills through curious play.
Co-led with @jiaxin_ge_
Introducing Do as I Do 👀, a framework to transform everyday human videos into 100s of dexterous robot demos. Co-led with @bhawna_paliwal_ and @HarithejaE, and check out @notmahi's thread!
Here’s a little preview of our dexterous manipulation results. More about how we produce them from human reconstructions in this mini-thread! 🧵
https://t.co/tDpl9dGcqE
Robots are the bottleneck in scaling robotics, and learning from human video promises to solve it. But how can chaotic human data ever measure up to sanitized, lab-made teleoperation data?
Introducing Do as I Do: establishing a much needed correspondence between human videos and dexterous robot data. Some fun insights below: 🧵
Introducing ABC: open data, training, and infrastructure for robotics.
We release the largest teleop dataset to date, and extensively investigate design decisions, pretraining, and post-training techniques.
@arthurallshire @Cinnabar233 @adamrasb @redstone_hong @davidrmcall
Academia optimizes for novelty,
which has become increasingly orthogonal to making things work. In practice it rewards benchmark-chasing, optics-maxing, and flag-planting.
Sadly, a major bitter lesson of robotics is that insights from the small-data, bad-system regime don’t transfer to the big-data, good-system one. The novelty we reward and the progress we need are pulling apart.
What if coding agents could perform autoresearch on a fleet of robots to study novel robot learning algorithms and improve policies?
Excited to introduce ENPIRE: a harness loop in which the coding agent first constructs its own task-specific interfaces (env.reset / env.get_reward / env.get_done) via Code-as-Policy, wrapping deployment infra into a structured gym environment, then autonomously hill-climbs BC, RL, or heuristic policies on real robots.
ENPIRE reframes policy improvement in the real world as a compute problem: it scales with token and robot utility input, not researchers’ intellectual and labor input.
We also show a viable path to building a robotic data flywheel for challenging long-horizon tasks with coding agents: orchestrating available vision/control primitives to solve trivial phases (pick-n-place, free-space movement), then iterating on challenging phases (contact-rich, precision-critical) autonomously.
Project website: https://t.co/ycFcZiBVY0
Work done with co-leads @_wenlixiao @jiaxie_jason @TongheZhang01 @NVIDIA @CMU_Robotics @Berkeley_AI
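To make the env.reset / env.get_reward / env.get_done idea above concrete, here is a minimal sketch of such an interface with a hill-climbing loop on top. Everything in it is an assumption for illustration: `ToyTaskEnv`, its 1-D reaching task, the proportional-gain "policy", and the `rollout` and `hill_climb` helpers are hypothetical and are not ENPIRE's actual code or API.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyTaskEnv:
    """Hypothetical 1-D reaching task; a stand-in, not ENPIRE's real interface."""
    target: float = 0.7     # made-up goal position
    max_steps: int = 20     # made-up episode length
    position: float = 0.0
    steps: int = 0

    def reset(self) -> float:
        """Return the scene to its start state and report the initial error."""
        self.position, self.steps = 0.0, 0
        return self.target - self.position

    def step(self, action: float) -> float:
        """Apply a clipped motion command and return the remaining error."""
        self.position += max(-0.1, min(0.1, action))
        self.steps += 1
        return self.target - self.position

    def get_reward(self) -> float:
        """Negative distance to the target: higher is better."""
        return -abs(self.target - self.position)

    def get_done(self) -> bool:
        """Episode ends on timeout or when the target is reached."""
        reached = abs(self.target - self.position) < 0.01
        return self.steps >= self.max_steps or reached


def rollout(env: ToyTaskEnv, gain: float) -> float:
    """Run one episode with a proportional policy and return the final reward."""
    error = env.reset()
    while not env.get_done():
        error = env.step(gain * error)
    return env.get_reward()


def hill_climb(env: ToyTaskEnv, iterations: int = 50, seed: int = 0):
    """Keep a random perturbation of the policy parameter only if it scores higher."""
    rng = random.Random(seed)
    best_gain = 0.0
    best_reward = rollout(env, best_gain)
    for _ in range(iterations):
        candidate = best_gain + rng.gauss(0.0, 0.2)
        reward = rollout(env, candidate)
        if reward > best_reward:
            best_gain, best_reward = candidate, reward
    return best_gain, best_reward
```

Calling `hill_climb(ToyTaskEnv())` returns the best gain found and its reward. On real hardware each `rollout` would be a physical episode, which is why the post frames progress as scaling with robot time and tokens.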
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake.
Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with the control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did was give Codex an API to the world of atoms, and the rest is emergence.
ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improve significantly faster than fewer ones.
A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning.
/goal: we all take a holiday and Jensen wouldn't even notice ;)
We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:
Open-Source Robotic Hand That Costs Just $300.
TetherIA has officially launched its open-source robotic hand, the Aero Hand Open.
Designed to be the most capable at the lowest cost, this hand is a new tool for embodied AI research. In the clip, it demonstrates impressive dexterity by grasping an M5 screw, picking up an iPhone, and opening a soda can.
Key Features:
► Price: $314.00
► Open Source: CAD, electronics, and firmware are all open source.
► Actuation: 7 Active Degrees of Freedom (16 joints) with a tendon-driven design.
► Weight: Under 400 grams.
► Purpose: Built for embodied AI research to make dexterous manipulation accessible to all.
For more details and to purchase: https://t.co/yE2QfOIVAE
Humanoid robots don't need to look human.
Meet Eno, our first general-purpose robot.
Not a machine pretending to be human, but intelligence given a body.
At Genesis, we’re building a future where robots don’t feel cold or distant, but capable, calm, and ready to help.
Available Q4 this year.
Introducing Curr-0: When Loco-Manipulation Meets Dexterity
Robots have learned to walk. Robots have learned to use their hands.
But in the real world, you can't do one without the other.
Your stance determines your reach. Your torso determines your balance. Your whole body moves before your fingers ever act.
This is what makes loco-dexterous manipulation hard — and what most robots still can't do.
Curr-0 is our humanoid foundation system for loco-dexterous manipulation. Locomotion, whole-body coordination, and dexterous hand control, learned together as one coupled behavior. Trained on 21k hours of real human data — including 3k hours of whole-body demonstrations — deployed on a near 70-DoF embodiment.
One model. One policy. Whole body.
This is Curr-0. And this is just the beginning.
Tech Report → https://t.co/M86SodGOx9
@NVIDIA is working on one of the hardest problems in Physical AI so you don’t have to: generalist robotic pick-and-place.
We are excited to introduce GraspGenX at #CVPR2026—a foundation model for robotic grasping that works out of the box for unknown robots, novel objects, and unseen environments.
Unlike Vision-Language-Action (VLA) models or dedicated grasp networks that require expensive, embodiment-specific training, GraspGenX is cross-embodiment and works zero-shot. You simply pass a "robot prompt" alongside an image of the object to generate actions.
🚀 Key Highlights:
1) Scaling: Trained on over 2 Billion 6-DoF grasp rollouts entirely in physics simulation—a dataset size practically impossible to collect via real-world teleoperation.
2) Zero-Shot Transfer: Works out of the box for several common robot grippers widely used across the research community and industry.
3) Built for the Agentic Era: Features native MCP support, client-server architecture, and skills.md, allowing seamless integration into LLM/Agentic robotics workflows.
4) Full Pipeline Integration: Pair it with other open foundation models (like SAM3) and advanced motion solvers like cuRoboV2 for full deployment in entirely unknown environments.
If you are currently executing pick-and-place with a VLA or WAM, you can use GraspGenX to generate sim-verified trajectory data and inject it into your pipeline. No need to waste precious real-world engineering hours on data collection for standard manipulation tasks.
🌐Website: https://t.co/a7acm4Pw7N
💻Code: https://t.co/eYUYxCb7Jp
📄Paper: https://t.co/pDOVp0VJLL
📍CVPR Booth: Poster 619 on Jun 6 1:45 session at ExHall F
This work was led by the incredible @BeiningH (Princeton), in collaboration with a phenomenal team at NVIDIA: @erwincoumans, @yu_wei_chao, @balakumar_, @clembow, and Stan Birchfield
#CVPR2026
How does test-time scaling impact robots?
We find that larger models, more thinking, and more context help significantly for some prompts but not others.
Like LLMs, we can also train a router for a better performance/latency tradeoff!
Paper: https://t.co/HEjjCkrsen
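As a rough illustration of what a router for a performance/latency tradeoff could look like, the sketch below learns a single difficulty threshold: prompts above it go to a large, slow configuration and the rest to a small, fast one. This is not the paper's method. The difficulty feature, the success labels, the latencies, and the `train_threshold_router` and `route` helpers are all made-up assumptions.

```python
from typing import List, Tuple

# Each record: (difficulty feature, small-config success, large-config success).
# All values are invented for illustration.
Record = Tuple[float, int, int]

TOY_DATA: List[Record] = [
    (0.1, 1, 1), (0.2, 1, 1), (0.3, 1, 1),
    (0.6, 0, 1), (0.8, 0, 1), (0.9, 0, 1),
]


def train_threshold_router(
    data: List[Record],
    small_latency: float,
    large_latency: float,
    latency_weight: float,
) -> float:
    """Pick the threshold that maximizes mean (success - weight * latency)."""

    def utility(threshold: float) -> float:
        total = 0.0
        for difficulty, small_ok, large_ok in data:
            if difficulty > threshold:
                total += large_ok - latency_weight * large_latency
            else:
                total += small_ok - latency_weight * small_latency
        return total / len(data)

    # Candidate thresholds: route everything to large, or split at a data point.
    candidates = [float("-inf")] + sorted(d for d, _, _ in data)
    return max(candidates, key=utility)


def route(difficulty: float, threshold: float) -> str:
    """Send hard prompts to the large config and easy ones to the small config."""
    return "large" if difficulty > threshold else "small"


threshold = train_threshold_router(
    TOY_DATA, small_latency=0.1, large_latency=1.0, latency_weight=0.2
)
```

On this toy data the learned threshold sits between the prompts the small configuration solves and the ones it fails, so extra compute is spent only where it helps.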
Seeing multiple, mobile robots collaborate on manipulation tasks was my dream since I started at Stanford.
@riadoshi21 made it happen.
CHORUS is a single decentralized VLA policy whose architecture stays fixed regardless of team size.
The best way to get robust, high-quality robot performance is through reinforcement learning; but RL in either the real world or a traditional simulation has lots of limitations. Instead, @jiazhi_yang2024 in RISE does RL in a compositional world model. Learn more ->
SAM 3D: 3Dfy Anything in Images received an honorable mention for best paper at CVPR 2026. Even more satisfying, it has enabled learning human-object-interaction trajectories from video for training robots (I am at ICRA and had numerous conversations on this!). You can read the paper at https://t.co/coWYrkPSLL
I want to offer some unsolicited advice to computer vision researchers jumping into robotics. Don't focus too much on VLMs, VLAs etc. That's fine, but the real action is at the sensorimotor level. Most of the open problems in robotics are in manipulation, which is about hand-object interaction, and contacts and forces are central. Proprioception and tactile sensing are as important as vision. Don't get seduced by cherry-picked demos. You can't do robotics without doing robotics.
What representation enables open-world robot manipulation from generated videos?
Introducing Dream2Flow, our recent work that bridges video generation and robot control with 3D object flow.
https://t.co/vpRxoNBVF3
@Stanford #ICRA2026
1/N
We are back again :) After three weeks of quiet building.
Introducing Genesis World 1.0, our latest simulation platform, the second release in our full-stack suite. Open-sourced.
Robotics is still bottlenecked by the 1× speed of the physical world. Every model, checkpoint, and data recipe eventually needs to be tested on physical hardware, slowly, expensively, and with limited coverage.
One hour in reality can become 100 days in simulation. That is how robotics model iteration moves from a wall-clock bottleneck to a compute problem.
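The arithmetic behind that ratio is worth spelling out. The sketch below computes the aggregate speedup it implies and, under the assumption that it comes from parallel environments each running at some real-time factor, how many environments that would take. The post does not say how the factor is actually achieved, so the per-environment factors and the `envs_needed` helper are hypothetical.

```python
import math

HOURS_PER_DAY = 24

# "One hour in reality can become 100 days in simulation."
sim_hours = 100 * HOURS_PER_DAY          # 2400 simulated hours
real_hours = 1
aggregate_speedup = sim_hours / real_hours


def envs_needed(real_time_factor: float) -> int:
    """Parallel environments required if each runs at the given real-time factor.

    Assumes throughput adds up linearly across environments (an idealization).
    """
    return math.ceil(aggregate_speedup / real_time_factor)


print(aggregate_speedup)   # 2400.0
print(envs_needed(1.0))    # 2400 environments at real-time speed
print(envs_needed(10.0))   # 240 environments at 10x real time
```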
To make this work, simulation has to be both fast and trustworthy.
Over the past year, we rebuilt the entire stack: a GPU-accelerated cross-platform compiler, penetration-free multi-physics contact solvers, unified rigid and deformable physics, and a photo-realistic renderer purpose-built for physical AI applications.
We built Nyx, a high-performance path-traced rendering engine for robotics applications.
Genesis World 1.0 achieves near-real-time performance with our latest penetration-free IPC solver, supporting various types of deformables beyond rigid bodies. It supports contact-rich, dexterous manipulation simulation across different embodiments: Unitree, Sharpa, Wuji, the Genesis hand, and various types of grippers.
Under the hood is Quadrants, our effort in pushing forward cross-platform GPU-accelerated computation. Quadrants started as a fork of Taichi, and we rebuilt most of the critical parts for optimizing simulation workloads, giving 10x faster launch time and up to 4.6x runtime performance compared to the initial Genesis release.
Together, they bring us to an unprecedentedly low sim-to-real gap, enabling zero-shot real-to-sim model evaluation and much faster iteration of GENE.
All available today.
Genesis World 1.0: https://t.co/aknCM3eqws
Quadrants: https://t.co/uXqPNI4cb6
Nyx: https://t.co/R8j0djqGnV