Introducing Ego1, our first egocentric capture headset for Physical AI.
We co-designed the hardware and the perception stack to turn everyday first-person activity, especially manipulation, into training data for robotic models at planet scale.
High-quality motion reference data is key for humanoid skill learning 🤖🕺💃
A natural idea is to leverage human motions and “translate” them to humanoid motions, a process known as retargeting. For interaction-rich tasks such as scene interaction and loco-manipulation, retargeting is challenging: it must ensure motion consistency, smoothness, kinematic feasibility (no artifacts like penetration or foot skating), and scalability (one framework can handle thousands of motions).
Excited to release OmniRetarget, a scalable retargeting method, together with a 4-hour high-quality humanoid motion dataset for interaction-rich tasks. OmniRetarget takes an interaction-preserving perspective: we minimize the Laplacian deformation between source and target interaction meshes while enforcing kinematic constraints, producing consistent, smooth, and feasible trajectories at scale. Even better, OmniRetarget can efficiently augment motions by varying terrains, objects, and initial poses.
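To make "Laplacian deformation between interaction meshes" concrete, here is a minimal sketch of the energy term only. It assumes uniform neighbor weights and a hand-written neighbor list (both hypothetical; the post does not say how the mesh is built or weighted), and it does not model the kinematic constraints or the solver. Each vertex's Laplacian coordinate is its offset from the mean of its neighbors, so it encodes *relative* placement, such as a hand relative to an object. The energy sums squared differences between source and target Laplacian coordinates.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def laplacian_coords(points: List[Point], neighbors: Dict[int, List[int]]) -> List[Point]:
    """Uniform-weight Laplacian coordinate: vertex position minus the mean of its neighbors."""
    coords = []
    for i, p in enumerate(points):
        nbrs = neighbors[i]
        mean = [sum(points[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        coords.append((p[0] - mean[0], p[1] - mean[1], p[2] - mean[2]))
    return coords


def laplacian_energy(source: List[Point], target: List[Point],
                     neighbors: Dict[int, List[int]]) -> float:
    """Sum of squared differences between source and target Laplacian coordinates.

    Hypothetical illustration: a retargeter would minimize this over robot
    configurations (keypoints via forward kinematics) subject to constraints,
    none of which are modeled here.
    """
    ds = laplacian_coords(source, neighbors)
    dt = laplacian_coords(target, neighbors)
    return sum((a[k] - b[k]) ** 2 for a, b in zip(ds, dt) for k in range(3))


if __name__ == "__main__":
    # Toy 4-vertex mesh, every vertex connected to every other (hypothetical).
    src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    nbrs = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
    shifted = [(x + 0.5, y, z) for x, y, z in src]   # rigid translation
    moved = [src[0], (2.0, 0.0, 0.0), src[2], src[3]]  # one vertex pulled away
    print(laplacian_energy(src, shifted, nbrs), laplacian_energy(src, moved, nbrs))
```

A rigid translation leaves the energy at (numerically) zero, while pulling one vertex away from the others raises it. That invariance is why this kind of term preserves spatial relationships rather than absolute positions.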
This high-quality interaction-preserving retargeting enables a minimal RL setup to execute long-horizon (up to 30s) agile, interaction-rich skills. All tasks in the video share just 5 rewards, 4 domain randomization terms, and rely only on proprioception.
More details: https://t.co/4lfqT126MY
The most common misunderstanding people have when getting into robotics is that data is useful by default. *Good* data is the real fundamental limiter, and with good data (from things like OmniRetarget) you can do really cool stuff.
Ego1 is our all-in-one egocentric headset for robot-learning data.
It captures high-quality, hardware-synced stereo video + 200Hz IMU, then turns everyday first-person manipulation into policy-training data: metric head trajectory, 3D hand tracking, and dense depth.
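Hardware sync means the camera and the 200Hz IMU stamp samples on one shared clock, so aligning them reduces to interpolation. The post does not describe Ego1's actual alignment step; this is a generic sketch that linearly interpolates the IMU stream at a camera frame timestamp. The function name, the integer-microsecond timestamps, and the 3-axis sample layout are all hypothetical.

```python
import bisect
from typing import List, Sequence


def interpolate_imu(t_us: int, imu_t_us: Sequence[int],
                    imu_vals: Sequence[Sequence[float]]) -> List[float]:
    """Linearly interpolate an IMU sample (e.g. gyro xyz) at camera time t_us.

    Assumes (hypothetically) that camera and IMU timestamps are integer
    microseconds on the same hardware clock and imu_t_us is sorted ascending.
    """
    if not imu_t_us[0] <= t_us <= imu_t_us[-1]:
        raise ValueError("frame timestamp outside IMU range")
    i = bisect.bisect_right(imu_t_us, t_us)  # first IMU sample strictly after t_us
    if i == len(imu_t_us):                   # t_us equals the last IMU timestamp
        return list(imu_vals[-1])
    t0, t1 = imu_t_us[i - 1], imu_t_us[i]
    w = (t_us - t0) / (t1 - t0)              # fraction of the way from t0 to t1
    return [a + w * (b - a) for a, b in zip(imu_vals[i - 1], imu_vals[i])]


if __name__ == "__main__":
    # One second of hypothetical 200 Hz data: a sample every 5000 microseconds.
    imu_t = [k * 5000 for k in range(201)]
    imu_v = [[float(k), 0.0, 0.0] for k in range(201)]
    print(interpolate_imu(12500, imu_t, imu_v))  # frame halfway between samples 2 and 3
```

Without a shared clock, a time offset between the streams would have to be estimated before this step.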
And dense depth for the whole scene. Stereo matching (SGBM) estimates the disparity between the rectified views, which is converted to metric depth using the calibrated baseline and focal length. Each frame becomes a 16-bit depth map in millimeters that drops into ROS, LeRobot, or Open3D.
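The disparity-to-depth step follows the standard rectified-stereo relation Z = f · B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). Below is a minimal sketch of producing a 16-bit millimeter map from a raw SGBM output. It assumes OpenCV-style fixed-point disparities (OpenCV's SGBM returns disparity multiplied by 16), treats non-positive disparities as invalid (written as 0), and saturates far depths at 65535 mm; these conventions and the example focal length and baseline are assumptions, not Ego1's published settings.

```python
from typing import List


def disparity_to_depth_mm(disp_raw: List[List[int]], focal_px: float, baseline_m: float,
                          disp_scale: float = 16.0) -> List[List[int]]:
    """Convert a raw SGBM disparity map to a uint16-range depth map in millimeters.

    Assumes fixed-point disparity (true disparity = raw / disp_scale).
    Non-positive disparities are invalid and become 0; depths above 65535 mm
    are clamped (a design choice; some pipelines write 0 instead).
    """
    out: List[List[int]] = []
    for row in disp_raw:
        out_row = []
        for raw in row:
            d = raw / disp_scale                           # disparity in pixels
            if d <= 0:
                out_row.append(0)                          # no valid match
                continue
            z_mm = focal_px * baseline_m * 1000.0 / d      # Z = f * B / d, in mm
            out_row.append(min(round(z_mm), 65535))        # fit in 16 bits
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Hypothetical calibration: 700 px focal length, 6 cm baseline.
    raw = [[672, 0], [-16, 1]]  # 672 / 16 = 42 px disparity
    print(disparity_to_depth_mm(raw, focal_px=700.0, baseline_m=0.06))
```

Because depth is inversely proportional to disparity, a one-pixel disparity error costs far more depth accuracy at range than up close, which is one reason the calibrated baseline matters.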
Robotics is increasingly a data problem. There is no lack of robots that can backflip. There is a lack of robots that can cook Kung Bao Chicken for you. We talked to 20+ researchers and hardware vendors, and surveyed 12+ research and deployed systems, to write this guide. Hope you enjoy!