Excited to share Do as I Do! We turn everyday human videos into physically consistent robot data that can be directly executed in the real world.
This was a fun collaboration with @bhawna_paliwal_ and @willjhliang, with lots of moving parts. More details in Mahi's thread below👇
Robots are the bottleneck in scaling robotics, and learning from human video promises to solve it. But how can chaotic human data ever measure up to sanitized, lab-made teleoperation data?
Introducing Do as I Do: establishing a much needed correspondence between human videos and dexterous robot data. Some fun insights below: 🧵
Excited to release Do As I Do: a pipeline that turns everyday RGB human videos into dexterous robot manipulation trajectories!
Most prior work has been narrow: limited to lab-recorded demos, egocentric-only video, or a closed set of objects. We develop a modular pipeline that can handle Internet, egocentric, exocentric, AND generated videos with virtually any rigid object. Also check out Mahi's post below!
Introducing Do as I Do 👀, a framework to transform everyday human videos into 100s of dexterous robot demos. Co-led with @bhawna_paliwal_ and @HarithejaE, and check out @notmahi's thread!
Here’s a little preview of our dexterous manipulation results. More about how we produce them from human reconstructions in this mini-thread! 🧵
https://t.co/tDpl9dGcqE
Learning motion directly from videos, rather than using them for action supervision, looks like the better approach and is likely more scalable.
While it is early, this line of work suggests replicating the playbook that made robots walk:
--> Real videos provide state supervision (not action).
--> Retargeting provides reference trajectories.
--> RL tracks these trajectories.
This is a very good example of the separation of the "What" and the "How"
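One way to picture the "RL tracks these trajectories" step is a tracking reward that scores how closely the robot's state follows the retargeted reference at each timestep. This is only a rough sketch: the function names, the exponential reward shape, and the `scale` value are illustrative assumptions, not details from any of the works mentioned here.

```python
import math

def tracking_reward(state, reference, scale=10.0):
    """Reward in (0, 1]; equals 1.0 only when state matches the reference.

    `state` and `reference` are equal-length sequences of floats
    (e.g. joint angles). `scale` is a hypothetical sharpness parameter.
    """
    sq_err = sum((s - r) ** 2 for s, r in zip(state, reference))
    return math.exp(-scale * sq_err)

def episode_return(states, reference_trajectory, scale=10.0):
    """Sum of per-step tracking rewards over one rollout."""
    return sum(
        tracking_reward(s, r, scale)
        for s, r in zip(states, reference_trajectory)
    )

# The reference comes from video + retargeting (the "what");
# the policy that maximizes this return supplies the "how".
reference = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.4]]
perfect_rollout = [list(step) for step in reference]
print(episode_return(perfect_rollout, reference))
```

Note that the reward only looks at states, never at actions, which is the point of the first bullet: the video never has to tell the robot which motor commands to send.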
Introducing ABC: open data, training, and infrastructure for robotics.
We release the largest teleop dataset to date, and extensively investigate design decisions, pretraining, and post-training techniques.
@arthurallshire @Cinnabar233 @adamrasb @redstone_hong @davidrmcall
Introducing Human Universal Grasping (HUG): dexterous grasping learned entirely from human hands, with zero robot data.
🌐 Website: https://t.co/78rfwuuh4J
📄 Paper: https://t.co/BhAI4a1esg
💻 Code: https://t.co/omtjbM7Scl
I'll be at @CVPR (briefly), speaking at the Sense of Space workshop tomorrow @ 9:15 about how robots may be slowing down robotics.
I spent the past year thinking more about the role of human data, simulation, and dexterous manipulation; happy to connect if you're doing the same!
We are back. After one year of quiet building.
Introducing GENE-26.5, our first robotic brain that takes a major step toward human-level capability.
For years, robotics has struggled to learn from the world’s largest and most valuable data source: Humans.
Solving it means rethinking the whole stack from the ground up:
- A robotics-native foundation model.
- A 1:1 human-like robotic hand.
- A noninvasive data collection glove for motion, force, and touch.
- A simulator that turns weeks of experiments into minutes.
GENE-26.5 is trained across language, vision, proprioception, tactile, and action. We designed a set of tasks to test how far we can go with this new paradigm.
Fully autonomous, 1x speed, one model, same weights. (Enjoy with sound on)
We are approaching the endgame for robotics.
And this is just a beginning.
ARI is joining @Meta!
Over the past year, we have been building ARI (Assured Robot Intelligence) with the mission to build industry-grade physical AI for humanoids. The ARI stack is built on human experience, condensed into actionable tokens that can be rapidly adapted to real-world hardware.
But the most rewarding part of ARI has been the people. I feel truly blessed to have worked alongside some of the world's best roboticists, a top-notch investor pool led by @aixventureshq, and the many supporters pushing for us behind the scenes.
Starting next week, ARI will join the Meta Superintelligence Labs (MSL) to continue building frontier robotics models that advance personal superintelligence in the physical world. We have the potential to transform AI that can think and talk into AI that can do, assisting humans safely and reliably in the physical world.
To the many people behind the scenes who supported us: Thank you! This is just the beginning.
More in the Bloomberg article:
Introducing Tether 🪢, a fun little idea to scale data by having our robot “play” in the real world for over 24 hours, throughout the day and overnight—improving policies from zero to mastery with minimal supervision!
But play is messy, with out-of-distribution scenarios that are hard to anticipate. To perform autonomous functional play in the real world, from just a handful of demos, we propose a highly robust few-shot imitation method that warps demo trajectories using visual correspondences. Then, continuously running it within a multi-task VLM-guided cycle, we generate a data stream that produces 1000+ expert-level demos. This generated data is finally funneled downstream to train imitation learning policies, which improve from zero to near-perfect success rates.
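To give a feel for what "warping demo trajectories using visual correspondences" could mean, here is a deliberately simplified toy: matched keypoints between the demo image and the new scene give an average 2D offset, and the demo waypoints are shifted by it. This is an assumption-heavy illustration (translation only, 2D, hypothetical function names), not Tether's actual method.

```python
def estimate_translation(demo_keypoints, scene_keypoints):
    """Average 2D offset between matched keypoints (demo -> new scene).

    Both arguments are equal-length lists of (x, y) tuples, where
    index i in each list refers to the same physical point.
    """
    n = len(demo_keypoints)
    dx = sum(s[0] - d[0] for d, s in zip(demo_keypoints, scene_keypoints)) / n
    dy = sum(s[1] - d[1] for d, s in zip(demo_keypoints, scene_keypoints)) / n
    return dx, dy

def warp_trajectory(waypoints, offset):
    """Shift every demo waypoint by the estimated offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in waypoints]

# Hypothetical example: the object moved by (+2, +1) relative to the demo.
demo_kps = [(0.0, 0.0), (1.0, 0.0)]
scene_kps = [(2.0, 1.0), (3.0, 1.0)]
offset = estimate_translation(demo_kps, scene_kps)
warped = warp_trajectory([(0.0, 0.0), (0.5, 0.5)], offset)
print(offset, warped)
```

A real system would need to handle rotation, 3D geometry, and noisy matches; the toy only shows why a handful of demos can cover many object placements once correspondences are available.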
We’ll be presenting Tether at #ICLR2026 in just a few weeks! But before that, deep dive with me… 🧵
Fully open-source, customizable hardware is the way for robotics research. Introducing Your Own Robot (YOR), a mobile bimanual robot platform for ~$10k.
Why buy a robot when you can build your own?
Meet YOR, our new open-source bimanual mobile manipulator robot – built for researchers and hackers alike for only ~$10k. 🧵👇
We don't need the name of an object to pick it up; we simply need to know where it is and what it looks like.
Introducing Contact-Anchored Policies (CAPs): instead of language, we explicitly condition on contacts. Our policy learns object pickup with only 16 hours of data! 🧵
Best ideas are often the simplest in hindsight.
Meet Contact-Anchored Policies (CAP)🧢: by conditioning policies on physical contact (vs language) we achieve env & embodiment generalization with super low resources.
This policy ⬇️ learned to pick from scratch w/ 16 hrs of data 🧵
I will join UChicago CS @UChicagoCS as an Assistant Professor in late 2026, and I’m recruiting PhD students in this cycle (2025 - 2026).
My research focuses on AI & Robotics - including dexterous manipulation, humanoids, tactile sensing, learning from human videos, robot systems, and anything needed to make robots truly work and improve everyday life. I also place strong emphasis on open-source.
Check my homepage to learn more: https://t.co/gBZAFrwZmg.
Please reach out if you are interested! The deadline is Dec 11th. Link: https://t.co/yKTmcZu7FP.