Imitation learning has a data scarcity problem.
Introducing EgoDex from Apple, the largest and most diverse dataset of dexterous human manipulation to date — 829 hours of egocentric video + paired 3D hand poses across 194 tasks.
Now on arxiv: https://t.co/bJBPER8GTC (1/4)
A year ago, I took a big bet and shifted my research to world models. We started with navigation, but the vision was broader: simulate any interaction with the environment, including fine-grained manipulation.
Today we introduce DexWM, a world model for dexterous manipulation. Trained on 900+ hours of human and robot video, DexWM lets us imagine, plan, and execute dexterous actions on a real robot.
very interesting question and so much to unpack. it started in the micro-kitchen. @DavidJFan suggested we talk to @JimmyTYYang1 to brainstorm on how to deploy a WM on a Franka arm.
We knew we had a model architecture ("CDiT") that was likely to work. But we lacked the right human video training data and a roboticist with the capacity to lead.
Then:
1) EgoDex, a new large-scale dataset by Apple dropped
2) @raktimgg joined our team and brought the expertise we didn't have. he immediately saw the potential.
Big update: I've left Apple. It’s been a blast pushing the frontier of learning dexterity from human video.
I've now joined Meta’s new Robotics Studio as a Research Scientist, where I'll be building some exciting new products with super talented people. Stay tuned!
Continuing my work on the EgoDex dataset, I ported the entire test set to @rerundotio and created a @Gradio app to view it! Links below VVV
This gives a straightforward way to explore each episode of the (test) dataset and better understand how the hand-tracking and SLAM systems performed.
Sadly, I had to re-encode the videos to AV1, which took a ton of time (nearly 2 hours of wall time for just the test dataset)
Next up is taking this representative dataset and making it amenable to training. I'll start with something easy, such as pose estimation, as it's what I'm most familiar with, but the goal is to support RRD <-> WebDataset conversion.
🚨Introducing EgoDex, the largest egocentric video dataset to date that focuses on human dexterous manipulation, with structured annotations including 3D upper-body and hand tracking🤲, camera pose📷, and language annotation💬.
Kudos to the team, and looking forward to what the community can cook up with it. Check out our preprint on arXiv; the data is available for download NOW.
I am in Atlanta attending ICRA. DMs are open, and I'm happy to chat in person.
📄Preprint: https://t.co/pbUA9CoTId
#ICRA #robotics #imitationlearning #dexterousmanipulation
We also propose new benchmarks and train imitation learning policies for dexterous trajectory prediction. Below are 30 Hz wrist and fingertip trajectories on the test set, where blue = ground truth, red = model predictions, and points get lighter up to 2 seconds in the future.
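To make the numbers above concrete, here is a minimal sketch of the bookkeeping behind such a plot: how many points a 2-second horizon at 30 Hz contains, a fading opacity per future step, and a simple mean distance between ground truth and prediction. The function names, the `min_alpha` value, and the choice of error metric are all illustrative assumptions, not the benchmark definition from the paper.

```python
import math

RATE_HZ = 30      # sampling rate stated in the post
HORIZON_S = 2.0   # prediction horizon stated in the post


def horizon_steps(rate_hz=RATE_HZ, horizon_s=HORIZON_S):
    """Number of trajectory points covering the prediction horizon."""
    return int(round(rate_hz * horizon_s))


def fade_alphas(n, min_alpha=0.2):
    """Opacity per future step: 1.0 for the nearest point, min_alpha for the farthest.

    min_alpha is a hypothetical plotting choice, not taken from the source.
    """
    if n == 1:
        return [1.0]
    return [1.0 - (1.0 - min_alpha) * i / (n - 1) for i in range(n)]


def mean_distance_error(ground_truth, predicted):
    """Average Euclidean distance between matching 3D points (assumed metric)."""
    if len(ground_truth) != len(predicted):
        raise ValueError("trajectories must have the same length")
    total = sum(math.dist(g, p) for g, p in zip(ground_truth, predicted))
    return total / len(ground_truth)
```

For plotting, the opacities from `fade_alphas(horizon_steps())` could be passed as per-point alpha values, with blue for ground truth and red for predictions.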
🚀 New Research on Human-Robot Interaction! 🤖
How can humanoid robots communicate beyond words? Our framework, EMOTION, leverages Large Language Models (LLMs) to dynamically generate expressive gestures, enhancing non-verbal communication in robots.
🤯 Our experiments show that EMOTION can generate various expressive gestures from only TWO examples and match human-generated gestures in understandability & naturalness!
🔍 What’s inside?
✅ LLM-powered motion generation
✅ Human feedback to refine gestures (EMOTION++)
✅ 10 expressive gestures generated and evaluated (thumbs-up, stop, jazz-hands & more!)
📜 Read the full paper: https://t.co/UOYItwsEe0
🎬 Watch the video: https://t.co/O2VkbezW2o
Let’s bring robots closer to human-like interactions! What gestures would you like to see next? 👇
Huge kudos to the amazing team at Apple that made this work: @Yuhan_Hu_, Nataliya Nechyporenko, @talking_kim, @waltertalbott, @jian_zhang_.
#Robotics #HRI #LLMs #HumanRobotInteraction #GestureGeneration #SocialRobots
@oier_mees Thanks @oier_mees ! Latency is an issue, but there are ways to improve. For example, a faster IK solver (Genesis? ;)) could help as we are running IK for each new hand pose
🚨Ever worried that your collected data cannot be used for training robot policies? You may need a Vision Pro.
🔥Check out this new AR-enabled, in-the-wild data collection method from our team here at Apple! Kudos to @ryan_hoque and everyone in the team!🎊