Introducing HABIT — a large-scale robot manipulation dataset for human-present environments, where a person shares the workspace and interacts with the robot in every episode.
60 tasks · 10,563 episodes · 164 hours of rich human-robot interaction.
Toward robots that are not just capable, but safe and socially compatible around people.
https://t.co/kEtkqbuoIn
🧵[1/7]
Really appreciate this thoughtful write-up on our HABIT dataset. We believe human presence is an important new axis for robot learning as robots move into human-shared environments. Thanks for reading and sharing our work!
Most manipulation datasets are collected in isolation. Most manipulation tasks are not.
The demos you see online prove the robot can do the task. They don't prove it can do the task with a person in the workspace. Those are different problems, and most training data only covers one of them.
KAIST just released HABIT, and it's built to close exactly that gap.
10,563 episodes. A co-present human in every single one. Three roles: Collaborator, Coworker, Supervisor. Each role isolates a different failure mode: precondition violations, collisions, gesture-following errors.
What I like here is the method. They didn't just put a human in frame and call it interactive data. They defined the roles first, then structured collection to force the specific behaviour each role demands: yielding under Coworker, gesture grounding under Supervisor, spatiotemporal sync under Collaborator. That's the difference between data that looks realistic and data that's actually diagnostic.
The results back it up. Fine-tuning π0.5 on HABIT beat a matched robot-only baseline on every comparable task, with the biggest gains on Coworker, where reactive yielding resolves path conflicts. GR00T N1.6 shows the same pattern at lower absolute numbers, which tells you the gain is coming from the dataset, not the architecture. Mid-training on HABIT compounds too: 100 task-specific demonstrations after mid-training beat 200 demonstrations of direct fine-tuning on shelf-cleaning.
I've said this before about sample efficiency and I'll say it again about coordination: this isn't a scale problem, it's a data problem. If human presence changes behaviour this cleanly, a human-absent dataset isn't incomplete, it's blind to the failure modes that actually matter once the robot leaves the cell.
Robot-only pre-training gets you competence. It does not get you coordination. The next generation of training data needs an independent human moving through the scene unpredictably, not another 10,000 episodes of the same skill performed alone.
[Paper and dataset link in comments]
Congratulations to the team at KAIST on the release.
#Robotics #PhysicalAI #RobotLearning
Excited to share RoboWorld 🤖
We roll out generalist robot policies from 4,186 real initial scenes, entirely inside a video world model with no robots, and the rankings hit Pearson r = 0.989 (Spearman ρ = 0.970) with the real RoboArena leaderboard. 🧵
[1/7]
To add a bit more detail to Kimin’s answer, we evaluated OOD generalization across body silhouette and clothing color (Appendix E), since human appearance is one of the biggest sources of distribution shift in HRI. During data collection, we intentionally increased clothing diversity by having operators change shirts roughly every hour, so multiple clothing colors appeared even within the same task. As Kimin mentioned, the performance drop is modest, and the models still generalize reasonably well across these variations.
Thanks for the great question!
📄 Paper: https://t.co/6ppr4dudoR
🌐 Project page: https://t.co/kEtkqbuoIn
This work was done with an amazing team at @config_inc. Huge thanks to all my collaborators at Config and KAIST. I'm especially grateful to @kimin_le2 for the guidance and support throughout this work. 🙏
[7/7]
Mid-training on HABIT improves both sample efficiency and final performance on downstream interaction tasks. HABIT serves as a strong prior that transfers to new human-robot interaction tasks.
[6/7]
https://t.co/P81zD5RKYv
Large-scale datasets have driven remarkable progress in general-purpose robot policies. But they're almost always collected with the robot as the sole agent — no human in the scene.
So policies can do a task in isolation, yet fall short where robots are actually deployed: homes, factories, shared workspaces. They never learn to hand over, yield to a reaching hand, or avoid a co-present person — because those behaviors can't be demonstrated without a human there.
[2/7]
🤖 How can we learn a reliable policy across different robots and dynamics?
Excited to introduce SPACE, a framework that significantly improves cross-embodiment and cross-hardware (e.g., DROID) learning by addressing dynamics gaps, with execution-time adaptation.
📄 paper: https://t.co/zgByzVwFyz
📷 Project website: https://t.co/ZytiBGmMrY
🧵[1/n]
#Robotics #CrossEmbodiment
Can a robot understand the nonverbal signals you give in real time — your pointing gestures, your gaze, the things you never put into words?
Meet EDITH: a framework that lets robots comprehend and act on human nonverbal signals.
https://t.co/giPBAA5w7j
🧵[1/n]
@KAIST_AI
#Robotics #HumanRobotInteraction #VLA #ProjectAria
Most robot data is collected in human-free environments — but the real world is not.
We're closing that gap by collecting human-present data.
Our models learn to naturally pause, yield, and collaborate!
Blog post: https://t.co/U1jtFLVXDr
🧵1/N
Most robot training data assumes no humans in the workspace, but real-world workspaces have them.
We're building a large-scale Human-Robot Collaboration dataset to close this gap.
Full paper and dataset coming soon!
https://t.co/LqYj1hRKgI
Hello world 🤖👋🏻—We are Config.
Today, we’re excited to share a preview (🔗 https://t.co/M6mnlt6waf) of what we’ve been building. Our mission is to make robots that reliably perform two-handed tasks across diverse real-world settings materially cheaper and faster to deploy.