Jaehwi Song @j__aehwi - Twitter Profile

Pinned Tweet

3 days ago

Introducing HABIT — a large-scale robot manipulation dataset for human-present environments, where a person shares the workspace and interacts with the robot in every episode. 60 tasks · 10,563 episodes · 164 hours of rich human-robot interaction. Toward robots that are not just capable, but safe and socially compatible around people. https://t.co/kEtkqbuoIn 🧵[1/7]

5

160

28

102

13K

Jaehwi Song

@j__aehwi

1 day ago

Really appreciate this thoughtful write-up on our HABIT dataset. We believe human presence is an important new axis for robot learning as robots move into human-shared environments. Thanks for reading and sharing our work!

Stephen James

@stepjamUK

1 day ago

Most manipulation datasets are collected in isolation. Most manipulation tasks are not. The demos you see online prove the robot can do the task. They don't prove it can do the task with a person in the workspace. Those are different problems, and most training data only covers one of them.
KAIST just released HABIT, and it's built to close exactly that gap. 10,563 episodes. A co-present human in every single one. Three roles: Collaborator, Coworker, Supervisor. Each role isolates a different failure mode: precondition violations, collisions, gesture-following errors.
What I like here is the method. They didn't just put a human in frame and call it interactive data. They defined the roles first, then structured collection to force the specific behaviour each role demands: yielding under Coworker, gesture grounding under Supervisor, spatiotemporal sync under Collaborator. That's the difference between data that looks realistic and data that's actually diagnostic.
The results back it up. Fine-tuning π0.5 on HABIT beat a matched robot-only baseline on every comparable task, with the biggest gains on Coworker, where reactive yielding resolves path conflicts. GR00T N1.6 shows the same pattern at lower absolute numbers, which tells you the gain is coming from the dataset, not the architecture. Mid-training on HABIT compounds too: 100 task-specific demonstrations after mid-training beat 200 demonstrations of direct fine-tuning on shelf-cleaning.
I've said this before about sample efficiency and I'll say it again about coordination: this isn't a scale problem, it's a data problem. If human presence changes behaviour this cleanly, a human-absent dataset isn't incomplete, it's blind to the failure modes that actually matter once the robot leaves the cell. Robot-only pre-training gets you competence. It does not get you coordination.
The next generation of training data needs an independent human moving through the scene unpredictably, not another 10,000 episodes of the same skill performed alone.
[Paper and dataset link in comments]
Congratulations to the team at KAIST on the release. #Robotics #PhysicalAI #RobotLearning

3

29

2

9

3K

0

1

0

100

j__aehwi retweeted

byeongguk jeon

@bkjeon1211

2 days ago

Excited to share RoboWorld 🤖 We roll out generalist robot policies from 4,186 real initial scenes, entirely inside a video world model with no robots, and the rankings hit Pearson r = 0.989 (Spearman ρ = 0.970) with the real RoboArena leaderboard. 🧵 [1/7]

1

50

15

22

6K

Jaehwi Song

@j__aehwi

2 days ago

To add a bit more detail to Kimin’s answer, we evaluated OOD generalization across body silhouette and clothing color (Appendix E), since human appearance is one of the biggest sources of distribution shift in HRI. During data collection, we intentionally increased clothing diversity by having operators change shirts roughly every hour, so multiple clothing colors appeared even within the same task. As Kimin mentioned, the performance drop is modest, and the models still generalize reasonably well across these variations. Thanks for the great question!

0

20

Jaehwi Song

@j__aehwi

3 days ago

📄 Paper: https://t.co/6ppr4dudoR 🌐 Project page: https://t.co/kEtkqbuoIn This work was done with an amazing team at @config_inc. Huge thanks to all my collaborators at Config and KAIST. I'm especially grateful to @kimin_le2 for the guidance and support throughout this work. 🙏 [7/7]

0

4

0

1

220

Jaehwi Song

@j__aehwi

3 days ago

Mid-training on HABIT improves both sample efficiency and final performance on downstream interaction tasks. HABIT serves as a strong prior that transfers to new human-robot interaction tasks. [6/7]

1

0

203

Jaehwi Song

@j__aehwi

3 days ago

https://t.co/P81zD5RKYv Large-scale datasets have driven remarkable progress in general-purpose robot policies. But they're almost always collected with the robot as the sole agent — no human in the scene. So policies can do a task in isolation, yet fall short where robots are actually deployed: homes, factories, shared workspaces. They never learn to hand over, yield to a reaching hand, or avoid a co-present person — because those behaviors can't be demonstrated without a human there. [2/7]

0

38

j__aehwi retweeted

Haeone Lee @ ICML

@Haeone_Lee

8 days ago

🤖 How can we learn a reliable policy across different robots and dynamics? Excited to introduce SPACE, a framework that significantly improves cross-embodiment and cross-hardware (e.g., DROID) learning by addressing dynamics gaps, with execution-time adaptation. 📄 paper: https://t.co/zgByzVwFyz 📷 Project website: https://t.co/ZytiBGmMrY 🧵[1/n] #Robotics #CrossEmbodiment

1

44

14

30

10K

j__aehwi retweeted

Dongjun Lee

@dongjunlie

23 days ago

Can a robot understand the nonverbal signals you give in real time — your pointing gestures, your gaze, the things you never put into words? Meet EDITH: a framework that lets robots comprehend and act on human nonverbal signals. https://t.co/giPBAA5w7j 🧵[1/n] @KAIST_AI #Robotics #HumanRobotInteraction #VLA #ProjectAria

1

65

18

29

9K

j__aehwi retweeted

Config @config_inc

3 months ago

Most robot data is collected in human-free environments — but the real world is not. We're closing that gap by collecting human-present data. Our models learn to naturally pause, yield, and collaborate! Blog post: https://t.co/U1jtFLVXDr 🧵1/N

5

82

15

47

431K

Jaehwi Song

@j__aehwi

3 months ago

Most robot training data assumes no humans in the workspace — but the real world does. We're building a large-scale Human-Robot Collaboration dataset to close this gap. Full paper and dataset coming soon! https://t.co/LqYj1hRKgI

0

3

0

1

760

j__aehwi retweeted

Config @config_inc

4 months ago

Hello world 🤖👋🏻—We are Config. Today, we’re excited to share a preview (🔗 https://t.co/M6mnlt6waf) of what we’ve been building. Our mission is to make robots capable of reliably performing two-handed tasks across diverse real-world settings materially more cost- and time-efficient to deploy.

17

92

10

39

26K

Jaehwi Song

@j__aehwi

5 months ago

@CVPR Should we use the newly assigned paper ID for the rebuttal submission?

1

0

477
