Excited to share RoboWorld 🤖
We roll out generalist robot policies from 4,186 real initial scenes, entirely inside a video world model with no robots, and the resulting policy rankings correlate with the real RoboArena leaderboard at Pearson r = 0.989 (Spearman ρ = 0.970). 🧵
[1/7]
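For anyone who wants to sanity-check agreement numbers like the ones above, here is a minimal Python sketch of how Pearson r and Spearman ρ can be computed from per-policy scores. The score lists and function names are made up for illustration; they are not RoboWorld or RoboArena data, and this is not the authors' evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-policy success scores (illustrative only, not real results).
world_model_scores = [0.62, 0.48, 0.71, 0.35, 0.55]
real_robot_scores = [0.60, 0.50, 0.75, 0.30, 0.52]

r = pearson(world_model_scores, real_robot_scores)
rho = spearman(world_model_scores, real_robot_scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson measures how linearly the two sets of scores track each other, while Spearman only checks whether the ordering of policies matches, which is why leaderboard comparisons often report both.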
Introducing HABIT — a large-scale robot manipulation dataset for human-present environments, where a person shares the workspace and interacts with the robot in every episode.
60 tasks · 10,563 episodes · 164 hours of rich human-robot interaction.
Toward robots that are not just capable, but safe and socially compatible around people.
https://t.co/kEtkqbuoIn
🧵[1/7]
🧵The way we benchmark generalist manipulation policies is broken.
• A single success rate can't capture a robot's capability.
• Overfitting to demonstrations is not generalization.
We built EBench to fix both. EBench is a surgical diagnosis tool for robot foundation models. It is not a leaderboard but a CAT scan for your policy.
Here's what it reveals about π0, π0.5, Qwen-RobotManip (@Qwen), and the rest:
🤖 How can we learn a reliable policy across different robots and dynamics?
Excited to introduce SPACE, a framework that significantly improves cross-embodiment and cross-hardware (e.g., DROID) learning by addressing dynamics gaps, with execution-time adaptation.
📄 paper: https://t.co/zgByzVwFyz
📷 Project website: https://t.co/ZytiBGmMrY
🧵[1/n]
#Robotics #CrossEmbodiment
Excited to introduce our work, Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy, which has been accepted to ICML 2026!
By leveraging flow-consistent values, we resolve the critical trade-off between expressivity and stability in Flow-based Reinforcement Learning.
Joint work at KAIST w/ @bkjeon1211, @SeonghyeonYe, @kimin_le2, @seo_minjoon.
Paper: https://t.co/J5FR6ac9OF
Code: https://t.co/1tObfGord1
Project Page: https://t.co/BF0JWAKSKk
Can a robot understand the nonverbal signals you give in real time — your pointing gestures, your gaze, the things you never put into words?
Meet EDITH: a framework that lets robots comprehend and act on human nonverbal signals.
https://t.co/giPBAA5w7j
🧵[1/n]
@KAIST_AI
#Robotics #HumanRobotInteraction #VLA #ProjectAria
This work was made possible by @meta_aria.
We used Project Aria smart glasses to build our hardware system, streaming the wearer's egocentric RGB and eye gaze to the robot in real time.
For data collection, a human actor wearing Aria glasses and a robot teleoperator work together interactively: the human actor conveys intent through gaze and gestures while the teleoperator demonstrates the matching robot actions.
Huge thanks to @meta_aria for the research kit. 🙏
People feel the difference.
In a user study with 16 external participants, we confirmed that EDITH significantly reduces the effort humans need to convey their intent to the robot, compared to a language-only model (p < 0.001).
[9/n]
EDITH is robust to messy, real human behavior.
When the user gets distracted mid-instruction (glancing at a phone, looking away), naive policies latch onto the wrong cue. EDITH tracks the actual intent and holds its performance, with a relative drop of only 0.4% under distraction.
[8/n]