🤖Adding new RL algorithms to LeRobot just got much easier.
Demo: HIL-SERL training with a SAC-based RL algorithm on an SO-100 for a hole-in-hand peg-in-hole task.
Sparse reward, only 30 offline demos mixed with live robot experience, and ~1 hour of online training with human interventions only when the policy fails.
The bottom graph tracks intervention rate: high at the start, steadily dropping as the policy improves.
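A minimal sketch of how an intervention-rate curve like this could be computed, assuming the rate is the fraction of timesteps per episode in which a human took over. The function names and the per-step boolean log format are hypothetical, not LeRobot's API.

```python
def intervention_rate(intervened):
    """Fraction of steps in one episode where a human took over.

    `intervened` is a hypothetical per-step log: True if the human
    was in control at that step, False if the policy was.
    """
    if not intervened:
        return 0.0
    return sum(intervened) / len(intervened)


def smoothed(rates, window=3):
    """Trailing moving average over per-episode rates, for plotting."""
    out = []
    for i in range(len(rates)):
        chunk = rates[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

A curve that starts near 1.0 and falls toward 0.0 over episodes would match the trend described above.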
The refactor separates algorithm logic from training infrastructure:
• RLAlgorithm owns learning logic
• RLTrainer handles orchestration
• DataMixer combines rollouts, demos, interventions, and future data sources
Adding an RL algorithm now looks much closer to adding a policy: one algorithm file, one config, one registry entry.
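The split described above can be sketched roughly as follows. This is an illustrative toy, not LeRobot's actual code: only the three role names come from the post, while every method name, the `register` decorator, the `ALGORITHMS` dict, and the `ToyMeanReward` stand-in are assumptions made for the example.

```python
import random
from abc import ABC, abstractmethod

# Hypothetical registry mapping a name to an algorithm class.
ALGORITHMS = {}


def register(name):
    """Decorator that adds an algorithm class to the registry."""
    def wrap(cls):
        ALGORITHMS[name] = cls
        return cls
    return wrap


class RLAlgorithm(ABC):
    """Owns learning logic only: consumes a batch, returns metrics."""

    @abstractmethod
    def update(self, batch):
        ...


@register("toy_mean")
class ToyMeanReward(RLAlgorithm):
    """Stand-in 'algorithm' that tracks a running mean of rewards."""

    def __init__(self):
        self.count = 0
        self.mean_reward = 0.0

    def update(self, batch):
        for transition in batch:
            self.count += 1
            self.mean_reward += (transition["reward"] - self.mean_reward) / self.count
        return {"mean_reward": self.mean_reward}


class DataMixer:
    """Samples batches across named sources (rollouts, demos, ...)."""

    def __init__(self, sources, weights, seed=0):
        self.sources = sources    # name -> list of transitions
        self.weights = weights    # name -> relative sampling weight
        self.rng = random.Random(seed)

    def sample(self, batch_size):
        # Skip sources that are still empty (e.g. no interventions yet).
        names = [n for n in self.sources if self.sources[n]]
        if not names:
            raise ValueError("all data sources are empty")
        picked = self.rng.choices(
            names, weights=[self.weights[n] for n in names], k=batch_size
        )
        return [self.rng.choice(self.sources[n]) for n in picked]


class RLTrainer:
    """Orchestration only: pulls batches and calls the algorithm."""

    def __init__(self, algorithm, mixer, batch_size=8):
        self.algorithm = algorithm
        self.mixer = mixer
        self.batch_size = batch_size

    def train(self, num_steps):
        metrics = {}
        for _ in range(num_steps):
            metrics = self.algorithm.update(self.mixer.sample(self.batch_size))
        return metrics
```

Under this layout, a new algorithm would only need its own `RLAlgorithm` subclass and a `@register(...)` line; the trainer and mixer stay untouched. For example, `RLTrainer(ALGORITHMS["toy_mean"](), mixer).train(100)` would run the toy learner against whatever sources the mixer holds.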
SAC is first. RLT, RECAP, ConRFT, QC-FQL, DSRL, and VLA RL fine-tuning next!
@Thom_Wolf @ClementDelangue
To my two followers: if you’re at ICLR, come find me at Pavilion 4, Poster #4704! I’ll be talking about real-world model-based Inverse Reinforcement Learning and anything tangentially related.
Robots can learn from demonstrations… but they often learn the wrong thing.
A robot may copy how you move — without understanding what actually matters for the task.
We introduce Masked IRL, a method that uses language + demonstrations to learn what parts of the world actually matter for reward learning. As a result, the robot can use its data more efficiently and learn 5x faster.
📄 Paper:
Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language (ICRA 2026)
https://t.co/H2w0CAGutm
🌐 Project page:
https://t.co/zHl81fic8A
🧵 1/8
I’ve finally reached peak academic narcissism: according to this email, my primary research specialty is "Furong Huang." 🫠
Not only am I the subject of my own research, but I’ve also apparently mastered "distributed systems", because, as the logic goes, we "distributed" images to participants in our watermarking competition? 🤣 I love a good reach, but this is a yoga pose.
Jokes aside, to the students who are actually interested: I am truly sorry. Your thoughtful, genuine emails are being buried under a mountain of this agent-generated noise. If you want to stand out, please:
• Ditch the templates. If a bot could have written it, a human probably won't read it.
• Be specific. Mention a specific finding or a question you had about a paper, not just the title.
• Show the "Why." How does your specific background actually bridge to our work? (And no, "distributing images" doesn't count as systems experience!)
Final thought on the AI Agent Era: Attention is our most precious currency, perhaps the only truly finite resource we have left. While agents make it "free" to generate content, they make it incredibly expensive to consume it.
We have to start protecting our own attention and, more importantly, respecting the attention of others. If you don't respect someone's time enough to proofread an email, don't expect them to invest their time in your career.
#AcademicTwitter #ML #GenAI #AttentionEconomy
We’re releasing OmniReset, a framework for training robot policies using large-scale RL and diverse resets for contact-rich, dexterous manipulation.
OmniReset pushes the frontier of robustness and dexterity, without any reward engineering or demonstrations.
Try the policies yourself in our interactive simulator! https://t.co/3hW3nYx2vD
(1/N 🧵)
This work would not be possible without: Siyang Shen, @RohanBaijal, @harine_ravi, @bxtbold, Kevin Huang, Sanghun Jung, and Byron Boots.
Find more videos on the project website.
🌐Project: https://t.co/sZ8So0X8SC
📜Paper: https://t.co/K5rJHGzQlB
🧑💻Code: https://t.co/GiL7UqUycf
Animals can’t learn by being tele-operated. But they do learn by observing and interacting with the world around them. So why don’t robots learn this way?
Excited to release “Planning from Observation and Interaction” for real-world observational learning on robots! 🧵(1/12)
Scaling up MPAIL, this work is a second step in a vision towards enabling any embodiment (not only those with thousands of hours of action-labeled and prior data) to learn directly in the real world, advancing robots ever so slightly towards animal-like learning and adaptation.
A reward model that works, zero-shot, across robots, tasks, and scenes?
Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories.
Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more!
🧵 (1/12)
How can we create a single navigation policy that works for different robots in diverse environments AND can reach navigation goals with high precision?
Happy to share our new paper, "VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation"!
📜 Paper: https://t.co/XmyuBnrM1D
🌐 Website: https://t.co/Jt80tySWzQ
Long Range Navigator (LRN) 🧭: an approach to extend planning horizons for off-road navigation given no prior maps. Using vision, LRN makes longer-range decisions by spotting navigation frontiers far beyond the range of metric maps.
https://t.co/uapFK13LHh