Here’s a pretty weird and surprising result: retrieval-augmented generation works unreasonably well for robot learning, but only when parameterized using difference vectors!
We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning. No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too and it works on real robots! :)
Play with our website to understand better: https://t.co/4Ruk5aipTk
🧵(1/7)
Real-world RL is still too brittle and data-hungry for long-horizon, contact-rich tasks.
We introduce Simulation Distillation (SimDist), which turns large-scale simulated experience into reusable world-model priors for rapid real-world adaptation.
By combining online planning with dynamics adaptation, SimDist achieves high success rates on tasks requiring precision, force, and reactivity.
Play with our interactive visualization to see for yourself: https://t.co/qFGNySxdAl
(1/n)
We present Compute Optimal Tokenization! 🔡
Commonly, LLM scaling works stick to one tokenizer, sweeping data/model size.
But what happens when we control the tokenizer’s compression rate (bytes/token)?
Here we sweep tokenizers, params, and data across compute budgets: [1/N]
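For readers unfamiliar with the metric, here is a minimal sketch of how a compression rate in bytes/token can be measured. The two toy tokenizers and the `bytes_per_token` helper are hypothetical stand-ins for illustration, not the tokenizers or code from the paper.

```python
from typing import Callable, Sequence


def bytes_per_token(text: str, tokenize: Callable[[str], Sequence]) -> float:
    """Compression rate: UTF-8 bytes of the text divided by its token count."""
    tokens = tokenize(text)
    if len(tokens) == 0:
        raise ValueError("tokenizer produced no tokens")
    return len(text.encode("utf-8")) / len(tokens)


# Hypothetical toy tokenizers, used only to show the two extremes.
def byte_tokenizer(text: str) -> list:
    # One token per UTF-8 byte, so the rate is 1 byte/token by construction.
    return list(text.encode("utf-8"))


def whitespace_tokenizer(text: str) -> list:
    # One token per whitespace-separated word (separators are dropped).
    return text.split()


sample = "compute optimal tokenization"
byte_rate = bytes_per_token(sample, byte_tokenizer)
word_rate = bytes_per_token(sample, whitespace_tokenizer)
print(f"byte-level: {byte_rate:.2f} bytes/token")
print(f"word-level: {word_rate:.2f} bytes/token")
```

A higher rate means the same text is covered by fewer tokens, so a fixed token budget sees more bytes of data.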
Extremely simple and scalable way from @patrickhyin @TylerW24089 to generate dextrous behavior that transfers sim2real!!
Have been personally using it to generate expert policies that work from literally any reset position.
We’re releasing OmniReset, a framework for training robot policies using large-scale RL and diverse resets for contact-rich, dexterous manipulation.
OmniReset pushes the frontier of robustness and dexterity, without any reward engineering or demonstrations.
Try the policies yourself in our interactive simulator! https://t.co/3hW3nYx2vD
(1/N 🧵)
Animals can’t learn by being tele-operated. But, they do learn by observing and interacting with the world around them. So, why don’t robots learn this way?
Excited to release, “Planning from Observation and Interaction”, for real-world observational learning on robots! 🧵(1/12)
A reward model that works, zero-shot, across robots, tasks, and scenes?
Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories.
Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more!
🧵 (1/12)
🤖 Can a single robot policy manipulate diverse tools without ever seeing them before?
Introducing SimToolReal 🔨 : a generalist dexterous manipulation policy that transfers zero-shot sim→real to unseen tools + unseen tasks
All videos are 1x speed (60 Hz control) 🧵👇
For video generation in robotic applications, looking pretty is usually not enough.
Robot manipulation requires understanding how visual observations and 3D geometry evolve over time under agent actions, with temporal coherence and geometric consistency across camera views.
We study this challenge in our work (recently accepted by @iclr_conf), 4D Video Generation for Robot Manipulation, which enforces multi-view 3D consistency via geometric supervision to generate spatio-temporally aligned videos.
Pretrained diffusion/flow policies are powerful — but brittle at deployment.
We introduce RFS, a data-efficient RL framework that:
• steers latent noise for global adaptation
• applies residual actions for precise local correction
Works in sim and real-world dexterous manipulation 🖐️🤖
👉📄 Paper + videos: https://t.co/HumWkk7MdL
Excited to put out new work - PolaRiS, a framework for scalable generalist policy evaluation!
The idea is simple - short videos of scenes get converted into high-fidelity simulation environments that match the real world. Then you can evaluate your favorite generalist policy on entirely unseen environments purely in simulation, without requiring real-world evaluations 🪇!
Simple, right? Turns out getting it to really work needs some careful research and engineering. Let’s investigate! (1/8)
https://t.co/bQdb0aCEY3
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high-fidelity simulation environments for scalable and reliable zero-shot generalist policy evaluation.
https://t.co/nWcR6YuPf4
(1/N 🧵)
Happy to announce our NeurIPS’25 paper, real-world RL of active perception behaviors!
I am pretty excited about this project - I learned that real-world robot RL is actually quite straightforward. Details below:
Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to *use all of this non-optimal data to robustify imitation learning* with minimal requirements beyond BC.
Key idea: use non-expert data to learn how to *recover* back to expert data with a minimal-frills offline RL method that works under sparse data coverage. Allows usage of *all* available data, not just expert data - never throw your data away!
Paper: https://t.co/gmP2V92DBL
Website: https://t.co/yi7fwPz4wi
A 🧵(1/10)
How can we create a single navigation policy that works for different robots in diverse environments AND can reach navigation goals with high precision?
Happy to share our new paper, "VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation"!
📜 Paper: https://t.co/XmyuBnrM1D
🌐 Website: https://t.co/Jt80tySWzQ
Punchline: World models == VQA (about the future)!
Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only predicted the semantic information necessary for decision-making?
Introducing Semantic World Models (SWM). Given an observation and an action sequence, SWMs cast modeling as answering textual questions about the future outcome resulting from the actions. Recasting world modeling as a VQA problem lets us directly leverage the pretrained knowledge and machinery of VLMs for generalizable modeling. We had a lot of fun thinking about how this work helps connect these two seemingly very different fields of study - VLMs and world models! 🧵(1/6)
Paper: https://t.co/KIrRG2JO1a
Fun demo: https://t.co/leogQBvcO0
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?
Our new paper shows AI which models others’ minds as Python code 💻 can quickly and accurately predict human behavior!
https://t.co/1t2fsW7jyL 🧵
Are you worried that an LLM you trained could be stolen and misused by mysterious masked men 🥷? Our work (now a #NeurIPS2025 Spotlight 💫) can help you detect such unauthorized use. As a side-quest, we also analyse memorization and forgetting in LLMs 🧵(1/11).
🤔 How do we train AI models that surpass their teachers?
🚨 In #COLM2025: ✨Delta learning✨ makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯
The secret? Learn from the *differences* in weak data pairs!
📜 https://t.co/dw1QeQackx
🧵 below
If you visited the @uwcherryblossom, did you “spot” an unusual visitor among the blooms? Researchers in the @UW #UWAllen’s #Robotics group recently took advantage of some nice weather to take our @BostonDynamics robot dog for a stroll around campus. #AI 1/4
Constructing interactive simulated worlds has been a challenging problem, requiring considerable manual effort for asset creation and articulation, and for composing assets to form full scenes. In our new work - DRAWER, we make creating scenes in simulation as simple as taking a video of the scene: out comes a high-quality, fully interactive environment in simulation. No human simulation designer involved!
https://t.co/JVA5Nap2fe
A 🧵(1/7)