We’re releasing OmniReset, a framework for training robot policies using large-scale RL and diverse resets for contact-rich, dexterous manipulation.
OmniReset pushes the frontier of robustness and dexterity, without any reward engineering or demonstrations.
Try the policies yourself in our interactive simulator! https://t.co/3hW3nYx2vD
(1/N 🧵)
Real-world RL is still too brittle and data-hungry for long-horizon, contact-rich tasks.
We introduce Simulation Distillation (SimDist), which turns large-scale simulated experience into reusable world-model priors for rapid real-world adaptation.
By combining online planning with dynamics adaptation, SimDist achieves high success rates on tasks requiring precision, force, and reactivity.
Play with our interactive visualization to see for yourself: https://t.co/qFGNySxdAl
(1/n)
@Saketh_Vaishya We use 4 L40S GPUs to train our RL policy for a single task. Take a look at our documentation for more details about the compute we use at every step: https://t.co/5qCW7Fo9hw
I would say First-Try Success Rate (Real) vs. Policy Success Rate (Sim) is a fairer comparison for the sim2real gap in these experiments. Because our policies are trained with broad state coverage, they can recover from failures and retry until success. You can see this behavior in the first ~20 seconds of the full, uncut evaluation videos at the bottom of our website: https://t.co/BEqLn3nhYA
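To make the distinction between the two metrics concrete, here is a minimal sketch of how a first-try rate and a retry-until-success rate could be computed from per-episode attempt logs. The function name `success_rates` and the outcome lists are made up for illustration; they are not results or code from the experiments.

```python
from typing import Sequence, Tuple


def success_rates(episodes: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Return (first_try_rate, eventual_rate) over a batch of episodes.

    Each episode is an ordered list of attempt outcomes; True means the
    attempt succeeded. This log format is an assumption for illustration.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    # Success on the very first attempt of the episode.
    first_try = sum(1 for ep in episodes if ep and ep[0])
    # Success on any attempt, i.e. the policy recovered and retried.
    eventual = sum(1 for ep in episodes if any(ep))
    return first_try / n, eventual / n


# Hypothetical outcomes, not data from the evaluation videos.
episodes = [
    [True],                # succeeds immediately
    [False, True],         # recovers on the second attempt
    [False, False, True],  # recovers on the third attempt
    [False, False],        # never succeeds
]
first_try_rate, eventual_rate = success_rates(episodes)
print(first_try_rate, eventual_rate)
```

A policy that retries well can show a large gap between the two numbers, which is why the choice of metric matters when comparing against simulation.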
@nikamanth Nice catch, Naveen! This is a typo on our end. The real experiments are fully zero-shot sim2real, with no co-training or finetuning on real data.
Thank you! Yes, on-policy distillation would likely help a lot. The main limitation for us was compute. With 3 high-resolution cameras and high-fidelity rendering, we could only fit ~16–32 environments per 4090, which is orders of magnitude fewer than the 65K+ environments we use for state-based RL. Making RGB DAgger or RL more compute-efficient is definitely a very interesting direction to explore.
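As a rough back-of-the-envelope check on "orders of magnitude", the sketch below compares the two environment counts. It assumes "65K+" means 65,536 and compares it directly against the per-4090 RGB count; the post does not say how many GPUs the 65K+ figure is spread across, so treat the ratio as indicative only.

```python
import math

# Assumption: "65K+" is read as 65,536 (2**16) state-based environments.
STATE_BASED_ENVS = 65_536
# From the post: roughly 16 to 32 rendered RGB environments fit on one 4090.
RGB_ENVS_PER_GPU = (16, 32)


def env_ratio(state_envs: int, rgb_envs: int) -> float:
    """How many times more state-based environments than RGB environments."""
    return state_envs / rgb_envs


for rgb_envs in RGB_ENVS_PER_GPU:
    ratio = env_ratio(STATE_BASED_ENVS, rgb_envs)
    print(f"{rgb_envs} RGB envs: {ratio:.0f}x fewer, "
          f"about {math.log10(ratio):.1f} orders of magnitude")
```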
@joao_p_araujo Thank you! The easier tasks take 8 hours to train on 4 L40S GPUs. The harder ones can take as long as 32 hours. All our training curves are available to see in UWLab! https://t.co/6qAKZQmv0C
@YouJiacheng That’s right! It turns out that if you can kinematically reset the robot to a dense set of interesting states (not necessarily just true initial states), RL will figure out the dynamics to maximize reward and achieve its goal.
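A minimal sketch of that reset idea: at the start of each episode, sample either the true initial state or one of a pool of kinematically generated states. Everything here is hypothetical (the `State` type, `sample_reset_state`, and the `p_diverse` mixing value); it is not the OmniReset reset scheme, just an illustration of mixing reset sources.

```python
import random
from typing import List, Sequence

# Hypothetical state representation: a flat vector of joint/object coordinates.
State = List[float]


def sample_reset_state(
    rng: random.Random,
    initial_state: State,
    diverse_states: Sequence[State],
    p_diverse: float = 0.9,  # assumed mixing probability, not from the source
) -> State:
    """Pick the state an episode starts from.

    With probability p_diverse, start from a randomly chosen state in the
    diverse pool; otherwise start from the true initial state.
    """
    if diverse_states and rng.random() < p_diverse:
        return list(rng.choice(diverse_states))
    return list(initial_state)


rng = random.Random(0)
initial = [0.0, 0.0]
# Hypothetical pool, e.g. object near the goal, mid-insertion, or in grasp.
pool = [[0.5, 0.1], [0.9, 0.0], [0.2, 0.7]]
starts = [sample_reset_state(rng, initial, pool) for _ in range(5)]
print(starts)
```

The RL algorithm itself is unchanged in this sketch; only the distribution of starting states differs from a standard setup that always begins at `initial`.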
Excited to share the project that has surprised me the most in the last year!
Large-scale RL in simulation, with no demos and no reward engineering, can solve dynamic, dexterous, and contact-rich tasks. The learned behaviors are reactive, forceful, and use the environment for recovery in ways that are extremely challenging to bake in or teleoperate!
You can play with the policies yourself to see: https://t.co/TCc4hb2baV
And the learned behavior transfers to real-world robots from RGB camera inputs!
So what’s the trick? Using simulator resets carefully! Let’s unpack (1/10)
We’re building UWLab, a shared ecosystem for training robot policies in simulation and transferring them to the real world, built on Isaac Lab.
This includes the full OmniReset codebase, along with tasks, algorithms, and deployment in one clean, modular stack: https://t.co/PLX1fzPiSU
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high-fidelity simulation environments for scalable and reliable zero-shot evaluation of generalist policies.
https://t.co/nWcR6YuPf4
(1/N 🧵)
How can we help *any* image-input policy generalize better to visual and semantic variations?
👉 Meet PEEK 🤖 — a framework that uses VLMs to decide *where* to look and *what* to do, so downstream policies — from ACT, 3D-DA, or even π₀ — generalize more effectively!