Flexible Locomotion Learning with Diffusion Model Predictive Control
Excited to share that our paper has been accepted to #ICRA2026 @ieee_ras_icra!
A diffusion-planning framework for flexible real-world quadruped locomotion. Instead of learning a fixed RL policy or relying on hand-crafted dynamics for MPC, we train a diffusion trajectory prior that jointly predicts future states and actions.
Key Ideas:
Diffusion-MPC: A diffusion planner unlocks flexible locomotion through test-time reward and constraint adaptation
Interactive reward-weighted finetuning enables continual behavior refinement from online environment feedback
Real-world deployment on Unitree Go2 with efficient and adaptive planning
The same planner can adapt at test time to height changes, posture/joint constraints, balancing under external disturbances, energy-aware locomotion, and zero-shot outdoor walking on grass and slopes.
🌐Homepage: https://t.co/TSXUZAL5nT
📖Paper: https://t.co/de4dUZf5AA
🔗Code: https://t.co/NLFB1alhWJ
This work is by @RunhanH, Haldun Balim, @hankyang94, and @du_yilun.
#ICRA2026 #Robotics #LeggedRobots #RobotLearning #DiffusionModels #MPC #MachineLearning
I'm giving a spotlight talk tomorrow, June 4, 10am in Room 2A.
Sharing the latest series of 𝗿𝗼𝗯𝗼𝘁 𝗰𝗼𝗱𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁 works we built at UC Berkeley / NVIDIA GEAR.
https://t.co/VEQNa2OH7X
Tomorrow (Thursday morning), 9:00–10:30 AM, I'll be at Poster 326 to present our work on a new imitation learning framework, MIMIC, for training a sidewalk autopilot. Stop by if you're attending #ICRA2026 in Vienna:
MIMIC (Multi-scale IMItation with Corrective expansions) trains sidewalk autopilots from teleoperation data by expanding corrective behaviors and visual diversity through generative augmentation.
Project page: https://t.co/aZzDhyFcOG
Mobile manipulation is not just putting arms on wheels. It introduces a different class of challenges, such as partial observability and whole-body interface design.
However, researchers are often held back by hardware setup before they can get to the actual research problems.
I recently wrote a tutorial, https://t.co/tZDCeXGWzE, to make the process easier.
With support from hardware vendors, you can now purchase an out-of-the-box hardware kit directly, without having to build everything from scratch. We also provide a plug-and-play codebase for robot control, teleoperation, data collection, model training, and inference.
Simple Mobile aims to make mobile manipulators more accessible, save you time, and help you get to the **research part** faster.
Can we learn whole-body mobile manipulation directly from human demonstrations?
Introducing Whole-Body Mobile Manipulation Interface (HoMMI)
Egocentric + UMI, 0 teleop -> bimanual & whole-body manipulation, long-horizon navigation, active perception
https://t.co/CcZ9ZwfuFr
Meet BFM-Zero: A Promptable Humanoid Behavioral Foundation Model w/ Unsupervised RL👉 https://t.co/3VdyRWgOqb
🧩ONE latent space for ALL tasks
⚡Zero-shot goal reaching, tracking, and reward optimization (any reward at test time), from ONE policy
🤖Natural recovery & transition
🔥Excited to share the first released work from our IEI lab! Congrats to @AnteaWu 🎉
This work is motivated by the lack of quantitative evaluation for physics alignment in video world models. With tools like MegaSam and CoTracker, we can directly reconstruct dynamic 3D scenes, enabling quantitative evaluation of physical alignment.
Both code and data are released — feel free to try it out! It should work, but if it doesn’t, contact @AnteaWu directly : )
Excited to share our work with @du_yilun! We use compositional generation to improve T2I diffusion models' generalization to longer text prompts.
Our poster will be at @iclr_conf 4/23 10:30 am - 1:00 pm. Come and have a chat at P4 #3011 Riocentro!!
🏠 https://t.co/9sJ6uPeYa9
Check out our work at #ICRA2026 on building flexible locomotion systems through diffusion-based MPC!
Our generative MPC approach allows us to rapidly adapt locomotion policies to constraints such as height, terrain, and joint angles by simply changing the optimized objective.
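To make "changing the optimized objective" concrete, here is a minimal sketch, not the paper's method. It assumes a stand-in sampler (`sample_candidates`, hypothetical) in place of a trained diffusion prior, and it swaps objectives by re-ranking sampled candidates rather than by guiding the denoising process. All function names and reward terms are illustrative assumptions.

```python
import random

def sample_candidates(n, horizon, seed=0):
    """Hypothetical stand-in for a diffusion trajectory prior:
    each candidate is a random plan of body heights in metres."""
    rng = random.Random(seed)
    return [[rng.uniform(0.15, 0.40) for _ in range(horizon)] for _ in range(n)]

def height_reward(traj, target):
    """Negative mean squared deviation from a target body height."""
    return -sum((h - target) ** 2 for h in traj) / len(traj)

def smoothness_reward(traj):
    """Penalise large step-to-step height changes (a crude energy proxy)."""
    return -sum((b - a) ** 2 for a, b in zip(traj, traj[1:]))

def plan(candidates, target_height, w_height=1.0, w_smooth=0.0):
    """Pick the candidate that maximises an objective chosen at test time."""
    def score(traj):
        return (w_height * height_reward(traj, target_height)
                + w_smooth * smoothness_reward(traj))
    return max(candidates, key=score)

# Same candidate set, two different test-time objectives.
candidates = sample_candidates(n=256, horizon=8)
low = plan(candidates, target_height=0.20)
high = plan(candidates, target_height=0.35)
```

The prior is sampled once and never retrained; only the arguments to `plan` change between the low-stance and high-stance behaviours.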
A search framework for stronger LLM reasoning – Bidirectional Evolutionary Search, or BES by @Harvard and @MIT
It combines:
- forward search to create and improve candidate solutions
- backward search to break the task into checkable sub-goals
+ BES can recombine parts of different candidate trajectories using evolution-style operators → Combination, Deletion, Translocation, Crossover.
This helps to explore solutions that ordinary rollouts are unlikely to reach.
Thanks to the backward search, the system can recognize partial progress even before the final answer is correct.
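A rough sketch of what the operators and partial-progress scoring above could look like, assuming a candidate is simply a list of reasoning steps and a sub-goal is a boolean check on a step. These definitions are illustrative guesses at the four operator names, not the paper's implementation.

```python
def crossover(a, b, cut):
    """Prefix of a up to cut, followed by the suffix of b from cut."""
    return a[:cut] + b[cut:]

def deletion(a, i):
    """Drop the step at index i."""
    return a[:i] + a[i + 1:]

def translocation(a, i, j):
    """Move the step at index i to index j."""
    rest = a[:i] + a[i + 1:]
    return rest[:j] + [a[i]] + rest[j:]

def combination(a, b):
    """Append the steps of b that a does not already contain."""
    return a + [s for s in b if s not in a]

def partial_progress(candidate, subgoals):
    """Fraction of sub-goal checks passed by at least one step."""
    hits = sum(any(check(step) for step in candidate) for check in subgoals)
    return hits / len(subgoals)

# Hypothetical sub-goals produced by backward decomposition of a multi-hop question.
subgoals = [
    lambda s: "author" in s,
    lambda s: "birthplace" in s,
    lambda s: "country" in s,
]
a = ["find author", "guess answer"]
b = ["skip lookup", "find birthplace", "find country"]
child = crossover(a, b, 1)
print(partial_progress(a, subgoals), partial_progress(b, subgoals),
      partial_progress(child, subgoals))
```

Neither parent passes every check, but the crossover child does, which is the kind of candidate a single forward rollout is unlikely to produce on its own.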
The most notable results:
- on MuSiQue multi-hop reasoning, BES improved Llama-3.2-3B-Instruct from 4.0% to 7.0% accuracy (GRPO degraded performance and Tree-GRPO barely helped)
- BES outperformed open-source evolutionary frameworks – OpenEvolve, GEPA, and ShinkaEvolve – on circle packing and Heilbronn convex optimization.
"Self Improving Language Models with Bidirectional Evolutionary Search"
Most LLM search still works by sampling more rollouts or extending one path at a time.
This paper's bidirectional evolutionary search does it in a smarter way.
It breaks the task backward into smaller verifiable goals, while evolving solutions forward by mixing useful parts from different attempts.
This lets the model find answers that normal sampling and tree search are unlikely to reach.
This gives better post-training and stronger test-time search on hard reasoning and open problem solving tasks.
🚀 How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produce a correct answer?
Best-of-N (e.g., GRPO) and tree search share two limitations:
🔻 Verification signals are sparse
🔻 Candidates stay within the model's own distribution
We introduce BES: Bidirectional Evolutionary Search — a search framework that couples forward candidate evolution with backward goal decomposition.
✅ Works for both post-training and inference.
Spatial understanding is important to moving around in complex environments and is a huge part of the challenge of generalizing to new scenes. Most world models, however, largely ignore this spatial dimension, focusing on 2D images.
Not PointWorld, though. PointWorld is a 3D world model trained from real and simulated data which can perform a wide variety of manipulation tasks on a real robot, including grasping or handling articulated objects, all without any additional fine tuning. @wenlong_huang joins us to tell us more about what makes this work and how it’s different from other world models.
Watch Episode #83 of RoboPapers, with @chris_j_paxton and @DJiafei, to learn more!
This is THE moment of Physical AI!
We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀
- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.
- It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies—enabling models to truly begin to “touch the world.”
- Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks.
Huge thanks to every teammate who fought side by side on this journey—from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate.
The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act”—and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community.
Welcome to the era of Physical AI.
HuggingFace: https://t.co/QW5h5pIWWM
Project Website: https://t.co/Jppa0gkn16
Code: https://t.co/aJgaLm5BaG
In the last couple of months, we have witnessed significant advances in Industry-scale World Models. Yet, for the broader community, the gap between reading about these models and deploying them remains disappointingly wide.
Today we're releasing Nano World Models: a minimalist, batteries-included repo for advancing world model science.
🧵 (1/9)