Introducing our new work DMO: Decoupled Model-based policy Optimization! First-order gradient RL that unrolls trajectories with high-fidelity sims & computes gradients via learned models.
Paper & demos: https://t.co/SrLCn1mdVA
#CoRL2025 w/ @Rk4342R
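A toy sketch of the "unroll with the sim, differentiate with the learned model" idea, for readers who want something concrete. This is not the paper's implementation: the scalar dynamics, `sim_step`, `model_jacobian`, the quadratic reward, and all constants are hypothetical stand-ins. The forward pass only ever calls the simulator; the backward pass only uses the model's Jacobians, evaluated at the simulated states.

```python
def sim_step(s, a):
    """Hypothetical stand-in for a black-box simulator step (no gradients used)."""
    return 0.9 * s + 0.1 * a


def model_jacobian(s, a):
    """Hypothetical learned model: only its partials (ds'/ds, ds'/da) are used."""
    return 0.88, 0.11  # slightly off on purpose, as a learned model would be


def reward(s_next, a, target=1.0):
    """Differentiable reward (assumed form): track a target, penalize effort."""
    return -(s_next - target) ** 2 - 0.01 * a ** 2


def reward_grad(s_next, a, target=1.0):
    """Analytic partials (dr/ds', dr/da) of the reward above."""
    return -2.0 * (s_next - target), -0.02 * a


def rollout(s0, actions):
    """Forward pass: states come from the simulator only."""
    states = [s0]
    for a in actions:
        states.append(sim_step(states[-1], a))
    return states


def total_return(s0, actions):
    states = rollout(s0, actions)
    return sum(reward(states[t + 1], a) for t, a in enumerate(actions))


def decoupled_gradient(s0, actions, jacobian=model_jacobian):
    """Backward pass: chain rule through the model's Jacobians at simulated states."""
    states = rollout(s0, actions)
    grads = [0.0] * len(actions)
    carry = 0.0  # gradient flowing into s_{t+1} from all later rewards
    for t in reversed(range(len(actions))):
        dr_ds, dr_da = reward_grad(states[t + 1], actions[t])
        g_next = dr_ds + carry  # total d(return)/d s_{t+1}
        ds_ds, ds_da = jacobian(states[t], actions[t])
        grads[t] = dr_da + ds_da * g_next
        carry = ds_ds * g_next  # pass the gradient back to s_t
    return grads


actions = [0.5, -0.2, 0.3]
print(decoupled_gradient(0.0, actions))
```

If the model's Jacobians happen to match the simulator exactly, this reduces to ordinary backpropagation through time; with an imperfect model the gradient is approximate, but the trajectory it is evaluated on is still the simulated one.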
@KyleMorgenstein @Rk4342R Shrinking the replay buffer generally hurt learning. The one exception was the Go2 walking exp, where a buffer of 1e5 worked better than 1e6.
@KyleMorgenstein @Rk4342R "coming from PPO it’s shocking to see so few get such good performance" -> the price, for now, is that you need a differentiable reward function.
@KyleMorgenstein @Rk4342R Thank you! For the value function, we use regular TD-lambda. For the dynamics model, each feature of the obs has its own std. Depending on the task, I also recommend using design choices I to V of the "4.2 Design Choices" section in https://t.co/PHy5vUVp6g.
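For reference, a minimal sketch of how TD-lambda targets for a value function are commonly computed. The reply only says "regular TD-lambda", so the exact variant (the backward lambda-return recursion below), the function name `td_lambda_returns`, and the default `gamma` and `lam` values are assumptions, not details from the paper.

```python
def td_lambda_returns(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Lambda-return targets for a trajectory of length T.

    rewards[t]      : reward received after acting in state s_t
    values[t]       : current value estimate V(s_t)
    bootstrap_value : value estimate V(s_T) of the state after the last step

    Recursion: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    started from G_T = V(s_T).
    """
    next_values = list(values[1:]) + [bootstrap_value]
    returns = [0.0] * len(rewards)
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns


targets = td_lambda_returns([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], 0.5, gamma=0.9)
print(targets)
```

With `lam=0` this gives the one-step TD target `r_t + gamma * V(s_{t+1})`; with `lam=1` it gives the discounted return bootstrapped from the final value estimate. The value network is then regressed onto these targets.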
First Order Model-Based RL through Decoupled Backpropagation (DMO)
https://t.co/1o4BjruhZc
Simulation rollouts, learned model for first order optimization