Brett Barkley @bebark99 - Twitter Profile

Pinned Tweet

8 months ago

(1/n) With over 1,300 citations, MBPO is often cited as proof that model-based RL beats model-free methods. In https://t.co/xq3WXslh67 we showed it often completely fails in DeepMind Control. In our new work, Fixing That Free Lunch (FTFL), we explain why and make it succeed.

2

19

4

6

5K

Brett Barkley @bebark99

7 months ago

@aurielws I'll be at NeurIPS and would love to chat! https://t.co/YDrN4xJRBr https://t.co/F4AngwpRuR

0

2

0

700

Brett Barkley @bebark99

8 months ago

(12/12) In summary, FTFL turns MBPO’s synthetic-data failures into successes and shows how even seemingly similar environment structure can shape algorithmic reliability. Full paper: https://t.co/UjkUVmvg42

0

2

0

170

Brett Barkley @bebark99

8 months ago

(11/n) FTFL shows that understanding when and why algorithms fail is as important as improving their averages. We hope this motivates the RL community to build mappings between environment structure and algorithmic choices as a step toward more generally reliable methods.

1

0

202

Brett Barkley @bebark99

about 1 year ago

@GuanyaShi @Caltech @lschmidt3 Totally resonates with our work (arXiv:2412.14312): we show that Dyna-style tweaks, dominant in Gym, consistently hurt performance in DMC even though both use MuJoCo. Adding them to an off-policy method makes it worse, not better. Maybe we’ve overfit to Gym more than we realized.

0

5

0

1

115

Brett Barkley @bebark99

over 1 year ago

(10/10) In summary, OpenAI Gym and DMC are equally conventional testbeds that share a common physics backend (MuJoCo). There is no 'good' reason for MBPO and ALM to largely fail in DMC, but they do. We encourage readers to check out our paper for more: https://t.co/xq3WXskJgz

0

2

0

181

Brett Barkley @bebark99

over 1 year ago

You might be surprised to learn that modern RL favors Dyna-style model-based algorithms for their sample efficiency, yet they can require up to 40x more wall-clock time to train and also significantly underperform simple model-free methods across diverse benchmarks.

2

20

1

7

1K

Brett Barkley @bebark99

over 1 year ago

(9/n) Not only that, but at the time of this post MBPO has >1000 citations and a reproducibility study at NeurIPS. Despite this, only one paper has noted this performance gap, and only across the hopper tasks in Gym and DMC.

1

3

0

191
