TalkRL Podcast is All Reinforcement Learning, All the Time.
Follow for interviews with brilliant folks from across the world of RL.
Host @robinc. DMs open.
@kjaved_ @how_uhh @velocizapkar @danijarh Agree this seems to defeat the purpose of small sample benchmarks!
Still hoping there is a solution to this issue other than slow envs... seems like a case of Goodhart's law
@kjaved_ @how_uhh @velocizapkar @danijarh Purposely wanting slow envs because you want sample-efficient algos seems a bit like throwing the baby out with the bathwater. Then those with the most compute have even more of an advantage.
Why not just pay attention to sample complexity/hp sensitivity, and also have fast envs?
E73: Danijar Hafner on Dreamer v4
@danijarh (ex-@GoogleDeepMind RS) on offline world models for safe robotics, Shortcut Forcing for fast diffusion video models, outperforming OpenAI’s VPT with 100× less data, his “APD” theory unifying exploration and empowerment, and more!
@sirbayes @alexinch_ai @karpathy @RichardSSutton @dwarkesh_sp It's interesting how central this is to the current paradigm, yet how non-obvious to most (including me).
Has this formulation been spelled out in more detail somewhere, by you or others?
@CsabaSzepesvari @karpathy My personal hot take is very different:
1. RL, as a family of conceptual frameworks, is timeless.
2. Frustrations with modern deep RL algo performance are mostly due to limitations of deep learning function approx
tl;dr: Give RL FAs that generalize better (plus algos) :D