Excited to be at NeurIPS tomorrow! Here are a few things I'll be doing there that you should stop by for: Monday - mentoring at WiML on reinforcement learning (1/n)
In domains with sparse rewards, reward shaping is well known to speed up learning by providing a dense learning signal.
We introduce an alternative method, f-Policy Gradients (https://t.co/mNExlKHf1r), which obtains optimal policies through distribution matching. (1/n)
#NeurIPS2023