There’s huge potential in using ‘demonstrations’ from other agents with different goals: to understand which features & dynamics of the environment *might* be important to you; and to borrow from others' behaviours only where they are useful for you.
👽 PsiPhi-learning 👽 (long talk #ICML) https://t.co/TA7gDtEHak shows how an agent can use data from the behavior of other agents with diverse goals: to infer their intentions and fulfill its own! 🧵
Excited to announce that our work on “Discovering state-of-the-art RL algorithms” is finally published in @Nature! In this work, we meta-learned RL algorithms at scale.
Paper: https://t.co/3V4TmPTWm4
Blog: https://t.co/G65ReK2iMs
See thread 👇
@risi1979 Combining Deep Reinforcement Learning and Search for Imperfect-Information Games https://t.co/RB2ptFmwab from @polynoamial @anton_bakhtin et al. kinda has it all -- clarity, insights, theory, great empirical results, code available 👏
@NandoDF Yes, acquiring ILR (settled status is, I think, a similar scheme for EU citizens) takes many years (can be 10 years in some cases) and is very expensive. I went through the absurd process (German citizen, lived here 15 years) and it would be very hard for less privileged folks.
Permanent damage to generalisation from early updates in non-stationary training -- really enjoyed looking into this intriguing problem and trying to solve it for deep RL agents!
Really excited about our new work: In deep RL, we typically collect new data using a non-stationary policy that gets updated as we learn and improve. We show this can impact the learning dynamics of our deep policy and lead to worse generalization https://t.co/1YTfpzDZOd (1/7)
I am proud to announce the release of the NetHack Learning Environment (NLE)! NetHack is an extremely difficult procedurally-generated grid-world dungeon-crawl game that strikes a great balance between complexity and speed for single-agent reinforcement learning research. 1/
Happy to share the extended version of our #QMIX paper “Monotonic Value Function Factorisation for Deep Multi-Agent RL”
We include further analysis and ablation studies that investigate how monotonic factorisation of the joint Q-value helps QMIX outperform VDN
https://t.co/AGGADZgumu
A much-improved 🎲Loaded DiCE🎲 objective lets you easily compute low-variance estimators of any-order derivatives for RL. Paper https://t.co/dllhrHuzwD and code https://t.co/NqZsdZy3iT online, nice working with @shimon8282 and @j_foerst! #NeurIPS2019
Progressively growing the action space creates a great curriculum for learning agents -- check out our paper: https://t.co/YoKe9ZIjhk + code: https://t.co/BdZjplNNEg. Great working with Laura Gustafson, @ebetica, @shimon8282, Nicolas Usunier, @syhw
How can RL agents exploit the compositional, relational and hierarchical structure of the world? A growing number of authors propose learning from natural language. We are excited to share our @IJCAIconf survey of this emerging field! https://t.co/XLHnXMQbVY
TL;DR:🤖+📖=📈🎯🏆🥳
I had the pleasure to co-supervise outstanding MSc students jointly with Jakob Foerster (@j_foerst) and Greg Farquhar (@greg_far) at @CompSciOxford this year. Together, we compiled our advice for embarking on short-term machine learning research projects: https://t.co/hnqzC9zBWn