Hindsight Experience Replay has become the ubiquitous method for goal-conditioned reinforcement learning, but leaves open the question of which goal to relabel with.
In this work, accepted at ICML, we instead propose simply Learning Everything All at Once (LEO).
1/
Mean Field Games provide a framework for modelling large populations.
ICML26 Spotlight: Introducing Recurrent Structural Policy Gradient for partially observable MFGs with common noise, benefitting from faster convergence than model-free RL, but remaining tractable, unlike DP.
While the focus of our work is on finite goal sets, we also adapt LEO for continuous goal sets through goal quantisation, achieving results competitive with Hindsight Experience Replay in continuous control tasks.
8/
@jsuarez I think the group would definitely be interested - after the NeurIPS deadline?
I can preload my question: NLE is unsolved + already in C + people would be very impressed if it was solved. Do you think puffer could solve it?
@elliotarledge This looks really cool! How does the agent trained with pufferlib perform? + can it transfer back to original Craftax? (i.e. is this a 1-1 exact environment remake?)
Excited to announce I’ll be joining @EugeneVinitsky at @nyutandon this autumn for a PhD!
I will be working on the intersection of game theory, reinforcement learning, and autonomous vehicles.
Thanks to everyone who helped me get to this point, especially from @FLAIR_Ox :)
1/ As compute continues to grow and simulators continue to improve, it is becoming feasible to train RL agents for billions or trillions of timesteps. However, this is only useful if agents can continue learning over such long training horizons, which is far from a given 👇
📢Current world models aren't really modeling the world; they're modeling one agent's view of it. Partial observations ≠ world state.
Future world models will be independent of any one agent's perspective. You will be able to “drop in” any number of agents at any point in time, and a persistent world state will evolve with their interactions. Imagine a neural MMORPG server. 🧵[1/10]
Why did only humans invent graphical systems like writing? 🧠✍️
In our new paper at @cogsci_soc, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇