Can you solve MARL tasks without conditioning the policy on the observation? In our new paper we investigate the popular StarCraft micromanagement benchmark SMAC and discover that most scenarios do not require the observation to be used! [1/N]
Hindsight Experience Replay has become the ubiquitous method for goal-conditioned reinforcement learning, but leaves open the question of which goal to relabel with.
In this work, accepted at ICML, we instead propose simply Learning Everything All at Once (LEO).
1/
Our open models are designed to support the Genesis Mission by giving the scientists in our national labs the flexibility and sovereignty to work on their own terms. Learn more ⤵️
We are hiring up to two postdocs @FLAIR_Ox: https://t.co/RvXPolIb9N
FLAIR is an extreme outlier in talent density X agency X resources X fun. If this sounds exciting, please apply. Deadline is 31st of March AOE. Looking for a place where you can cook and ship in peace? Join us.
Very excited to share that Reflection is partnering with Shinsegae Group to build a 250-megawatt AI factory for South Korea.
I believe this is a sign of things to come and it's a privilege to be part of it.
Reflection is partnering with Shinsegae Group to build a 250-megawatt sovereign AI factory for the Republic of Korea.
Open intelligence. Built on trust between allies. Owned by the nations that need it most.
The future of sovereign AI. Read more in the @WSJ.
AGI is in its first stages of take-off.
Every country is realizing that AI sovereignty is existential, which requires open models.
We’ve signed a deal with Shinsegae Group to build South Korea’s sovereign cloud on a US open model built by Reflection.
More to come.
My PhD thesis is out 🥳🎓
How do LLMs, trained on trillions of tokens, reason?
Can they generalise beyond their training data or are they constrained by what they've seen before?
My takeaway: they can generalise beyond training in interesting ways, showing genuine reasoning
I am looking for an intern to do a research project on RL post-training of LLMs. If you are a PhD student and would like to work with me for several months pushing the efficiency of RL systems, send me an email with the [efficient_rl_internship] subject. Friends, please retweet.
📢 New PhD Position 📢
We (@_rockt, @borruell, and I) are looking for a PhD student to work at the intersection of open-endedness and game design. The student will be part of the @UCL_DARK lab and funded by @iconicgamesio and UCL.
See this doc for a more detailed description of the research direction and candidate expectations:
https://t.co/eYsFKlgCJt
To apply, please complete this form by January 15:
https://t.co/UOGva9iBvJ
Hello World: I am reviewing PhD applications and the level of talent is amazing. Sadly, the funding situation is extremely challenging.
SO: If you'd like to gift someone brilliant literally the opportunity of their lifetime and sponsor their PhD in my group, please let me know 🙏