Discovering state-of-the-art reinforcement learning algorithms
Reinforcement learning agents usually learn with rules we program by hand (TD, Q-learning, PPO…). But humans didn’t hand-design our learning rules—evolution did. What if we let machines discover their own RL update rules from experience?
Junhyuk Oh and coauthors present exactly that. They train a population of agents across many environments and use meta-learning to optimize a meta-network that outputs the targets an agent should learn toward—effectively learning the agent’s loss and bootstrapping scheme end-to-end. The agent still emits a policy and predictions, but the semantics of those predictions are discovered rather than hard-coded.
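To make the "meta-network outputs the targets" idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation. `toy_meta_network`, its parameters `advantage_scale` and `discount`, and the choice of KL divergence plus squared error as the agent's loss are all assumptions for illustration. In the real system the target-producing function would be a learned network, and the meta-optimization that trains it is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def toy_meta_network(meta_params, logits, action, reward, next_prediction):
    """Hypothetical stand-in for a learned meta-network.

    Returns (target_policy, target_prediction). The formulas below are
    hand-written placeholders, not a discovered rule.
    """
    shifted = list(logits)
    # Nudge the taken action's logit in proportion to the reward.
    shifted[action] += meta_params["advantage_scale"] * reward
    target_policy = softmax(shifted)
    # Bootstrap the prediction target from the next step's prediction.
    target_prediction = reward + meta_params["discount"] * next_prediction
    return target_policy, target_prediction


def agent_loss(logits, prediction, target_policy, target_prediction):
    """Agent loss: distance between the agent's outputs and the targets."""
    policy = softmax(logits)
    policy_loss = kl_divergence(target_policy, policy)
    prediction_loss = (target_prediction - prediction) ** 2
    return policy_loss + prediction_loss


meta_params = {"advantage_scale": 1.0, "discount": 0.9}
logits, prediction = [0.0, 0.0], 0.0

# A rewarded step: the targets move away from the current outputs.
targets = toy_meta_network(meta_params, logits, action=0, reward=1.0,
                           next_prediction=0.0)
loss_rewarded = agent_loss(logits, prediction, *targets)

# An unrewarded step with zero bootstrap: targets equal current outputs.
targets = toy_meta_network(meta_params, logits, action=0, reward=0.0,
                           next_prediction=0.0)
loss_neutral = agent_loss(logits, prediction, *targets)
```

The point of the split is that the agent only ever minimizes a distance to targets, so changing the learning rule means changing the function that produces those targets while the agent's code stays the same.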
The outcome is striking: a discovered rule (“DiscoRL”) that sets a new bar on long-standing benchmarks. On Atari, a version trained on the 57 games (Disco57) exceeds the performance of hand-engineered algorithms while being more wall-clock efficient. Even more interesting, the same rule generalizes: without being tuned for them, it delivers state-of-the-art results on ProcGen and competitive performance on DMLab, NetHack, Crafter, and Sokoban. Scaling the discovery process to a more diverse set of environments (Disco103) makes the rule stronger still—performance improves simply by exposing it to more varied worlds.
Under the hood, the learned predictions behave differently from classic value functions: they spike before salient events (big rewards, abrupt policy shifts) and are explicitly used to bootstrap and update the policy—showing the system has invented useful intermediate quantities rather than rediscovering old ones. The discovery process is also practical: a few hundred million steps per environment were enough to find a top rule, and the learned rule transfers to larger networks at evaluation time.
This points to a compelling future: instead of manually crafting ever more intricate RL losses and targets, we can train agents whose learning algorithms are themselves learned—improving as we add compute, data diversity, and richer environments. Fewer knobs, more capability.
Paper: https://t.co/1BT1rjc0sg
1/8 Looks like my paper "Tabular Data: Deep Learning is Not All You Need" just hit 1,000+ citations 🥳🥳🥳
Here's the story of how we almost didn't publish it...
https://t.co/KiZ9dUTYWn
🧠 I think @AnthropicAI Claude 3 Opus is a better AI than GPT-4o. But I hate that https://t.co/Jg3vCtMbsR has very limited functionality.
ChatLabs brings web search, YouTube summaries, AI assistants, AI image generation, split-screen mode, and more to Claude Opus and many other premium AIs.
Check it out at https://t.co/hDLIFZQ8IH
If you're thinking about going on the faculty job market, take a look at this advice doc I just shared with https://t.co/g8QLJ3PqZl: I collected my thoughts about the whole dang thing, from finding positions to negotiating a contract, & I hope y'all find it helpful!
@Jess_Osterhout These materials I have assembled are in part an attempt to demystify some of the hidden curriculum of academia: https://t.co/dYtFtBCOpi
We live in such strange times. Apple, a company famous for its secrecy, published a paper with a staggering amount of detail on their multimodal foundation model. Those who are supposed to be open are now wayyy less open than Apple.
MM1 is a treasure trove of analysis. They discuss lots of architecture designs and even disclose that they train on GPT-4V-generated data. They provide exact scaling law coefficients (to 4 significant figures), MoE settings, and even optimal learning rate functions.
I have not seen this level of detail in a big tech whitepaper for a very, very long time. Apple's so back!
New paper with @Anne_On_Tw introducing a modeling framework for noise fluctuations in decision-making. It can be applied to identify lapses of attention, limit the impact of noisy trials on model fit, and avoid excluding some “noisy” subjects. (1/4)
https://t.co/wJPHqNHwGE
🧠🌞 Updated for 2024: Check out this extensive list of summer schools & short courses in computational neuroscience! 📚💻
🔗 https://t.co/axBm5MKpWU
#Neuroscience #SummerSchool #PhD
New paper:
https://t.co/NJyQRsLZ71
Companies are planning to train models with 100x more computation than today’s state of the art, within 18 months. No one knows how powerful they will be. And there’s essentially no regulation on what they’ll be able to do with these models.
Every year I read a lot of grad school applications from accomplished people that don't give me the info I'm looking for. It feels like a major hidden curriculum thing. So here's (my opinion on) how to write a great Statement of Purpose/Research for a PhD program. 🧵 1/
Full video of the Munk Debate that took place on 2023-06-22:
"Be it resolved, AI research and development poses an existential threat."
On the YES side: Yoshua Bengio & @tegmark
On the NO side: @MelMitchell1 & me.
https://t.co/YK2KomGEK3
Google just dropped a 100% free learning path on Generative AI with 9 Courses 👇
Intro to Gen AI
Intro to LLMs
Intro to Responsible AI
Intro to Image Generation
Encoder-Decoder
Attention Mechanism
Transformers and BERT
Image Captioning
Gen AI Studio
https://t.co/cyojw6P4Tm
Hierarchical categorization learning is associated with representational changes in the dorsal striatum and posterior frontal and parietal cortex
https://t.co/nZrTBPKoeH
I'm surprised how many people aren't using AI Chrome extensions yet.
Stop limiting yourself to just ChatGPT.
Here are 5 new AI Chrome extensions that will turn you into a productivity machine:
Animal behavior+cognition researchers: what are your favorite recent (or classic!) reviews of complex cognitive abilities (esp. problem solving, Theory of Mind, etc.) in NHPs🐒, corvids / other bird species🦜🦉, elephants 🐘, and cephalopods 🐙? Thank you, thank you!
We are looking for a research assistant to join our human intracranial EEG research team in NY. Starting asap or in a couple months. Official link will follow soon. In the meantime please don't hesitate to DM me with questions. Thank you! (RTs much appreciated!)
We are planning to fill a fully funded PhD student position (3-year, TVÖD E13 65%) this year to work on fMRI, working memory and its intersection with higher-level cognition. Early inquiries asap via: https://t.co/G7Yt2aJbRE Please RT