Ti-Fen Pan @tifenpan - Twitter Profile

tifenpan retweeted

@bravo_abad

8 months ago

Discovering state-of-the-art reinforcement learning algorithms

Reinforcement learning agents usually learn with rules we program by hand (TD, Q-learning, PPO…). But humans didn’t hand-design their own learning rules; evolution did. What if we let machines discover their own RL update rules from experience?

Junhyuk Oh and coauthors present exactly that. They train a population of agents across many environments and use meta-learning to optimize a meta-network that outputs the targets an agent should learn toward, effectively learning the agent’s loss and bootstrapping scheme end-to-end. The agent still emits a policy and predictions, but the semantics of those predictions are discovered rather than hard-coded.

The outcome is striking: a discovered rule (“DiscoRL”) that sets a new bar on long-standing benchmarks. On Atari, a version trained on the 57 games (Disco57) exceeds the performance of hand-engineered algorithms while being more wall-clock efficient. Even more interesting, the same rule generalizes: without being tuned for them, it delivers state-of-the-art results on ProcGen and competitive performance on DMLab, NetHack, Crafter, and Sokoban. Scaling the discovery process to a more diverse set of environments (Disco103) makes the rule stronger still: performance improves simply by exposing it to more varied worlds.

Under the hood, the learned predictions behave differently from classic value functions. They spike before salient events (big rewards, abrupt policy shifts) and are explicitly used to bootstrap and update the policy, showing that the system has invented useful intermediate quantities rather than rediscovering old ones. The discovery process is also practical: a few hundred million steps per environment were enough to find a top rule, and the learned rule transfers to larger networks at evaluation time.

This points to a compelling future: instead of manually crafting ever more intricate RL losses and targets, we can train agents whose learning algorithms are themselves learned, and that keep improving as we add compute, data diversity, and richer environments. Fewer knobs, more capability.

Paper: https://t.co/1BT1rjc0sg

tifenpan retweeted

Ravid Shwartz Ziv

@ziv_ravid

almost 2 years ago

1/8 Looks like my paper "Tabular Data: Deep Learning is Not All You Need" just hit 1,000+ citations 🥳🥳🥳 Here's the story of how we almost didn't publish it... https://t.co/KiZ9dUTYWn

tifenpan retweeted

Artem Vysotsky

@avysotsky

about 2 years ago

🧠 I think @AnthropicAI Claude 3 Opus is a better AI than GPT-4o. But I hate that https://t.co/Jg3vCtMbsR has very limited functionality. ChatLabs brings web search, YouTube summaries, AI assistants, AI image generation, split-screen mode, and more to Claude Opus and many more premium AIs. Check it out at https://t.co/hDLIFZQ8IH

tifenpan retweeted

Milena Rmus @milenamr7

about 2 years ago

The last paper of my time in @ccnlab is now out, with @Anne_On_Tw, @xia_jimmy and Ti-Fen Pan! https://t.co/OSaBpjw28l

tifenpan retweeted

Teresa Lee @snickclunk

about 2 years ago

If you're thinking about going on the faculty job market, take a look at this advice doc I just shared with https://t.co/g8QLJ3PqZl: I collected my thoughts about the whole dang thing, from finding positions to negotiating a contract, & I hope y'all find it helpful!

tifenpan retweeted

Arjun Raj

@arjunrajlab

about 2 years ago

@Jess_Osterhout These materials I have assembled are in part an attempt to demystify some of the hidden curriculum of academia: https://t.co/dYtFtBCOpi

tifenpan retweeted

Jim Fan

@DrJimFan

over 2 years ago

We live in such strange times. Apple, a company famous for its secrecy, published a paper with a staggering amount of detail on their multimodal foundation model. Those who are supposed to be open are now wayyy less open than Apple.

MM1 is a treasure trove of analysis. They discuss lots of architecture designs and even disclose that they train on GPT-4V-generated data. They provide exact scaling law coefficients (to 4 significant figures), MoE settings, and even optimal learning rate functions.

I have not seen this level of detail from a big tech whitepaper for a very, very long time. Apple's so back!

tifenpan retweeted

Jing-Jing Li @drjingjing2026

over 2 years ago

New paper with @Anne_On_Tw introducing a modeling framework for noise fluctuations in decision-making. It can be applied to identify lapses of attention, limit the impact of noisy trials on model fit, and avoid excluding some “noisy” subjects. (1/4) https://t.co/wJPHqNHwGE

tifenpan retweeted

Lisa Schmors @lisa_schmors

over 2 years ago

🧠🌞 Updated for 2024: Check out this extensive list of summer schools & short courses in computational neuroscience! 📚💻 🔗 https://t.co/axBm5MKpWU #Neuroscience #SummerSchool #PhD

tifenpan retweeted

Geoffrey Hinton

@geoffreyhinton

over 2 years ago

New paper: https://t.co/NJyQRsLZ71 Companies are planning to train models with 100x more computation than today’s state of the art, within 18 months. No one knows how powerful they will be. And there’s essentially no regulation on what they’ll be able to do with these models.

tifenpan retweeted

Roman Feiman @RomanFeiman

over 3 years ago

Every year I read a lot of grad school applications from accomplished people that don't give me the info I'm looking for. It feels like a major hidden curriculum thing. So here's (my opinion on) how to write a great Statement of Purpose/Research for a PhD program. 🧵 1/

tifenpan retweeted

Yann LeCun

@ylecun

about 3 years ago

Full video of the Munk Debate that took place on 2023-06-22: "Be it resolved, AI research and development poses an existential threat." On the YES side: Yoshua Bengio & @tegmark On the NO side: @MelMitchell1 & me. https://t.co/YK2KomGEK3

tifenpan retweeted

Allie K. Miller

@alliekmiller

about 3 years ago

Google just dropped a 100% free learning path on Generative AI with 9 Courses 👇
Intro to Gen AI
Intro to LLMs
Intro to Responsible AI
Intro to Image Generation
Encoder-Decoder
Attention Mechanism
Transformers and BERT
Image Captioning
Gen AI Studio
https://t.co/cyojw6P4Tm

tifenpan retweeted

Earl K. Miller @MillerLabMIT

about 3 years ago · Somerville

Hierarchical categorization learning is associated with representational changes in the dorsal striatum and posterior frontal and parietal cortex https://t.co/nZrTBPKoeH

tifenpan retweeted

Rowan Cheung

@rowancheung

about 3 years ago

I'm surprised how many people aren't using AI Chrome extensions yet. Stop limiting yourself to just ChatGPT. Here are 5 new AI Chrome extensions that will turn you into a productivity machine:

tifenpan retweeted

Ev (like in 'evidence', not Eve) Fedorenko 🇺🇦 @ev_fedorenko

about 3 years ago

Animal behavior+cognition researchers: what are your favorite recent (or classic!) reviews of complex cognitive abilities (esp. problem solving, Theory of Mind, etc.) in NHPs🐒, corvids / other bird species🦜🦉, elephants 🐘, and cephalopods 🐙? Thank you, thank you!

tifenpan retweeted

Stephan Bickel @Steph_Bickel

over 3 years ago

We are looking for a research assistant to join our human intracranial EEG research team in NY. Starting asap or in a couple months. Official link will follow soon. In the meantime please don't hesitate to DM me with questions. Thank you! (RTs much appreciated!)

tifenpan retweeted

Thomas Christophel 🌈 @tbchristophel

over 3 years ago

We are planning to fill a fully funded PhD student position (3-year, TVÖD E13 65%) this year to work on fMRI, working memory and its intersection with higher-level cognition. Early inquiries asap via: https://t.co/G7Yt2aJbRE Please RT

Ti-Fen Pan @tifenpan

about 5 years ago

New episode! We have the amazing @andreasamadi joining us to share her insights on SEL based on brain science. https://t.co/DhmWToH0mC

Ti-Fen Pan @tifenpan

about 5 years ago

In this episode we talk about amplifying your students' voices, and we are really happy to have @kngiordano joining us! https://t.co/qTLUZoh4hO