Prabhat Nagarajan @prabhatmn - Twitter Profile

Pinned Tweet

2 days ago

I'm excited and grateful to share our ICML spotlight paper "Accelerating Q-learning through Efficient Value-Sharing across Actions". It introduces the "mean-expansion layer", a very simple parameter-free layer that accelerates action-value learning and improves performance! (1/6)

2

19

0

7

3K

Prabhat Nagarajan @prabhatmn

about 3 hours ago

@_Suresh2 No, actually. There is an implicit assumption in this method that we are applying this to discrete-action Q-networks, where the output is a vector of action-values. One forward pass already outputs a value for every action, and we just add a layer after that.

0

1

0

6

Prabhat Nagarajan @prabhatmn

2 days ago

@mic_nau Thanks for your interest! Unfortunately, at the moment, no. @Sisyfuzz and I are working on generalizing this to continuous actions right now. I’ll be sure to keep you posted on that.

0

1

0

16

Prabhat Nagarajan @prabhatmn

2 days ago

(6/6) Details
- Paper: https://t.co/Gf8dA6nFnP (Appendix D if you just want the PyTorch code)
- Talk 🍿: https://t.co/KdYVJt30Gu
- ICML Poster: Wed, Jul 8, 2026 • 2:30 PM – 4:15 PM in HALL A #404

1

4

0

144

Prabhat Nagarajan @prabhatmn

2 days ago

(5/6) Big thanks to my collaborators, Brett Daley, @white_martha, and @MarlosCMachado. I'm also grateful that our paper was the Best Paper runner-up at the Adaptive Learning Agents workshop at AAMAS. Try out the mean-expansion layer and let us know how it goes!

1

4

0

138

Prabhat Nagarajan @prabhatmn

about 1 month ago

We propose a simple parameter-free layer, called the mean-expansion (ME) layer, that can be applied to the end of a Q-network to accelerate action-value learning in deep RL. The ME layer improves DQN and IQN's performance across 57 games without changing the algorithms or hypers!

0

3

0

211

Prabhat Nagarajan @prabhatmn

about 1 month ago

Friends at #AAMAS, tomorrow at 11 AM at the ALA workshop, I will be giving an early presentation of our #ICML2026 Spotlight, "Accelerating Q-learning through Efficient Value-sharing across Actions". Done in collaboration with Brett Daley, @white_martha, and @MarlosCMachado.

2

19

5

3

4K

prabhatmn retweeted

Marlos C. Machado @MarlosCMachado

about 2 months ago

Our paper, “The Cell Must Go On! AgarCL as an Evaluation Platform for Continual RL”, has been accepted at @RL_Conference'26! This work was led by @Mohamed15069 as a master's student at @UAlberta @AmiiThinks. Preprint: https://t.co/e6u3SNi7iY Blog post: https://t.co/yN3eFiwULw

2

47

6

13

7K

prabhatmn retweeted

Marlos C. Machado @MarlosCMachado

3 months ago

A couple of months ago, we released a preprint of one of my favourite papers I’ve ever written. It lies at the intersection of representation learning and neuroscience. I have now written a blog post about it. Preprint: https://t.co/vtDeBzvjsq Blog post: https://t.co/d5rPZHoGaC

3

183

35

142

14K

prabhatmn retweeted

John Langford @JohnCLangford

8 months ago

A key claim here https://t.co/2WDyQ5PIQc is that next token prediction has no inherent preference for a heliocentric Copernicus theory https://t.co/WS6cIEfUy1 over a geocentric Ptolemy https://t.co/XJHIpm0Pdj theory of observations. Predicting the next latent fixes that.

1

8

3

6

2K

prabhatmn retweeted

Peter Stone @PeterStone_TX

8 months ago

Proud of our latest Nature publication, led by the Sony AI ethics team!

0

19

5

0

2K

prabhatmn retweeted

Hamid Maei @HamidMaei

9 months ago

Here is my response to @karpathy's comment on RL, after watching his interview on @dwarkesh_sp's podcast: https://t.co/OA0nwALSCw
RL is a powerful idea, but let's be thoughtful about when we actually need it. RL has an incredible 120+ year journey in animal learning, starting with Edward Thorndike's puzzle box experiments in 1898, through decades of psychology research on trial-and-error learning, and culminating in the 1997 discovery by Schultz, Dayan, and Montague that dopamine neurons literally implement temporal difference learning. That's real RL: experiential agent-environment interaction, and actual learning through experience.
But what we're calling "RL" in LLMs today? It's mostly optimization with reward-weighted gradients on offline data. There's no agent exploring an environment, no real interaction loop. It's just: data in, the neural net processes it, and the objective function for the output is weighted by reward signals during the final RL training stage, after the intermediate training steps.
Here's the thing: RL fundamentally addresses problems we'd solve with dynamic programming, except that the state space is astronomically large (easily more states than atoms in the universe; 19x19 computer Go has roughly 10^170 states!) and we don't have a model of the world. So we use RL as an approximate learning technique. It's essential for these kinds of problems, but not every problem is like this.
When you have interactive, experiential tasks, like learning to generate code to build software that gets runtime feedback from a compute environment given a goal, RL makes perfect sense for LLM-based methods. But the current RL in textbooks and research papers is not enough: it needs temporal abstractions over sequences of actions (generated tokens, in the LLM world). It is impossible to learn optimal policies for micro-actions without temporal abstractions unless we assume infinite data and compute; remember, it is an NP-hard combinatorial problem!
I think as we move into the era of experiential AI, with agents that actually interact with environments and receive runtime feedback, the role of RL will become clearer and more necessary.

1

42

5

29

7K

Prabhat Nagarajan @prabhatmn

9 months ago

@DimitrisPapail Not P(next input | input)? 🤔

0

3

0

325

Prabhat Nagarajan @prabhatmn

9 months ago

@Ulmo_Space @RichardSSutton I agree! I suppose it depends on what we mean by knowledge-seeking. We have plenty of pattern-recognition machines without the capacity for self-verification. The question is whether these pattern-recognition machines have knowledge.

0

21

Prabhat Nagarajan @prabhatmn

10 months ago

Have people seen this prescient 2001 post by @RichardSSutton on self-verification? "An AI system can create and maintain knowledge only to the extent that it can verify that knowledge itself". This sentiment underpins much LLM reasoning research today. https://t.co/gK3DOwqYm8

1

29

4

11

8K

prabhatmn retweeted

Jens Tuyls @JensTuyls

9 months ago

Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!

https://t.co/KXgig0SgO1

2

97

19

52

22K

prabhatmn retweeted

Barack Obama @BarackObama

9 months ago

Jane Goodall had a remarkable ability to inspire us to connect with the natural wonders of our world, and her groundbreaking work on primates and the importance of conservation opened doors for generations of women in science. Michelle and I are thinking of all those who loved and admired her.

5K

233K

19K

4K

24M

Prabhat Nagarajan @prabhatmn

9 months ago

@scaling01 I disagree. I think Dwarkesh is using imitation in the ML sense. I.e., the agent is given the correct answer to directly imitate. I think Rich, who is well-versed in behavioral psychology, is saying that imitation learning in animals occurs through experience.

0

1

0

113
