Bryan Chan

@chanpyb

PhD student @rlai_lab. Prev: @GoogleDeepMind, @OcadoTechnology, @kindredai, @UofTCompSci

Edmonton

Joined October 2020

535 Following

191 Followers

51 Posts

Bryan Chan @chanpyb

about 14 hours ago

Why do we assume an RL agent can always compute the correct action immediately? Every policy is a program and thus resource bounded. In this blog, I argue why computation should be part of decision making, illustrated with some toy examples. 🔗https://t.co/tBBRc6K0CX

0

2

1

1

104

Bryan Chan @chanpyb

about 1 year ago

@danielwurgaft This loss/complexity tradeoff has started bothering me. I think we can be okay with (slightly) worse loss in exchange for better generalization, e.g. physics models will fail to predict noise, but memorization can. Any thoughts about this, e.g. regularization, architecture, etc.?

1

0

0

0

45

Bryan Chan @chanpyb

about 1 year ago

@puneeshdeora @bhavya_vasudeva Great work! Our work https://t.co/iH72TWaifl shows that asymptotically ICL will choose the best prediction mode, but we didn't address anything about which is preferred when they're equally good---do you think it has to do with k=1 converging faster than k=3 in the MC case?

1

0

0

0

91

Bryan Chan @chanpyb

about 1 year ago

@Stone_Tao You can also look at the streaming setting: https://t.co/3SzNV5lYlL and https://t.co/SxZ3cncTtx?

0

0

0

0

61

Bryan Chan @chanpyb

about 1 year ago

@Stone_Tao There are a few, but this is what I have in mind: https://t.co/Yb34TOdGJ7 I think with a smaller buffer size the data gets closer to on-policy data, and with a larger one it becomes more off-policy.

1

2

0

0

339

Bryan Chan @chanpyb

about 1 year ago

I will be presenting this paper on how models trade off in-context and in-weight learning at #ICLR2025. Drop by on Saturday and I’ll be happy to chat! https://t.co/puVOtF5O5F

Bryan Chan @chanpyb

over 1 year ago

Excited to share that our work on understanding when ICL emerges has been accepted to #ICLR2025 ! Submission for preview: https://t.co/y1F5vCTsY7

0

4

0

0

2K

0

16

1

4

1K

chanpyb retweeted

Association for Computing Machinery @TheOfficialACM

over 1 year ago

Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! https://t.co/GrDfgzW1fL

32

2K

456

136

456K

Bryan Chan @chanpyb

over 1 year ago

LLMs can leverage context information, i.e., in-context learning (ICL), or memorize solutions, i.e., in-weight learning (IWL), for prediction, but when does each happen? 1/N

1

1

0

2

365

0

4

0

0

2K

Bryan Chan @chanpyb

over 1 year ago

Thanks @m_wulfmeier ! We were surprised to see that SAC-X is just very robust. Something that was interesting to us that we didn’t further investigate: Learning from examples ended up being more efficient than using reward. Let’s chat at #NeurIPS2024 if there’s a chance?

Markus Wulfmeier @m_wulfmeier

over 1 year ago

Here's a fascinating paper by @domo_mr_roboto's group linking hierarchical reinforcement learning and cheaply-obtainable auxiliary tasks https://t.co/dhVVA9gLDv Better exploration with minimal engineering effort remains a critical challenge (even for RLHF/AIF) - reminiscent of our efforts on SAC-X and intrinsic rewards through representation learning (VAE, Transporter, etc.) https://t.co/zt9OjxcmTz Excited to see more progress in this space! #robotics #reinforcementlearning

0

13

2

9

1K

0

5

0

2

366

Bryan Chan @chanpyb

over 1 year ago

@anianruoss One immediate observation I have is that there seems to be no boundary between two demonstration sequences (list. 1). Wouldn't that be problematic, since the model can’t tell they are different demonstrations without further training?

1

0

0

0

90

Bryan Chan @chanpyb

over 1 year ago

@daibond_alpha @iclr_conf 3. Both scores 3 and 5 are somewhat due to experimental results like significance and benchmarks. (3) is interesting because the former provides no empirical insight, while the latter provides some, arguably claiming "maybe" the theory applies. Which one is the more important contribution?

0

2

0

0

505

Bryan Chan @chanpyb

over 1 year ago

@daibond_alpha @iclr_conf Some interesting observations here: 1. It seems like the latter has "shorter reviews" and imo generally of lower quality than those of the former 2. The expectations seem to be different, maybe due to different primary areas? 3. ...

2

2

0

0

2K

Bryan Chan @chanpyb

over 1 year ago

@m_wulfmeier @NeurIPSConf I’ll be at NeurIPS too! Let’s chat about robot learning!

0

0

0

0

48

chanpyb retweeted

Mohamed Elsayed @mhmd_elsaye

over 1 year ago

Would you believe that deep RL can work without replay buffers, target networks, or batch updates? Our recent work gets deep RL agents to learn from a continuous stream of data one sample at a time without storing any sample. Joint work with @Gautham529 and @rupammahmood.

9

623

105

381

163K

chanpyb retweeted

Gautham Vasan @Gautham529

over 1 year ago

Our NeurIPS paper is now on arXiv: We introduce Action Value Gradient (AVG), a novel incremental deep RL method that learns in real-time, one sample at a time — no batch updates, target networks or a replay buffer! Co-authors @mhmd_elsaye @bellingerc @white_martha @rupammahmood

2

93

21

49

10K

Bryan Chan @chanpyb

over 1 year ago

@c_voelcker @usmananwar391 What alternative are you using? I think I can see some limitations with the IQM approach, but I'm unsure how you would address them.

1

0

0

0

47

Bryan Chan @chanpyb

over 1 year ago

@c_voelcker @apsarathchandar What gets me is that they claim that AI model evals don’t include any uncertainty/statistics…

1

2

0

0

384

chanpyb retweeted

REAL - Robotics and Embodied AI Lab @MontrealRobots

over 1 year ago

Hey all! We are thrilled to have @chanpyb from @UAlberta for this week's seminar! The talk is titled: "Why can't we use reinforcement learning for image-based robotic manipulation?". See you at 11:30AM ET! https://t.co/vki05SSZgx #rl #manipulation, #imitationLearning https://t.co/egV9rr3QcM

0

35

3

18

2K

Bryan Chan @chanpyb

over 1 year ago

@c_voelcker Haha yes, mine is also a rant about the same thing.

1

0

0

0

31
