Haoran Xu✈️ICLR26 @ryanxhr - Twitter Profile

Pinned Tweet

about 2 months ago

Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. https://t.co/ydvILyec8p 🧵[1/7]

3

176

23

125

13K

Haoran Xu✈️ICLR26

@ryanxhr

about 2 months ago

@LucaAmb Yes, actually VGF could be thought of as doing additional flow matching with known velocity and steps.

0

64

Haoran Xu✈️ICLR26

@ryanxhr

about 2 months ago

@linghui35877581 In our paper we only tried the offline setting, i.e., the RLHF setup. In general, offline RL could still be used alongside the development of advanced off-policy LLM RL algorithms.

0

58

ryanxhr retweeted

Amy Zhang @yayitsamyzhang

about 2 months ago

@ryanxhr has developed this very nice work framing offline RL as an optimal transport problem, with SOTA results on offline RL benchmarks and LLM RL tasks. Check it out, and chat with him at ICLR!

0

39

3

11

3K

Haoran Xu✈️ICLR26

@ryanxhr

about 2 months ago

This is joint work w/ @KaiwenHu856, Somayeh Sojoudi, @yayitsamyzhang. Paper: https://t.co/kSwwR6gNKz. Code: https://t.co/E10cvkva0Z. A nice walkthrough: https://t.co/OaVWpT2N9s. I will present VGF at @iclr_conf and can't wait to see you all at 🇧🇷. 🧵[7/7]

0

2

1

0

320

Haoran Xu✈️ICLR26

@ryanxhr

about 2 months ago

For online RL finetuning, VGF solves several hard tasks that previous methods could not. 🧵[6/7]

1

2

1

0

308

Haoran Xu✈️ICLR26

@ryanxhr

6 months ago

2️⃣ Information-Theoretic Reward Decomposition for Generalizable RLHF 🗓️ Thu, Dec 4, 4:30 PM – 7:30 PM, Exhibit Hall C, D, E #5413

0

1

0

130

Haoran Xu✈️ICLR26

@ryanxhr

6 months ago

I will be at #NeurIPS2025 from 12/3 to 12/7 to present two papers. Come chat about everything RL! 1️⃣ Unifying Online and Offline RL via Implicit Value Regularization 🗓️ Thu, Dec 4, 11:00 AM – 2:00 PM, Exhibit Hall C, D, E #303 https://t.co/TR2Gqn0ccm

1

4

0

251

Haoran Xu✈️ICLR26

@ryanxhr

7 months ago

Grateful and honored to receive the Amazon AI Fellowship to support my research!

Rohit Prasad @RohitPrasadAI

8 months ago

Excited to announce @amazon's new AI PhD Fellowship Program supporting 100+ students across 9 universities like Carnegie Mellon, MIT & Stanford. Fellows will be paired with senior scientists working in related fields, plus receive financial support and AWS credits for research. Learn more: https://t.co/KNTcYI83Gm

9

546

100

356

119K

0

6

0

295

ryanxhr retweeted

RL Beyond Rewards Workshop @RLBRew_RLC

about 1 year ago

⚠️ Reminder! Submissions for @RL_Conference's RL beyond Reward Workshop are due May 30 (AoE)! We are brewing an interesting program and seeking innovative research work in reward-free RL. All papers are welcome, from exploratory abstracts to complete research papers. https://t.co/ZADbAtx2tv

1

51

12

28

16K

ryanxhr retweeted

Haoran Xu✈️ICLR26

@ryanxhr

about 1 year ago

I will miss #ICLR2025, but come check out our work on a new perspective on solving reinforcement learning using discriminator-weighted imitation learning. @ShuozheL and @yayitsamyzhang will present it during today’s poster session.

1

14

4

3

3K

Haoran Xu✈️ICLR26

@ryanxhr

about 1 year ago

Work with @harshit_sikchi @scottniekum

0

209

Haoran Xu✈️ICLR26

@ryanxhr

about 1 year ago

Come check out our work on a new pre-training objective for LLMs based on the transformer architecture, led by Edward! https://t.co/7ZfUPOwh1l More work around BST will come out, stay tuned!

Edward Hu @edward_s_hu

about 1 year ago

introducing the belief state transformer: a new LLM training objective that learns (provably) rich representations for planning. bst objective is satisfyingly simple: just predict a "previous" token alongside the next token. come by our ICLR poster this thursday to chat!

1

78

9

54

8K

0

16

1

0

1K

ryanxhr retweeted

John Langford @JohnCLangford

about 1 year ago

The Belief State Transformer https://t.co/1xuRIU0PYT is at ICLR this week. The BST objective efficiently creates compact belief states: summaries of the past sufficient for all future predictions. See the short talk: https://t.co/nkYp7KxMZc and @mgostIH for further discussion.

5

106

18

44

16K
