Xuanfei Ren @XuanfeiRen - Twitter Profile

Pinned Tweet

2 days ago

Excited to share POLCA is accepted at #ICML2026! 🎉 Full walkthrough in our new blog. https://t.co/IVUBvGoU17 Would love to chat — see you in Seoul! 🇰🇷

Xuanfei Ren @XuanfeiRen

4 months ago

🚀 How can we make LLM-based optimization stable and scalable when the feedback signal is stochastic? Introducing POLCA: a framework for robust, scalable stochastic generative optimization. Paper: https://t.co/xgdjISRxtE Code: https://t.co/9TRuyvxVcf 🧵👇 1/



Xuanfei Ren @XuanfeiRen

5 days ago

@atrost3122 see you there🤩


XuanfeiRen retweeted

AI Native Foundation

@AINativeF

17 days ago

11. When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

🔑 Keywords: Offline reinforcement learning, Outcome supervision, Pessimistic actor-critic, OPAC, Sample efficiency

💡 Category: Reinforcement Learning

🌟 Research Objective:
- To develop a statistical theory for policy optimization from trajectory-level outcome supervision in offline reinforcement learning, addressing challenges using a pessimistic actor-critic approach.

🛠️ Research Methods:
- Introduced the OPAC algorithm which utilizes a latent reward model for optimizing policy via trajectory-level labels, and extended the method to preference-based feedback to uphold statistical guarantees.

💬 Research Conclusions:
- Identified circumstances where outcome-level supervision is sample-efficient for offline control and formed conditions under which generalized outcome-based offline RL remains tractable, highlighting fundamental statistical barriers with missing process-level rewards.

👉 Paper link: https://t.co/QQlddw7uym



XuanfeiRen retweeted

Ching-An Cheng @chinganc_rl

3 months ago

Looking for Google research student researcher (PhD student) to work on LLM and agent related learning. Preferred background: RL/game theory, agentic system, LLM training. Candidate will work closely with me and @allenainie Email me if you are interested. 😀


XuanfeiRen retweeted

Allen Nie ✈️ ICML 2026 🇰🇷

@allenainie

3 months ago

Hiring a student researcher for RL agents, co-hosted by @chinganc_rl and me at Google Research and DeepMind. Our work in the last 2 years: https://t.co/xSuqW40wai https://t.co/aPEK8jQczC https://t.co/LxkvCzNlEW Any interest? DMs are open or email us!



Xuanfei Ren @XuanfeiRen

3 months ago

@allenainie @chinganc_rl interested 🤣


Xuanfei Ren @XuanfeiRen

4 months ago

@jubayer_hamid @allenainie We discuss more sophisticated priority selection strategies in Appendix C!


Xuanfei Ren @XuanfeiRen

4 months ago

@jubayer_hamid @allenainie Since the empirical mean already yielded strong results, we focused on our core contributions rather than hyper-parameter tuning for UCB. However, leveraging different selection diversities to instantiate diverse search algorithms remains an interesting future direction.


XuanfeiRen retweeted

Ching-An Cheng @chinganc_rl

4 months ago

LLM has been struggling to solve search and optimization at scale when feedback is stochastic. We propose a simple solution, POLCA, using text embedding with “provable” guarantee. Excited to see the first theoretically correct work of LLM optimization. Kudos to @XuanfeiRen



XuanfeiRen retweeted

Allen Nie ✈️ ICML 2026 🇰🇷

@allenainie

4 months ago

Well, not for nothing -- we found a way to use Gemini embeddings to improve LLM-driven search algorithms. With a simple accept/reject rule in the embedding space, you get a provable guarantee on search result.



Xuanfei Ren @XuanfeiRen

4 months ago

This is a joint work with Allen Nie @allenainie , Tengyang Xie @tengyangx and Ching-An Cheng @chinganc_rl. 11/11 #AI #GenerativeAI #MachineLearning #LLM #Optimization



Xuanfei Ren @XuanfeiRen

4 months ago

🚀 We believe POLCA is a step toward making LLM-driven automated search more reliable, scalable, and principled. As LLMs are increasingly used to optimize prompts, agents, and code, stability under noise becomes essential—not optional. 10/

