Xiao Yu @ ICLR2026 @xy2437 - Twitter Profile

Pinned Tweet

Xiao Yu @ ICLR2026 @xy2437

4 months ago

For AI agents to scale beyond narrow tasks, they need to learn how the world works from their own interactions — in a self-supervised way, without relying on expert data or rewards. We introduce Reinforcement World Model Learning (RWML), a self-supervised method that trains LLM-based agents to align their simulated next states with actual environment dynamics, improving robustness and adaptability across tasks. Empirically, we find RWML directly boosts solve rate on ALFWorld and τ² Bench without expert data — and scales naturally to large RL settings.

1

5

0

1

111

Xiao Yu @ ICLR2026 @xy2437

4 months ago

Shout out to my collaborators at @Columbia @MSFTResearch @dartmouth for all the support! Paper link: https://t.co/FTmKgs8Otz

0

1

0

32

Xiao Yu @ ICLR2026 @xy2437

4 months ago

[4/n] Takeaway: agents can learn more from their own experience than we often assume. By aligning imagined next states with what actually happens — and doing so in a representation space that highlights meaningful changes — RWML turns every interaction into a learning signal and builds richer world understanding.

1

0

31
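
As a rough illustration of the alignment idea in the thread above (scoring an agent's imagined next state by how close it is to the observed one in some representation space), here is a minimal Python sketch. The bag-of-words embedding, the cosine-similarity score, and every function name are assumptions chosen for illustration; the tweets do not say which representation or reward RWML actually uses.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def alignment_reward(predicted_next_state: str, actual_next_state: str) -> float:
    """Assumed self-supervised signal: similarity of imagined vs. observed state."""
    return cosine(embed(predicted_next_state), embed(actual_next_state))


# The observed next state comes from the environment itself,
# so no expert data or task reward is needed to compute the score.
observed = "the drawer is open"
good = alignment_reward("the drawer is open", observed)
bad = alignment_reward("nothing happens", observed)
```

In this toy setup, a prediction that matches the observation scores close to 1 and one that shares no tokens with it scores 0, so the score could serve as a per-step training signal. A learned embedding would presumably replace the bag-of-words stand-in in practice.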

Xiao Yu @ ICLR2026 @xy2437

4 months ago

Accepted by #ICLR2026 🎉 See ya in Brazil!

Xiao Yu @ ICLR2026 @xy2437

8 months ago

Why can (V)LM agents ace coding and math, yet struggle so badly in more complex environments like computer or phone use? 🤔 We find that one key factor lies in models' ability to understand and *simulate* the environment’s dynamics, and we propose **Dyna-Mind** to address this! 🧵[1/n]

1

10

4

3K

0

6

0

419

Xiao Yu @ ICLR2026 @xy2437

8 months ago

Shout out to @Columbia @MSFTResearch for all the support! Paper link: https://t.co/JyLxEforPw

0

1

0

118

Xiao Yu @ ICLR2026 @xy2437

8 months ago

[4/n] Takeaway: agents can learn not just from success but from *every* environment interaction to enhance their reasoning and acting, if those interactions are used correctly 💡

1

0

122

xy2437 retweeted

Shirley Wu

@ShirleyYXWu

about 1 year ago

Can we ever truly trust foundation models, and if so, how?

Our ICCV TrustFM workshop (https://t.co/lnhouGqc1L) is now accepting submissions (deadline: 8/1, attending: 10/19-10/23, Hawai'i).

Submit, attend, and learn from everyone around the world who is making FMs more trustworthy.

Co-organized with Zeliang Zhang, @ChenliangXu, @huang_chao_cs, @MuCai7, @DrogoKhal4, @jd92wang, @hendrycks, @jure, etc.

@trustworthy_ml @StanfordAILab

0

40

7

9

9K

xy2437 retweeted

Yu Feng @AnnieFeng6

about 1 year ago

🚨 COLM 2025 Workshop on AI Agents: Capabilities and Safety @COLM_conf

This workshop explores AI agents’ capabilities (including reasoning and planning, interaction and embodiment, and real-world applications) as well as critical safety challenges related to reliability, ethics, and human-agent interaction.

Non-archival submission deadline: June 23, 2025!

🌐 Website: https://t.co/Zyghrm7wDW
📜 OpenReview: https://t.co/OkApJbtBxN
👥 Program committee/Reviewer sign-up form: https://t.co/2onctOrhsA
✉️ Mailing list: [email protected]
📅 WikiCFP: https://t.co/y0sBiJFpdb

Shout out to our amazing organizing team members @Zhou_Yu_AI @ysu_nlp @uiuc_aisecure @ Yi Zhang @ Baolin Peng @ Jim Zhiwei Liu @DanRothNLP @LyleUngar @sharathguntuku @yugu_nlp @xy2437 @jeffrey_ch0 @AnnieFeng6 @Haoyu_Wang_97 @ZRChen_AISafety @raphaelshu @yooli23

4

83

19

36

24K

Xiao Yu @ ICLR2026 @xy2437

over 1 year ago

Thanks @MSFTResearch for all the support! Excited to share that our work is also accepted by #ICLR2025!
- Paper: https://t.co/SBWBOxoivW
- Code: https://t.co/agV471v8U5
- Website: https://t.co/5iiUx0kTTz

Microsoft Research

@MSFTResearch

over 1 year ago

ExACT combines Reflective-MCTS and Exploratory Learning to improve AI agents' decision-making, enabling test-time compute scaling. Learn how these methods help agents refine strategies for state-of-the-art performance and improved computational efficiency: https://t.co/GUhzuX9NuQ

6

116

23

46

13K

0

10

3

1

2K

Xiao Yu @ ICLR2026 @xy2437

over 1 year ago

Website link was broken… it’s now fixed!

Xiao Yu @ ICLR2026 @xy2437

over 1 year ago

To effectively solve modern computer tasks, AI agents need to be able to strategically explore the environment and efficiently learn from past interactions. We present R-MCTS and Exploratory Learning for building o1-like models for agentic applications. Our GPT-4o powered R-MCTS agent achieves SOTA performance on VisualWebArena. Notably, R-MCTS and Exploratory Learning (without MCTS) demonstrate compute scaling properties at both training and test time! 🌐: https://t.co/5UIDKzi3Ie

1

19

7

7K

0

3

0

211

Xiao Yu @ ICLR2026 @xy2437

over 1 year ago

Please refer to our paper and website for more details on method, results, and demos!

0

1

0

123

Xiao Yu @ ICLR2026 @xy2437

over 1 year ago

Part 2.2) In our experiments with VisualWebArena, we find that:
- Exploratory Learning boosts performance: GPT-4o fine-tuned with Exploratory Learning improves performance, even without search, similar to the effects of scaling test-time compute with MCTS.
- Exploratory Learning enables test-time compute scaling: The fine-tuned GPT-4o exhibits better performance when allowed more actions per task, enhancing decision-making and task completion.
- Exploratory Learning improves generalization to unseen tasks: The fine-tuned GPT-4o demonstrates improved performance on unseen tasks compared to no-training/imitation learning on best actions.

1

0

143
