Yixuan Wang

technical staff @thinkymachines; less technical stuff @aclmentorship; phd @umich; views are my own

about 12 hours ago

Wow, thank you Jack for the thread! I’m impressed by how quickly you put this together. In the talk, I discussed how combining vision/touch and structured world models can help robots reason about physical interactions and address some of the key bottlenecks in robot learning.

YXWangBot retweeted

Jack Langerman ✈️ CVPR

@jacklangerman

about 12 hours ago

What about more in the learned direction (action-conditioned video generation)? Online interactive demo on the lab webpage (cool): https://t.co/IcSohLlPIo

YXWangBot retweeted

Jack Langerman ✈️ CVPR

@jacklangerman

about 13 hours ago

In "Scaling Robot Learning via Multimodal Physical Reasoning" from @YunzhuLiYZ now. About to get started!

YXWangBot retweeted

Kaifeng Zhang

@kaiwynd

3 days ago

Our work on real-to-sim robot policy evaluation will be presented this week at ICRA 2026! #ICRA2026 https://t.co/QVzptkRYqC Keywords: policy evaluation, real2sim, Gaussian Splatting, deformable objects Check our code on GitHub and stop by our poster on Thursday!

YXWangBot retweeted

about 22 hours ago

Come chat with @shashuo0104 and me this afternoon from 15:00–16:30 at ThI2I.288! We’ll be discussing how to construct digital twins with systematic alignment to the real world, and how this could help unlock one of the key bottlenecks in developing robot policies: evaluation.

YXWangBot retweeted

Neil Nie

@neil_nie_

4 days ago

Introducing STACK: Learning Composable Skills by Discovering Spatial and Temporal Structure with Foundation Models. How can robots learn skills that generalize when the world changes? The default answer has been more data. It works, but it's expensive and slow. During my final year at @Stanford, our team explored a different idea: can robots discover the right abstractions from just a handful of expert demonstrations to enable strong generalization? STACK uses foundation models to discover spatial and temporal structure from a handful of demonstrations, then learns composable skills on top of that structure. It does so with just 5-10 real-world demonstrations per domain, across three manipulation settings, and without hand-designed task decomposition. For more details, including real-world demos: https://t.co/Rkt3VXH2Nm (1/n)

YXWangBot retweeted

Xuning Yang @xuningy

4 days ago

🎉 We added 2 SOTA WAMs to the RoboLab Leaderboard 🎉

Current leaders on RoboLab-120 (specific instr.):
🥇Cosmos3-Nano-Policy (39.7%)
🥈π0.5 (28.1%)
🥉DreamZero (28.1%)

→ See full results at: https://t.co/Le8jykn5jo

→ All policy clients available at: https://t.co/wQH4Py6zJ8 https://t.co/PMg9l74zBU

YXWangBot retweeted

4 days ago

Excited to share a few presentations, demos, and workshop talks from our group and collaborators at #ICRA2026! We will present recent work on real-to-sim-to-real robot policy evaluation, model-based planning with learned dynamics, and multi-modal manipulation. We will also have a joint live demo between @SceniXai and @ADI_News on real-to-sim-to-real cable manipulation at the ICRA exhibition. This is a small teaser of what we have been building, with more to come soon! If you are at ICRA, please stop by the sessions or the demo booth. Happy to chat about robot learning, simulation, world models, and sim-to-real!

4 days ago

Big Congrats!!!

Kaichun Mo @ CVPR

@KaichunMo

4 days ago

Super proud to be part of NVIDIA's Cosmos3 Physical-AI Omnimodal Foundation Model. Topping the image/video/sound generation benchmarks and Robot Policy Benchmarks 😀

YXWangBot retweeted

Youngsun Wi @WiYoungsun

12 days ago

TactAlign was accepted to RSS 2026! Huge thanks to the reviewers for their thoughtful feedback. See you on the other side of the world 🤓🙌

YXWangBot retweeted

Hongxun Wu @HongxunWu

16 days ago

🧵(1/8) An @OpenAI internal reasoning LLM achieved an AI Math milestone: solving an open problem central to its mathematical subfield, in this case the unit distance problem of discrete geometry.

We came across it in a side quest to truly push our model on the hardest problems. https://t.co/fdgXp3aPVp

16 days ago

We are organizing an RSS workshop on Tactile Sensing for Robotic Foundation Models!! Join us for talks from our all-star speakers and submit your papers for a chance to win paper awards!!

James (Jingxi) Xu

@drjingxi

16 days ago

Excited to announce our #RSS2026 workshop on Tactile Sensing for Robotic Foundation Models!

We are calling for papers. Ant Group will sponsor a $500 Best Paper Award, and RAI Institute will sponsor two $200 Runner-up Awards. We will also provide several travel grants ($300 each).

Join us in Sydney!

Website: https://t.co/5evoIiKVvT

YXWangBot retweeted

Haotian Xue

@Haotianxue_GT

21 days ago

❓ How well can ACWMs learn different types of physics, e.g. rigid bodies, deformables, particles, and kinematics? ❓ Can they actually generalize beyond the training distribution?

🚀 We are excited to release ACWM-Phys: a physics-rich investigation into Action-Conditioned video World Models!

While most world-model research today focuses on ego-view gameplay or narrow robot-arm manipulation, we ask the two questions above.

We collect 15K+ simulated trajectories across 8⃣ environments spanning 4⃣ physics regimes (rigid contact🧊, particle dynamics🌊, kinematics🦾, and deformable contact🧥), each with a controlled, physically meaningful InD ↔ OoD split (unseen cube counts, larger cloth, doubled particle counts, expanded workspaces, …).

We train ACWM-DiT, a latent diffusion transformer with flow matching, and find a pattern: simple low-dimensional geometry generalizes cleanly, but contact-rich deformation, particle dynamics, and high-DoF kinematics break down. Current ACWMs still capture visual statistics, not physical laws. We also ran ablations to draw insights about model architecture, data scaling, and action complexity.

The datasets and checkpoints for all 8 environments have been publicly released:

📃Paper: https://t.co/x7ABoXMMaU
📘Page: https://t.co/3PwvKkUfuQ
🐙Code: https://t.co/vYMw6uLK24
📠Dataset: https://t.co/xskCOjYN5w
🤗Checkpoints: https://t.co/YC2SzKHXa2

Also shout out to @YongxinChen1 , Yipu, Liqian, Zelin, @lamawm7 @YuchenZhu_ZYC

22 days ago

@ShenyuanGao Congrats Shenyuan!!

23 days ago

@nilaksh404 Interesting study! Do you think it is due to inherent disadvantages of reconstruction-based world models or to design choices in the VAE? In our work, we do find that reconstruction-based world models can still work well. But happy to learn more about your takes!

Xinyu Zhang @XinyuZhang82004

23 days ago

Glad to see that people are using data from our Interactive World Simulator and building better models within months of our release!!! Definitely an exciting time for world models!

25 days ago

𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 𝐚 𝐰𝐨𝐫𝐥𝐝 𝐦𝐨𝐝𝐞𝐥 𝐢𝐧 𝐚 𝐥𝐚𝐭𝐞𝐧𝐭 𝐬𝐩𝐚𝐜𝐞 𝐜𝐚𝐩𝐭𝐮𝐫𝐢𝐧𝐠 𝐬𝐭𝐚𝐭𝐞 𝐞𝐯𝐨𝐥𝐮𝐭𝐢𝐨𝐧, 𝐮𝐬𝐞 𝐢𝐭 𝐟𝐨𝐫 𝐰𝐨𝐫𝐥𝐝 𝐚𝐜𝐭𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥 𝐚𝐧𝐝 𝐯𝐢𝐬𝐮𝐚𝐥 𝐫𝐞𝐢𝐧𝐟𝐨𝐫𝐜𝐞𝐦𝐞𝐧𝐭 𝐥𝐞𝐚𝐫𝐧𝐢𝐧𝐠.

🚀 Introducing RLA-WM, a simple and efficient state-of-the-art world model. ✂️ RLA-WM decouples a world action model from the video backbone and enables the first demonstration 🎬 of visual reinforcement learning entirely inside our world model, learned only from videos (𝚆̲orld 𝙼̲odel‑based 𝚁̲𝙻̲).

⚡ Talk is cheap: open the notebook in Colab ▶️ to run RLA-WM and WMRL on a single T4 GPU!

🌐 Website 📄 Paper: https://t.co/5t6zaHaH1g
▶️ Colab: https://t.co/wTQfSXBPn6

👏 Our work, "𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗩𝗶𝘀𝘂𝗮𝗹 𝗙𝗲𝗮𝘁𝘂𝗿𝗲-𝗕𝗮𝘀𝗲𝗱 𝗪𝗼𝗿𝗹𝗱 𝗠𝗼𝗱𝗲𝗹𝘀 𝘃𝗶𝗮 𝗥𝗲𝘀𝗶𝗱𝘂𝗮𝗹 𝗟𝗮𝘁𝗲𝗻𝘁 𝗔𝗰𝘁𝗶𝗼𝗻", is a collaborative effort by computer vision and robotics researchers from Rutgers University, Purdue University, and the University of Wisconsin‑Madison. Shoutout to my amazing collaborators!

@XuZhengtong , Yutian Tao, @YepingWang , @yushe_1 , @ABoularias

YXWangBot retweeted