Suchae Jeong @Suchaeck - Twitter Profile

Suchaeck retweeted

2 days ago

World models are increasingly central to how agents learn and plan. Today we're releasing WorldModelGym, a benchmark built around a single question: if an agent uses a world model to choose among actions, does it pick the right one? We call this decision-based fidelity. 100+ tracks across Atari, Meta-World, DeepMind Control, and classic control. One frozen policy. Reality scores it. Read the full post → https://t.co/OzVd1n6Vth

2

265

43

168

47K

Suchaeck retweeted

byeongguk jeon

@bkjeon1211

2 days ago

Excited to share RoboWorld 🤖 We roll out generalist robot policies from 4,186 real initial scenes, entirely inside a video world model with no robots, and the rankings hit Pearson r = 0.989 (Spearman ρ = 0.970) with the real RoboArena leaderboard. 🧵 [1/7]

1

64

16

29

9K

Suchae Jeong

@Suchaeck

3 days ago

10K+ episodes. 164 hours. 60 tasks. One of the biggest missing pieces for Human-centric Physical AI has been large-scale, high-quality human-robot interaction data. HABIT brings together over 164 hours of real human-robot interaction episodes across 60 tasks, providing a valuable resource for developing and evaluating collaborative robot intelligence. Big congrats to @j__aehwi for leading this project. It was a pleasure to contribute to this work.

Jaehwi Song

@j__aehwi

3 days ago

Introducing HABIT — a large-scale robot manipulation dataset for human-present environments, where a person shares the workspace and interacts with the robot in every episode. 60 tasks · 10,563 episodes · 164 hours of rich human-robot interaction. Toward robots that are not just capable, but safe and socially compatible around people. https://t.co/kEtkqbuoIn 🧵[1/7]

7

187

30

119

16K

0

2

0

53

Suchae Jeong

@Suchaeck

8 days ago

This is really cool work addressing one of the hardest problems in robotics: cross-embodiment transfer! We propose the achieved Cartesian Delta as a unified action space for Robot Foundation Models and use an online conversion from achieved Cartesian Delta to real Cartesian commands. These methods show significant improvements in transfer across embodiments and robots, as well as robustness to changes in control frequency and object weight. If you've ever been frustrated by policy degradation when transferring between robots—even those with the same embodiment—this paper is definitely worth checking out. It offers valuable insights into addressing the cross-embodiment transfer gap.

Haeone Lee @ ICML

@Haeone_Lee

8 days ago

🤖 How can we learn a reliable policy across different robots and dynamics? Excited to introduce SPACE, a framework that significantly improves cross-embodiment and cross-hardware (e.g., DROID) learning by addressing dynamics gaps, with execution-time adaptation. 📄 paper: https://t.co/zgByzVwFyz 📷 Project website: https://t.co/ZytiBGmMrY 🧵[1/n] #Robotics #CrossEmbodiment .

1

44

14

30

10K

0

3

0

106

Suchaeck retweeted

Haeone Lee @ ICML

@Haeone_Lee

8 days ago

🤖 How can we learn a reliable policy across different robots and dynamics? Excited to introduce SPACE, a framework that significantly improves cross-embodiment and cross-hardware (e.g., DROID) learning by addressing dynamics gaps, with execution-time adaptation. 📄 paper: https://t.co/zgByzVwFyz 📷 Project website: https://t.co/ZytiBGmMrY 🧵[1/n] #Robotics #CrossEmbodiment .

1

44

14

30

10K

Suchae Jeong

@Suchaeck

about 1 month ago

We are releasing: ✅ XVR dataset ✅ Data generation pipeline ✅ Training code ✅ Evaluation code We hope XVR helps advance multi-view reasoning for both VLMs and VLAs. Project: https://t.co/g7do2Hbcl4 Paper: https://t.co/LWb8hsvYdY Huge thanks to all my collaborators, especially co-first author @Jay019374 and my advisor @kimin_le2, for making this work possible. 🙌

0

69

Suchae Jeong

@Suchaeck

about 1 month ago

🚀 Our paper "Learning Multi-View Spatial Reasoning from Cross-View Relations (XVR)" has been accepted to #CVPR2026! Current VLMs can reason from a single view surprisingly well, but they still struggle to connect information across multiple viewpoints. To address this, we introduce XVR: • 100K-sample VQA dataset • 3 categories, 8 tasks • Designed specifically for cross-view spatial reasoning Most excitingly, cross-view reasoning transfers to robot manipulation. Using an XVR-trained VLM as a VLA backbone improves RoboCasa manipulation success rates by +13%p on average. Project page: https://t.co/g7do2Hbcl4 Paper: https://t.co/LWb8hsvYdY 🍿 More details below

Suchaeck's tweet photo. 🚀 Our paper "Learning Multi-View Spatial Reasoning from Cross-View Relations (XVR)" has been accepted to #CVPR2026!

Current VLMs can reason from a single view surprisingly well, but they still struggle to connect information across multiple viewpoints.

To address this, we introduce XVR:
• 100K-sample VQA dataset
• 3 categories, 8 tasks
• Designed specifically for cross-view spatial reasoning

Most excitingly, cross-view reasoning transfers to robot manipulation.
Using an XVR-trained VLM as a VLA backbone improves RoboCasa manipulation success rates by +13%p on average.

Project page: https://t.co/g7do2Hbcl4
Paper: https://t.co/LWb8hsvYdY

🍿 More details below

5

13

4

0

2K

Suchae Jeong

@Suchaeck

about 1 month ago

XVR also improves upstream reasoning performance. Our XVR-trained Qwen3-VL-2B outperforms much larger proprietary models, including GPT-5, Gemini-2.5-Pro, and Claude-4.5-Sonnet on XVR-Eval. We also observe consistent gains on external spatial reasoning benchmarks such as MindCube-Tiny and RoboSpatial-Home.

Suchaeck's tweet photo. XVR also improves upstream reasoning performance.

Our XVR-trained Qwen3-VL-2B outperforms much larger proprietary models, including GPT-5, Gemini-2.5-Pro, and Claude-4.5-Sonnet on XVR-Eval.

We also observe consistent gains on external spatial reasoning benchmarks such as MindCube-Tiny and RoboSpatial-Home.

0

48

Suchae Jeong

@Suchaeck

about 1 month ago

The result we're most excited about: Cross-view reasoning improves robot manipulation. Using an XVR-trained VLM as the backbone of a VLA model improves manipulation success rates by an average of +13 percentage points across RoboCasa tasks. Better spatial understanding → Better control.

Suchaeck's tweet photo. The result we're most excited about:

Cross-view reasoning improves robot manipulation.

Using an XVR-trained VLM as the backbone of a VLA model improves manipulation success rates by an average of +13 percentage points across RoboCasa tasks.

Better spatial understanding → Better control.

0

30

Suchae Jeong

@Suchaeck

about 1 month ago

To teach cross-view reasoning, we built XVR. 📊 100K VQA samples 📌 3 reasoning categories: • Correspondence • Verification • Localization covering 8 different tasks. Every question requires information from multiple viewpoints to solve.

Suchaeck's tweet photo. To teach cross-view reasoning, we built XVR.

📊 100K VQA samples
📌 3 reasoning categories:
• Correspondence
• Verification
• Localization

covering 8 different tasks.

Every question requires information from multiple viewpoints to solve. https://t.co/cBllbmiDxy

0

32

Suchaeck retweeted

Suchae Jeong

@Suchaeck

about 1 month ago

🚀 Our paper "Learning Multi-View Spatial Reasoning from Cross-View Relations (XVR)" has been accepted to #CVPR2026! Current VLMs can reason from a single view surprisingly well, but they still struggle to connect information across multiple viewpoints. To address this, we introduce XVR: • 100K-sample VQA dataset • 3 categories, 8 tasks • Designed specifically for cross-view spatial reasoning Most excitingly, cross-view reasoning transfers to robot manipulation. Using an XVR-trained VLM as a VLA backbone improves RoboCasa manipulation success rates by +13%p on average. Project page: https://t.co/g7do2Hbcl4 Paper: https://t.co/LWb8hsvYdY 🍿 More details below

5

13

4

0

2K

Suchae Jeong

@Suchaeck

about 1 month ago

Why do current VLMs struggle with multiple views? Although modern VLMs perform well on single-view reasoning, they often fail when information must be combined across viewpoints. We found that existing datasets provide very limited supervision for learning cross-view relations.

Suchaeck's tweet photo. Why do current VLMs struggle with multiple views?

Although modern VLMs perform well on single-view reasoning, they often fail when information must be combined across viewpoints.

We found that existing datasets provide very limited supervision for learning cross-view relations. https://t.co/7IS52o8AD2

0

51

Suchae Jeong

@Suchaeck

Last Seen Users on Sotwe

Trends for you

Most Popular Users