Se June Joo @joocjun - Twitter Profile

joocjun retweeted

29 days ago

Today, RLWRLD unveils RLDX-1, our proprietary Robotics Foundation Model. Across all 8 public benchmarks, RLDX-1 outperforms leading SOTA models including #NVIDIA #GR00T and Physical Intelligence #π0, delivering state-of-the-art performance among open robotics foundation models.

🎯 A 'Dexterity-First' Philosophy
The industry assumes dexterity will follow once intelligence is solved. We see it the other way around. Dexterity isn't downstream of intelligence; it's the path intelligence must take to act in the physical world. Real industrial work with five-finger robotic hands depends on signals vision alone can't capture: force (torque), tactile feedback, and the precise moment of contact.

🧠 MSAT: Multi-Stream Action Transformer
Where conventional VLAs collapse every input into a single transformer stream, MSAT gives each modality (vision, language, action, touch, memory) its own dedicated stream, then unifies them through joint attention. Force, tactile signals, and long-term memory are handled by purpose-built Physics and Memory modules. The result: one model that can see, feel, remember, and adapt.

📊 Performance Highlights
- RoboCasa Kitchen, 70.6: the first VLA model to cross the 70-point threshold
- GR-1 Tabletop, 58.7: +10.7 percentage points over NVIDIA GR00T N1.6
- LIBERO-Plus, 86.7%: top score across 7 robustness variables
- Pot-to-Cup Pouring on WIRobotics ALLEX, 70.8%: nearly 2× the comparison models, which remained in the high-30% range

We're also releasing DexBench, our industry-grounded benchmark for dexterous manipulation, defined across five domains: Grasp Diversity, Spatial Precision, Temporal Precision, Contact Precision, and Context Awareness.

🔓 Open Release
Three checkpoints (8.1B parameters each), live now on GitHub and Hugging Face:
- RLDX-1-PT: pre-training
- RLDX-1-MT-ALLEX: mid-training for ALLEX
- RLDX-1-MT-DROID: mid-training for DROID

⚙️ Built on NVIDIA's Cloud-to-Edge Stack
Training and simulation on Isaac GR00T, Isaac Lab, Isaac Sim, and cuRobo. Compute on NVIDIA H100 and A100 GPUs. Edge inference on Jetson AGX Thor with TensorRT. Our collaborations with NVIDIA, AWS, and Microsoft continue across both research and deployment.

What's Next: The 4D+ World Model
Video-based world models will never surface what isn't in the pixels: contact torque, tactile signals, robot state. Our 4D+ World Model integrates these directly with vision, language, and action across the temporal dimension, predicting and generating the full physical world. RLDX-1 is the first milestone on that roadmap.

📍 Join us at Dexterity Night in San Francisco on May 13, followed by launch events in Japan and Korea.

🔗 Explore RLDX-1 on GitHub and Hugging Face. https://t.co/kT6aX3qo8P

#RLWRL #RLDX1 #PhysicalAI #RoboticsFoundationModel #VLA #Humanoid #Dexterity #FoundationModel #Robotics #AI

0

47

14

22

3K

joocjun retweeted

RLWRLD @RLWRLD_ai

about 1 month ago

Three identical boxes. A mouse is placed into one. A moment later, a go signal, and the robot has to pick. Without memory, the policy forgets which box. RLDX-1's Memory Module is built on HAMLET (#ICLR2026 in Rio 🇧🇷), integrated into the full architecture. Open-sourced in two weeks. Stay tuned. 👉 https://t.co/u1Qe48LSmn #RLDX #RLWRLD #PhysicalAI #Robotics #Dexterity #FoundationModels #Automation #Manufacturing #VLA

1

38

13

15

6K

joocjun retweeted

Seonghyeon Ye

@SeonghyeonYe

4 months ago

VLAs (from VLMs) ❌ => WAMs (from Video Models) ✅

Why WAMs?
1️⃣ World Physics: VLMs know the internet, but Video Models implicitly model the physical laws essential for manipulation.
2️⃣ The "GPT Direction": VLAs are like BERT (rely heavily on task-specific post-training). WAMs are like GPT (pre-train & prompt), unlocking incredible zero-shot transfer!

What I want to see in 2026:
📈 Scaling Laws: We will see much clearer scaling laws for robotics compared to VLAs.
🤝 Human-to-Robot Transfer: Unlocking massive transfer capabilities using video as a shared representation space.
🤖 Zero-Shot Mastery: Moving from short-horizon tasks to long-horizon, dexterous manipulation without task-specific demonstrations.

We recently open-sourced the checkpoints, training and inference code.
Dive into the research! 👇
📄 Paper: https://t.co/jFEwebgyBH
💻 Code: https://t.co/4sZ5RoFmgB
🤗 HF: https://t.co/nPGoLYCPyq

5

515

64

381

75K

joocjun retweeted

Joel Jang

@jang_yoel

4 months ago

🚀 DreamZero training code is LIVE: train your own WAM (aka VAM)!
🔧 Replicate DROID from-scratch training
📊 Run evals on sim (DROID-Sim, MolmoSpaces, Polaris) & real-world (RoboArena)

No 2 GB200s for real-time inference? No problem, let NVIDIA carry that burden. Sign up for our API and jump into prompting new tasks! (e.g. "fan the burger" 🍔, totally unseen verb/task from DROID)

Coming soon: new embodiment/robot fine-tuning initialized from our DreamZero-AGIBot checkpoint. Stay tuned! 🤖

🔗 https://t.co/50wtDaDO8E

2

114

17

50

11K

joocjun retweeted

Thomas Zhang

@ThomasTCKZhang

6 months ago

🤖🤖Very excited to finally share our new work “Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control” Everyone in robotics does action-chunking, but why does it actually work?🤔🤔And, what can theory tell us about the properties of data we should be collecting for robotic behavior cloning? 🧵1/N

5

425

66

319

72K

joocjun retweeted

Pascale Fung

@pascalefung

6 months ago

Introducing VL-JEPA: Vision-Language Joint Embedding Predictive Architecture for streaming, live action recognition, retrieval, VQA, and classification tasks with better performance and higher efficiency than large VLMs.

• VL-JEPA is the first non-generative model that can perform general-domain vision-language tasks in real-time, built on a joint embedding predictive architecture.
• We demonstrate in controlled experiments that VL-JEPA, trained with latent space embedding prediction, outperforms VLMs that rely on data space token prediction.
• We show that VL-JEPA delivers significant efficiency gains over VLMs for online video streaming applications, thanks to its non-autoregressive design and native support for selective decoding.
• We highlight that our VL-JEPA model, with a unified model architecture, can effectively handle a wide range of classification, retrieval, and VQA tasks at the same time.

by @Delong0_0 @MustafaShukor1 @TheoMoutakanni @willyhcchung Jade Lei Yu Tejaswi Kasarla @AllenBolourchi @ylecun @pascalefung https://t.co/oUnjCaMKVv

13

554

87

394

90K

Se June Joo @joocjun

8 months ago

@eddybuild Cool release eddy👍

0

1

0

177

joocjun retweeted

Sourish Jasti

@SourishJasti

8 months ago

1/ The future of general-purpose robotics will be decided by one major question: which flavor of data scales reasoning? Every major lab represents a different bet.

Over the past 3 months, @adam_patni, @vriishin, and I read the core research papers, spoke with staff at the major labs, and mapped the talent pool. This has completely changed how we think about general-purpose robotics.

Our paper builds intuition, step-by-step, across the 2025 frontier: from architectures → evals → data → industry dynamics. Each layer reveals a different bottleneck, but they all converge on one truth: data decides everything.

Our takeaways + process below👇

If you want access to our graph (sound on), comment or DM me

90

836

187

961

181K

joocjun retweeted

Saining Xie

@sainingxie

8 months ago

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)

57

2K

327

1K

415K

joocjun retweeted

C. Zhang @ChongZzZhang

9 months ago

Doing so-called AI+robotics:
30% time debugging real robot deployment
30% time fixing simulation and looking at tensorboard or wandb
30% time meetings and all kinds of non-research activities
10% time spinning my brain to get a bit of intellectual contribution with AI

4

127

5

34

6K

joocjun retweeted

RLWRLD @RLWRLD_ai

9 months ago

Watch ALLEX in action. From delicate gestures to precise object handling, our humanoid shows next-level hand dexterity and Physical AI at the @OpenAI Seoul Open Event. This is how @RLWRLD_ai is redefining real-world robotics 🤖✨ #RLWRLD #OpenAI #PhysicalAI #dexterity #AIrobotics #Seoul

1

22

9

2

8K

joocjun retweeted

RLWRLD @RLWRLD_ai

9 months ago

Just saw this awesome demo by @kaysorin — really proud to share ALLEX in action at the OpenAI Seoul Open Event! Watching it move, interact, and demonstrate real-world dexterity was something special. 🤖🙌 Huge shoutout to everyone involved — pushing the boundaries of what’s possible with physical AI. #RLWRLD #OpenAI #Robotics #PhysicalAI #Dexterity #Innovation #Seoul

0

9

3

0

649

joocjun retweeted

Stone Tao

@Stone_Tao

9 months ago

Open-sourcing a useful tool to calibrate camera extrinsics painlessly in a minute, no checkerboards! It's based on EasyHEC, using differentiable rendering to optimize extrinsics given object meshes+poses. Crazy that even a piece of paper works too. Code: https://t.co/CSmD2iIXuK

7

242

42

170

44K

joocjun retweeted

Jianglong Ye

@jianglong_ye

12 months ago

How to generate billion-scale manipulation demonstrations easily? Let us leverage generative models! 🤖✨ We introduce Dex1B, a framework that generates 1 BILLION diverse dexterous hand demonstrations for both grasping 🖐️and articulation 💻 tasks using a simple C-VAE model.

15

374

86

230

73K

joocjun retweeted

hyunji amy lee @hyunji_amy_lee

12 months ago

🚨 Want models to better utilize and ground on the provided knowledge? We introduce Context-INformed Grounding Supervision (CINGS)! Training LLM with CINGS significantly boosts grounding abilities in both text and vision-language models compared to standard instruction tuning. https://t.co/0Ev7IxjwHP

2

123

43

16K

joocjun retweeted

Seohong Park @seohong_park

12 months ago

Q-learning is not yet scalable

https://t.co/hoYUdAAeGZ

I wrote a blog post about my thoughts on scalable RL algorithms.

To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why). https://t.co/fvIrwTMJ1o

32

1K

184

1K

169K

joocjun retweeted

Sohee Yang @soheeyang_

12 months ago

🚨 New Paper 🧵
How effectively do reasoning models reevaluate their thought? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
https://t.co/XczKFvHpUK

4

129

27

58

10K

joocjun retweeted

Younggyo Seo @younggyoseo

about 1 year ago

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

15

561

114

298

131K

joocjun retweeted

Jay Shin

@jay_shin

about 1 year ago

Technical report is finally out https://t.co/P7nbciduCW

0

18

2

1

847

joocjun retweeted

Yuke Zhu @yukez

about 1 year ago

We took a short break from robotics to build a human-level agent to play Competitive Pokémon. Partially observed. Stochastic. Long-horizon. Now mastered with Offline RL + Transformers.

Our agent, trained on 475k+ human battles, hits the top 10% on Pokémon Showdown leaderboards. No search or heuristics, just sequence modeling.

Today, we're open-sourcing our Metamon platform with our algorithms, data, and environments: 🌐 https://t.co/4YrQEk2QeX

We are excited to see how our work accelerates research on building generally capable AI agents, and more importantly, inspires the next generation of Pokémon trainers!

10

362

63

147

51K
