Introducing WALL-WM, our open-source World Model for embodied AI and the next piece of our open-source robotics stack.
Carving World Action Modeling at the Event Joints
Read the blog: https://t.co/50XVN1ZjaA
Why it matters
WALL-WM shifts robot world modeling from fixed-length action chunks to event-grounded video-action pretraining. It learns around events like reaching, contact, grasping, lifting, moving, and placing, so language, vision, and action align more naturally.
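A toy way to picture the contrast: fixed-length chunking cuts a trajectory every H steps no matter what is happening, while event-grounded chunking cuts where the physical phase changes. The sketch below is a generic illustration, not WALL-WM's method. The per-step event labels are hypothetical inputs, since the post does not say how events are detected.

```python
from itertools import groupby

def fixed_length_chunks(actions, horizon):
    """Split an action sequence into fixed-size windows (the last one may be shorter)."""
    return [actions[i:i + horizon] for i in range(0, len(actions), horizon)]

def event_chunks(actions, event_labels):
    """Split an action sequence wherever the (hypothetical) per-step event label changes.

    Returns a list of (event_label, action_slice) pairs whose lengths vary with the event.
    """
    if len(actions) != len(event_labels):
        raise ValueError("actions and event_labels must have the same length")
    chunks = []
    start = 0
    for label, group in groupby(event_labels):
        length = sum(1 for _ in group)  # number of consecutive steps with this label
        chunks.append((label, actions[start:start + length]))
        start += length
    return chunks

# Hypothetical 10-step trajectory with made-up event labels.
actions = list(range(10))
labels = ["reach"] * 3 + ["grasp"] * 2 + ["lift"] * 1 + ["place"] * 4
print(fixed_length_chunks(actions, 4))
print(event_chunks(actions, labels))
```

With a horizon of 4, the fixed windows straddle the grasp and lift phases, while the event split yields one variable-length chunk per phase.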
Why you should care
WALL-WM brings together:
• Event-grounded VLA pretraining
• Prior-aligned video-action architecture
• Wan-based video tower + randomly initialized action DiT
• Multi-view perception with sight-cone masking, tube patch masking, and Camera RoPE
• Event Mode for variable-length execution
• Unified Mode with Staircase Decoding
• DMuon for large-scale training
The goal: help robots learn what physically matters, not just what happens in the next fixed slice of time.
Code (coming soon): https://t.co/IST58Rfgpv
#opensource #EmbodiedAI
After open-sourcing Wall-OSS-0.5 and WALL-WM this week, we’re heading to #CVPR2026 in Denver to meet the embodied AI and robotics community in person.
If you’re building, researching, or simply curious about robotics, VLA, world models, robot foundation models, sim-to-real, or real-world deployment, come find us.
Where to Meet X Square Robot at @CVPR
1. Tech Talk | June 4
X Square Robot × Embodied AI Workshop
📍Location: Room 107
Topic: Event-Level World Action Model for Embodied AI
Speaker: @shalfunnn, World Model Tech Lead
2. CVPR Exhibition | 📍Booth 853
June 5 | 10:00 AM-6:00 PM
June 6 | 10:00 AM-6:00 PM
June 7 | 10:00 AM-3:00 PM
3. Saturday Robotics Meetup | June 6, 5:30-9:30 PM
We’ll also be joining the @saturdayrobotic @junfanzhu98 *Research Night* gathering to share what we’ve been working on and connect with the broader robotics community.
Register: https://t.co/d9HtnT08gY
4. X-Night Afterparty | June 7, 6:30-9:00 PM
📍Downtown Denver
Join us for steak chats, technical conversations, open roles, internship discussions, and a few robotics debates we probably won’t settle in one night.
Register: https://t.co/DKIeSeiDgq
See you in Denver.
A practical step forward for real-world manipulation: an open-source world model that replaces rigid action chunking with event-grounded prediction.
It anchors planning and control to actual physical moments (reach → grasp → contact → place), giving robots more natural timing, tighter contact handling, and reliable long-horizon behavior without heavy sim-to-real tuning.
The dual-arm demo shows clean, adaptive kitchen table-setting (plates, cutlery, fruit), exactly the kind of unstructured bimanual task that most labs struggle to make robust today.
For researchers and engineers, the appeal is practical: better dexterity out of the box, variable horizons that match real physics, and open weights and code to build on (the code is listed as coming soon).
'Carving World Action Modeling at the Event Joints'
📌 Read the blog: https://t.co/FhpOEvBRjf
Code (coming soon): https://t.co/Ic777HY9uo
Credit: @XSquareRobot
——-
If it matters in AI or Robotics, you'll read it here first: https://t.co/9Nm01QUKlB
@XRoboHub Thank you! Really appreciate it.
We see open source as the fastest way to move embodied AI forward, and this week is just the beginning. More to share soon.
@nurvai_ai Long-horizon tasks, complex sequential reasoning, and highly unstructured everyday constraints are usually where things break first without fine-tuning. That’s exactly where we’re focusing to make zero-shot more reliable in real homes.
We are open-sourcing Wall-OSS-0.5.
Pretrain Once, Act Anywhere.
Wall-OSS-0.5 is a VLA model for real-world robotic manipulation, exploring whether pretraining alone can produce robot capabilities directly testable on physical hardware before task-specific fine-tuning.
Key technical highlights:
• Gradient-bridged co-training
• Vision-Aligned RVQ Action Tokenizer
• Action-Space Supervision
• DMuon distributed optimizer
In zero-shot real-robot evaluation, the pretrained checkpoint achieved task-progress scores above 80 on multiple tasks, including Block Sorting, Fruit Sorting, Ring Stacking, and Rope Tightening.
Paper, code, blog, and uncut videos: https://t.co/YzSdxg3RAH
X Square Robot today officially open-sourced Wall-OSS-0.5 under the motto "Pretrain Once, Act Anywhere."
Wall-OSS-0.5 is a Vision-Language-Action model for real-world robotic manipulation. According to the team, the pretrained checkpoint shows zero-shot generalization on multiple real-robot tasks without task-specific post-training, while also outperforming recent open-source models such as π0.5 in fair comparisons that control for data and fine-tuning scale.
The model also reports stronger embodied grounding, suggesting that action-aware training can improve robot-relevant understanding without eroding general multimodal capability.
Code and model weights are expected to be released this weekend.
Project: https://t.co/uZt4RsrnBK
Worth following for researchers and developers working on VLA, robot learning, and embodied AI.
@Em_Nomadic Thanks for sharing, Emerson. This is exactly the idea: robots shouldn’t just enter homes as tools, but gradually learn the home, the routines, and the people living there. Real families, real life, real feedback.
X Square Robot is moving its WALL-B-powered home robots into real households, where the robots can learn cleaning and daily tasks directly from families.
@Elian_Frida Absolutely. That’s very much aligned with our direction. We’ll keep releasing open-source models for real-world tasks, and over time they can be adapted to different robots and hardware platforms. The goal is to let people build not just a robot, but their own robot.
Over the past month, we asked families why they would want a robot at home. The answers were not really about sci-fi.
They were about only children needing company.
Parents getting older.
People living alone.
Couples arguing over chores.
Long workdays.
Homes that still need care when no one has energy left.
Some wanted help cleaning.
Some wanted someone to check the doors, windows, gas and lights.
Some wanted companionship for a child, a parent, or themselves.
That is what makes home robots interesting to me.
Not because they are perfect today.
But because the need is already real.
Would you live with one?
Would you actually live with a robot at home? A new robot family member is starting to arrive. 🤖
35 days after Born to Bot, Bot to Family, X Square Robot is moving its next-gen home robot into real households.
It runs on WALL-B, a world model that connects vision, language, touch, action, and physical prediction for messy, unpredictable home tasks.
It can already help with parts of cleaning and tidying, but it still moves slowly, hesitates, and is still learning inside real homes.
More than 1,000 families have signed up. Pre-orders are open now — would you bring one home?