MolmoSpaces provides singular scale and diversity. We built a benchmark that puts that scale to use.
MolmoSpaces-Bench evaluates zero-shot policies across thousands of previously unseen environments under systematic variation, providing insights that go beyond a success rate %.
More Below:
Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research.
230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps, all in one ecosystem.
@chris_j_paxton I think people should start adding "state-based" in the corner of video demos if they're not visual. similar to how it's now standard to add the playback speed (eg. x4). especially if there's manipulation involved, otherwise it might be deceiving
We are open-sourcing Wall-OSS-0.5.
Pretrain Once, Act Anywhere.
Wall-OSS-0.5 is a VLA model for real-world robotic manipulation, exploring whether pretraining alone can produce robot capabilities directly testable on physical hardware before task-specific fine-tuning.
Key technical highlights:
• Gradient-bridged co-training
• Vision-Aligned RVQ Action Tokenizer
• Action-Space Supervision
• DMuon distributed optimizer
In zero-shot real-robot evaluation, the pretrained checkpoint achieved task-progress scores above 80 on multiple tasks, including Block Sorting, Fruit Sorting, Ring Stacking, and Rope Tightening.
Paper, code, blog, and uncut videos: https://t.co/YzSdxg3RAH
Robotics is still data starved. Collecting high-quality robot demonstrations remains brutally slow and expensive.
Introducing COBALT: A cloud-native teleoperation platform designed for large-scale robot learning.
We are democratizing data collection by leveraging the hardware everyone already owns: the smartphone.
All you need is to download an app (today)!
Read on for more!
@TX_Leo_Wang Congrats on the release. quick question, did you ablate feeding the policy the spatial observations (ict tokens) only without any rgb? judging by the "human rgb" vs "human rgb + ict" gap it seems like that's all what's being used?
TiPToP is #1 on MolmoSpaces! Outperforming VLAs including MolmoAct2 and π₀.₅, and WAMs like DreamZero
It's the only method that uses inference-time search and zero robot data. We didn't do any benchmark-specific tuning.
Just merged an amazing contribution by @omarrayyann to mjlab's viser viewer: checkpoint hot-swapping! You can now browse and load any checkpoint mid-session without restarting and it works with local checkpoints and W&B runs.
Benchmarking, evaluating, and developing robotics code is difficult, and part of this is because no simulator really reflects the diversity and scale of real embodiments. Enter MolmoSpaces from AI2: a massive open ecosystem with a range of 230,000 handcrafted and procedurally-generated home environments, including 48,000 manipulable objects. Crucially, MolmoSpaces provides simulation environments which work for both navigation and manipulation. We talked to the team: @YejinKim4, @omarrayyann, and Max Argus, to tell us more.
Watch Episode 69 of RoboPapers, with @micoolcho and @DJiafei, now!
Jensen approves!
Herculean efforts from @YejinKim4, @omarrayyann, Max Argus & team! This has a decent chance of becoming a super important benchmark for robotics going forward.
Check out this @RoboPapers episode with the MolmoSpaces folks.
Today, a step forward in open robotics - our results show that zero-shot sim-to-real transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces. 🧵
MolmoSpaces leaderboard is now open for submissions!
When we created this benchmark for zero-shot real-to-sim eval in diverse homes, we didn't expect things to heat up so quickly. But they did, thanks to @jang_yoel and team at GEAR toppling PI to take the crown in the task-general category. Congrats!
You can evaluate and submit your model to this leaderboard: https://t.co/Ysc0XQEMdr