📣 Upcoming Event: AIRoA Open Lab
AIRoA is planning an "Open Lab" for prospective applicants at our Heiwajima office (Tokyo, Japan) on Thursday, June 25th, starting around 7:00 pm.
This event will provide an opportunity to view the cutting-edge development environment featured in our videos up close and to engage in direct discussions with the key members leading our development efforts.
Further details and application instructions will be announced soon on our official website and social media channels.
We are truly excited to meet those who share our passion for shaping the future of Physical AI. Please save the date and stay tuned for further updates!
https://t.co/k5FEl8pLCq
🤖Co-training is everywhere (sim↔real[e.g. GR00T, LBM], human↔robot[e.g. PI, EgoScale], even non-robot data[e.g. PI, LBM).
But why does it work? How can we improve it further?
Taking sim-and-real imitation learning in diffusion/ flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments.
😮Key insight: it’s all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation
⚖️Both are necessary — it's better to have more aligned representations, but the model must be able to discern the domains. We term this as structured representation alignment.
⬇️Let’s take a deep dive into that:
Paper: https://t.co/RWCAxdBC0j
Website: https://t.co/BwgbwCkevA
🧐 Simulation has long promised robot pretraining, but breaks at the moment of real-world deployment.
🚀 Today, we introduce SIM1: the first real-to-sim-to-real paradigm where the generative world becomes the same one as reality.
SIM1 produces simulation data whose execution is directly valid in the physical world, enabling policies trained entirely in simulation to transfer zero-shot, at scale.
📈 This unlocks a new scaling law for robotics: we scale intelligence without scaling real-world data.
✨ Few demonstrations in, real-world policies out.
Simulation is no longer a proxy; it is supervision itself.
https://t.co/Kp1YBe5Gmf
https://t.co/GG2SBQfPpG
Robotics: coding agents’ next frontier.
So how good are they?
We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability.
From @NVIDIA@Berkeley_AI@CMU_Robotics@StanfordAILab
https://t.co/MVcc6XWQhY
🧵
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high fidelity simulation environments for scalable and reliable zeroshot generalist policy evaluation.
https://t.co/nWcR6YuPf4
(1/N 🧵)
Fantastic talk from @RichardSSutton at @RL_Conference with shoutouts to meta-RL. Honored to be called “more extreme” than Rich (by Rich) for taking the Bitter Lesson to heart and suggesting we meta-learn all the components he discussed.
My Q: Aren’t LLMs already doing all this?
🚀 [SIGGRAPH 2025] Thrilled to share StiffGIPC - our latest work advancing GPU IPC for stiff affine-deformable simulations! Another 2–10× speedup without sacrificing accuracy 💥
Huge shoutout to @KemengHuang and all collaborators 🙌
https://t.co/G4y4vIBnKT
How to use simulation data for real-world robot manipulation? We present sim-and-real co-training, a simple recipe for manipulation. We demonstrate that sim data can significantly enhance real-world performance, even with notable differences between the sim and the real. (1/n)
📢📢📢 Excited to release ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning (CVPR25).
🤏🤙✌️With ManipTrans, we can transfer dexterous manipulation skills into robotic hands in simulation and deploy them on a real robot, using a residual policy learned for dex manipulation.
🤖🤖🤖The video below illustrates how the MoCap data can be transferred to Inspire, Shadow, Xhand, Allegro, and Mano.
With ManipTrans, we can scale up dex manip data greatly with minimal effort. For more details, please check our
-webpage: https://t.co/0SHz9doX1z
-code: https://t.co/h6KEKM1rmQ
-huggingface: https://t.co/bn75V10fSA
HOVER https://t.co/m7nSTvJz6l
is now open-source https://t.co/4lwF6cYRcp
IsaacLab, docker images, data processing, etc. it's all there. Check it out! OmniH2O also part of this repo
Introducing our recent progress to utilize synthetic data—RE3SIM—a Real-to-Sim-to-Real pipeline that integrates Gaussian splatting with NVIDIA Isaac Sim's PhysX engine, improving scene reconstruction and sim-to-real transfer for robotic manipulation tasks.
Project: https://t.co/MQUnXgy76A
Highlights:
- High-fidelity geometry and vision: small sim-to-real gaps in both geometry and visual aspects.
- Highly efficient data collection: scene reconstruction in ~2.5 minutes and simulation data at 100 episodes per 10 minutes.
- Zero-shot sim-to-real transfer: limited simulation data brings high success rates.
Key Observation:
- Scaling law: Increasing the simulation data scale can enhance the success rate until it converges at a high-performance level.
- Mixing Sim-Real: Co-training real-world data can integrate the characteristics of both datasets.
This work was done by our talented undergraduate @xshenhan , with me, @yilun95, Junqiu Yu, Xiaoyang Lyu, Yang Tian, Bolun Wang, Weinan Zhang, and @pangjiangmiao !
very nice post! Still dissecting it a little more deeply myself but the part on non trivial optim. landscape really hits home something that has troubled me a lot
Trying to approximate a non-smooth function (or piece-wise smooth) is fairly problematic because the value function can never do well around sharp transitions. I once experimented with a fairly strange function approximation problem that i documented here: https://t.co/ceE4D4vECZ.
Essentially smooth neural nets (lots of tanh activations) are super good smooth function approximators, but fail at where the function has sharp/instantaneous changes. One solution proposed was to combine decision trees to directly figure out where that sharp change is (DTs after all are very good with tabular like data, in many ways the staged rewards is like a tabular data classification problem).
In all my PPO experiments with Maniskill the value function loss spikes essentially exactly around when the model learns to get to the next stage of the reward function. Designing reward functions with smoother transitions between stages (something we did not do consistently in ManiSkill) makes PPO more stable.
That being said this does speak to the deeper more complex nature of reward function design. I say this blog post reinforces my belief that using LLMs as a way to scale up reward function generation is still very far away (and hence limiting the use-case for dense reward RL for robotics significantly). But still the avenue of RL + sparse reward + demonstration data (less data than used by IL) has strong promise still.
Video models != world models
"We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism"
Excited to share our open-source project on building world models through differentiable robot-object interactions! Explore it here: https://t.co/U0DY9h5cLe. Huge thanks to the @nvidia Warp team (@eric_heiden@milesmacklin) for the easy-to-use infrastructure.
Check out our survey paper on humanoid robots entitled "Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning"
ArXiv link: https://t.co/cVPzlbXzg7