The bottleneck to on-robot reinforcement learning is good, scalable reward prediction. Robometer is a massive step in that direction, and the authors have been wonderfully open as well, releasing a large dataset and continuing to improve their model post-release.
Thanks to @aliangdw, @yigitkkorkmaz, and @Jesse_Y_Zhang for joining me and @DJiafei!
Let's solve dexterity. Check out this competition Michael and company are putting together: can you train a robot to fold origami? Data and remote evals are provided. Sign up now.
@waynenilsen It is a data thing to a certain extent, but fundamentally, robots need to work with existing systems, brownfield facility chaos, and highly unstructured environments. The entire make-or-break lies in the long tail: getting from 90% to 99%.
Humans have a great understanding of 3D and of our relationship with the world, something that's broadly lacking in world models. This has severe implications for out-of-distribution generalization and object avoidance. PointWorld is a really interesting approach to addressing this: a 3D world model that represents changes in the world as point flow.
Thanks to @wenlong_huang for coming on the podcast and telling us about it. Learn more here ->
Spatial understanding is important for moving around in complex environments and is a huge part of the challenge of generalizing to new scenes. Most world models, however, largely ignore this spatial dimension, focusing on 2D images.
Not PointWorld, though. PointWorld is a 3D world model trained on real and simulated data that can perform a wide variety of manipulation tasks on a real robot, including grasping and handling articulated objects, all without any additional fine-tuning. @wenlong_huang joins us to tell us more about what makes this work and how it's different from other world models.
Watch Episode #83 of RoboPapers, with @chris_j_paxton and @DJiafei, to learn more!
I'm not sure I'd consider any "robot foundation model" company a world models company unless it's also training world models.
The way people use terms changes, and that's fine. It's silly to be upset about, e.g., World Labs being included because they don't learn a conditional dynamics model. But some of these just aren't world models at all.
It is, however, the hot thing right now, and the post is mostly a good overview imo.
With LLMs, the model is the unit of value. In robotics, the unit is the full tuple:
- model,
- robot,
- task,
- environment.
Change any one and the result can flip completely.
Code is only ~30% of what makes a robot work. The other 70% (calibration, sim configs, deployment tuning) has no home today.
Most people assume our lamp form factor is just an aesthetic choice, but it is actually a direct response to the exact deployment problems outlined here.
Positioning as a lamp allows us to:
- tap into existing distribution channels
- deliver value on day 1 (without relying on perfect autonomy)
- get into homes fast -> starts the data flywheel
Humanoids can't do this because they require near-perfect physical AI to be a viable consumer product.
This means until physical AI is solved, there will be limited real-world adoption -> limited deployment data -> limited improvement.
I wrote a lot on this internally at @bySyncere. Will share more soon.
two camps in robotics:
wishful thinkers ➡️ fancy demos, zero deployment scars
hardened deployers ➡️ hard lessons, earned insights, real customers
@YorkYang5050 and @DynaRobotics, welcome to the hardened deployers camp
Making humanoids is one thing. Making a humanoid into a consumer product takes engineering to extreme limits.
We've put an intense amount of thought and effort into answering every question like "how does my NEO show up at my door" with beautiful accessories and features.
Still work to be done. But it's the most exciting work of this generation IMO.