Impressive to see a VLA control a robot with such high DOF.
Most mobile manipulators I’ve seen have been using classical SLAM + planning for navigation and VLA/WAM for manipulation.
My biggest flaw is that I can’t hate anyone properly. My empathy keeps sabotaging my ability to be a committed hater. Eventually, I just end up rationalizing what they did. 🤕😢
One thing I’ve noticed after attending academic conferences like CVPR:
Many students still approach robotics from an algorithm-first perspective.
They think in terms of CV, RL, control, architectures, or scaling laws, as if robotics were simply a matter of stitching these pieces together.
But the frontier is increasingly about bridging abstraction layers: from foundation models and high-level reasoning, through perception, planning, and control, all the way down to actuator-level motor commands.
One realization that really changed my perspective was understanding that state ≠ action.
Even in something as “simple” as the Shadow Hand, the state represents the robot and the world, while the action is the low-level motor command—not just another vector of joint angles.
That abstraction gap is, I think, still one of the biggest blind spots in how many students think about robotics today.
Which is also why researchers with genuine full-stack robotics experience, like @TairanHe99, have such a difficult-to-replicate advantage.
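A minimal sketch of the state ≠ action distinction. The setup is our own assumption, not the post's: a hypothetical 3-joint finger driven by only 2 actuators (joints 1 and 2 coupled to one actuator, loosely like tendon coupling), a 7-number object pose, and a PD loop with made-up gains `kp` and `kd`. The names `State`, `Action`, `COUPLING`, `state_dim`, and `pd_torques` are illustrative and do not come from any Shadow Hand API. The point it shows is that the state also describes the world (the object pose), while the action has its own, smaller shape (one command per actuator). A low-level controller then turns that action into joint torques.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy hand: 3 joints, 2 actuators. NOT the real Shadow Hand layout.
# Actuator 0 drives joint 0; actuator 1 drives joints 1 and 2 together (coupled).
COUPLING: List[List[int]] = [[0], [1, 2]]


@dataclass
class State:
    """What the robot observes: its own body AND the world."""
    joint_pos: List[float]          # one entry per joint (rad)
    joint_vel: List[float]          # one entry per joint (rad/s)
    object_pose: Tuple[float, ...]  # e.g. position (3) + quaternion (4) of a held cube


@dataclass
class Action:
    """What the policy outputs: one command per actuator, not per joint."""
    actuator_cmd: List[float]


def state_dim(s: State) -> int:
    """Total number of scalars in the state vector."""
    return len(s.joint_pos) + len(s.joint_vel) + len(s.object_pose)


def pd_torques(s: State, a: Action, kp: float = 5.0, kd: float = 0.1) -> List[float]:
    """Toy low-level PD loop (hypothetical gains): actuator targets -> joint torques."""
    torques = [0.0] * len(s.joint_pos)
    for act_idx, joints in enumerate(COUPLING):
        target = a.actuator_cmd[act_idx]
        for j in joints:
            # In this toy model every coupled joint tracks the same actuator target.
            torques[j] = kp * (target - s.joint_pos[j]) - kd * s.joint_vel[j]
    return torques


if __name__ == "__main__":
    s = State([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], (0.0, 0.0, 0.1, 1.0, 0.0, 0.0, 0.0))
    a = Action([1.0, 0.5])
    print(state_dim(s), len(a.actuator_cmd))  # 13 2
    print(pd_torques(s, a))                   # [5.0, 2.5, 2.5]
```

Here the state has 13 numbers and the action has 2. Neither is a "vector of joint angles" in the same space as the other.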
Introducing InfiniteDiffusion, my independent paper accepted to #SIGGRAPH2026!
I have one RTX 3090 Ti. No funding, advisors, or team. By day I'm a new grad SWE at Walmart.
The paper has two main contributions:
- InfiniteDiffusion: a new approach to infinite generation with diffusion models.
- Terrain Diffusion: the world’s first learned procedural terrain generator.
Here’s why this matters, and how they are connected. 🧵
Robot learning is moving beyond policies built for one robot, one scene, one task.
At MIT, we’re exploring a different path: turning video world models into embodiment-agnostic robot policies.
Introducing VERA: a 14B video-to-action system that controls robots across embodiments, skills, and environments.
From zero-shot pick-and-place on a real Panda arm to contact-rich cube reorientation with a 16-DoF robotic hand.
Different robots. Different environments. Different tasks.
Same video planner. Same weights.
We’re open-sourcing everything so you can fine-tune VERA for your own robot setup too. Deep dive in the thread:
🔗 https://t.co/hzuYZ2m5lS
🧵 (1/7)
Why do diffusion (denoising-based) generative methods not suffer from the curse of dimensionality, even though the data may lie in extremely high-dimensional spaces? Our new work, accepted by JMLR (https://t.co/njMEqzH3TF), reveals the not-so-surprising secret: as long as the intrinsic dimension of the distribution is very low, the generative process can be extremely efficient and effective! It seems that a mixture of low-rank Gaussians is a universal model for all informative real-world data, as we postulated in an earlier textbook of mine, Generalized Principal Component Analysis (https://t.co/nEy8qcFN7e), published exactly ten years ago!
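To make the low-intrinsic-dimension point concrete, here is a sketch of the closed-form optimal (posterior-mean) denoiser for a mixture of low-rank Gaussians. This is a textbook computation, not a reproduction of the paper's analysis, and it assumes a simplified data model of our own: equal-weight, zero-mean components N(0, U_k U_kᵀ), each with orthonormal columns and the same rank d. For a single component, E[x | y] = U Uᵀ y / (1 + σ²) when y = x + σ·noise, so the denoiser only needs the d projection coefficients of y, not the full D-dimensional covariance. The function names (`project`, `lowrank_mog_denoiser`) are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(basis: List[Vec], y: Vec) -> Vec:
    """Orthogonal projection of y onto span(basis); basis vectors must be orthonormal."""
    out = [0.0] * len(y)
    for u in basis:
        c = dot(u, y)
        for i, ui in enumerate(u):
            out[i] += c * ui
    return out


def lowrank_mog_denoiser(y: Vec, bases: List[List[Vec]], sigma: float) -> Vec:
    """Posterior mean E[x | y] for y = x + sigma * noise, where x comes from an
    equal-weight mixture of zero-mean Gaussians N(0, U_k U_k^T).
    Assumptions of this sketch: orthonormal U_k, all of the same rank d."""
    s2 = sigma * sigma
    projs = [project(b, y) for b in bases]
    # log p(y | k) up to a constant shared by all k. With equal ranks the
    # log-determinants cancel, leaving |U_k^T y|^2 / (2 s2 (1 + s2)).
    logits = [dot(p, p) / (2.0 * s2 * (1.0 + s2)) for p in projs]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]  # subtract max for numerical stability
    total = sum(w)
    w = [wi / total for wi in w]
    # Each component's optimal denoiser is U_k U_k^T y / (1 + s2).
    out = [0.0] * len(y)
    for wk, p in zip(w, projs):
        for i, pi in enumerate(p):
            out[i] += wk * pi / (1.0 + s2)
    return out
```

For example, with two rank-1 components along the first two axes of R⁴, σ = 1, and y = (2, 0, 0, 0), the first output coordinate is e/(e + 1) ≈ 0.731 and the others are 0. The cost grows with the number of components times the intrinsic rank d, and only linearly in the ambient dimension D.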