Happy to share a recent work I've been a part of -- TensorTouch! 🖐️
TensorTouch enables the calibration of optical tactile sensors for dense stress tensor AND deformation for dexterous manipulation.
website: https://t.co/GKGkazgPzP
code: https://t.co/pr25wmyYCE
1/4
Had a great time at #CVPR2026 spreading the news of free gifts. Some of my hopes for autonomy and robotics are in scaling RGB data (with inductive 3D bias in feedforward 3D encoders), and predicting future states and actions, which seemed to be a solid recipe for autonomous driving.
The stakes are higher with robotics -- learning how objects interact under contact is a different breed than predicting where to drive and avoid an accident, but the runway is very, very long.
Code and pretrained model coming soon (within 2 weeks) at https://t.co/8ZYGjLalEd!
hmm I think it's useful to understand the inner workings of policies before we really see robots safely in homes/factories/logistics/medical etc. And I'd just be really curious to understand what is implicitly learned in these large action models. Maybe it's not 3D, so then what is it?
Incredibly impressed by the Wuji 👋hand at ICRA. Morphologically similar to my own hand while being quasi direct drive and with climber 🧗grip strength. All that at a fraction of the cost of the Sharpa hand. Can’t wait to try it out!
Epic work. I feel that feedforward reconstruction models for robotics have good appeal because they take RGB as input, which is easy to scale, but they still encode 3D (and it seems 4D as well) ego motion and geometry.
I wonder if adding the register tokens works super well in the real world. Can such models be extended to encode future scene progression, which maybe gives robot learning models latent future tokens to use? Let's see!
Introducing VGGT-Ω: scaling feed-forward reconstruction across static and dynamic scenes, and studying whether the learned geometric representations transfer beyond reconstruction.
@wildmindai Nice work. But it would be great for repos and works like these to publish a “try it out on huggingface” demo. That way, people can quickly see if a model is suitable for their own data ASAP!!
Folks going to CVPR — if the Avs get to the finals, there will be games to go to during the conference nights. Even if you’re not into hockey, this team is so fun to watch.
A test of humans’ internal object trackers too (5v5 with a tiny puck), perfect for vision people.
@JitendraMalikCV Add to this camera pose generalization. At different camera poses, the robot should be able to have a degree of spatial understanding to know how to complete the task. Humans can do it quite easily, so should robots.
Super cool, would be curious to see this for real-world robotics models like VLAs! I feel there are some strong implicit 3D priors in these models to really dissect.
Neural networks might speak English, but they think in shapes.
Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision.
Starting today, we’re releasing a series of posts on this research agenda. 🧵
For years, everyone had to collect a bunch of demos in the real world when it came to tactile data.
This work is a nice first step toward simulating optical tactile sensors with unprecedented photorealism, reducing the visual gap to the real world, with accurate physics modeling. Real2Sim won't be constrained to vision anymore. Congrats to the team!!!
How do you build accurate simulations for optical tactile sensors?
Learn the physics with a differentiable MPM simulator.
Learn the optics with a fully learned renderer.
The result: SOTA tactile sensor simulation enabling zero-shot sim-to-real across a wide range of tasks. 🚀
Unreal achievement for humanity. Even if robots beat it, it's absolutely marvelous to see someone hold an average mile pace that matches most people's all-out sprint speed. I'll be celebrating if I can hit their pace for even 1 mile.
Getting closer and closer to a theoretical limit.
Another important point to note: they should also measure a straight line from the shoulders to the weights, watch for two-stage lifts (where a lifter uses the back and then the legs instead of lifting in one stage), and, in general, track the trajectory of the body during the lift. Solid stuff though!
TensorTouch is accepted to T-RO!!!
We have larger and better pretrained models as well. Contact us if interested in the sensor and calibrated (deformation, normal force, and more) model!
Great to see J-PARSE (https://t.co/FMlABGjrgL) being used in this project.
IMO, it’s a rite of passage in robotics to get wrecked by a singularity. Not anymore.
Our 3D Vision team (3DGR) is releasing Raiden — a data collection toolkit for YAM robots.
Built for scalable, high-quality data: supports leader–follower + SpaceMouse teleop, multi-camera setups, and modern stereo depth (incl. TRI learned stereo).
https://t.co/I4vXvVPuC4
We hope that LFG sets a new standard to pretrain 3D aware encoders on unlabelled data. The pretraining paradigm of LFG can be extended to a 3D world model, conditioned on images and derived delta actions of a driving car, predicting the future state. The code, including pretrained models, is coming soon (within 3 weeks!)!
A great collaboration with @AppliedInt. And thanks to @yyfz321021 for Pi3. Nothing happens without it!
There are a LOT of egocentric YouTube driving videos online, but no labels/annotations to learn rich spatial and semantic representations for autonomous driving.
In our new @CVPR paper -- Learning to Drive is a Free Gift -- LFG (yes, you heard that right), we show that many teachers can be employed to pretrain a model with spatial, semantic, and even motion awareness. This model can jointly predict camera poses, point maps, semantics, confidence, and motion maps on current AND future frames, from only YouTube videos.
LFG's encoder also outperforms SOTA multi-camera and LiDAR baselines with only a single monocular camera on the challenging NAVSIM benchmark.
Find out about the free gifts below or check out the website https://t.co/hN3uweY30k🛣️: