Matt Strong

Verified account

@MattStr10050461

Stanford CS PhD | 3D vision for robotics and autonomy

Joined December 2020

143 Following

75 Followers

84 Posts

Pinned Tweet

@MattStr10050461

about 1 year ago

Happy to share a recent work I've been a part of -- TensorTouch! 🖐️ TensorTouch enables the calibration of optical tactile sensors for dense stress tensor AND deformation for dexterous manipulation. website: https://t.co/GKGkazgPzP code: https://t.co/pr25wmyYCE 1/4

3

11

3

0

898

@MattStr10050461

26 days ago

Had a great time at #CVPR2026 spreading the news of free gifts. Some of my hopes for autonomy and robotics are in scaling RGB data (with inductive 3D bias in feedforward 3D encoders), and predicting future states and actions, which seemed to be a solid recipe for autonomous driving. The stakes are higher with robotics -- learning how objects interact under contact is a different breed than predicting where to drive and avoid an accident, but the runway is very, very long. Code and pretrained model coming soon (within 2 weeks) at https://t.co/8ZYGjLalEd!

0

13

0

3

1K

@MattStr10050461

about 1 month ago

hmm I think it's useful to understand the inner workings of policies before we really see robots safely in homes/factories/logistics/medical etc. And I'd just be really curious to understand what is implicitly learned in these large action models. Maybe it's not 3D, so then what is it?

0

2

0

0

169

@MattStr10050461

about 1 month ago

Epic. Hopefully a hand that can gently pick up blueberries AND unscrew those annoyingly tight bottles.

Aiden Swann @SwannAiden

about 1 month ago

Incredibly impressed by the Wuji 👋hand at ICRA. Morphologically similar to my own hand while being quasi direct drive and with climber 🧗grip strength. All that at a fraction of the cost of Sharpa hand. Can’t wait to try it out!

0

11

0

0

446

0

1

0

0

68

@MattStr10050461

about 2 months ago

Epic work. I feel that feedforward reconstruction models for robotics have good appeal because they take RGB as input, which is easy to scale, but they still encode 3D (and it seems 4D as well) ego motion and geometry. I wonder if adding the register tokens works super well in the real world. Can such models be extended to encode future scene progression, which maybe gives robot learning models latent future tokens to use? Let's see!

about 2 months ago

Introducing VGGT-Ω: scaling feed-forward reconstruction across static and dynamic scenes, and studying whether the learned geometric representations transfer beyond reconstruction.

14

723

121

242

779K

0

20

4

14

4K

@MattStr10050461

about 2 months ago

@wildmindai Nice work. But it would be great for repos and works like these to publish a “try it out on huggingface” demo. That way, people can quickly see if a model is suitable for their own data ASAP!!

0

0

0

0

101

@MattStr10050461

about 2 months ago

Folks going to CVPR — if the Avs get to the finals, there will be games to go to during the conference nights. Even if you’re not into hockey, this team is so fun to watch. A test of humans’ internal object trackers too (5v5 with a tiny puck), perfect for vision people.

about 2 months ago

THE AVALANCHE WIN IT IN OVERTIME AND ARE OFF TO THE WESTERN CONFERENCE FINAL 🗣️ #StanleyCup

146

6K

697

58

144K

0

2

0

0

129

@MattStr10050461

about 2 months ago

@aurel_arnold Also try out J-PARSE -- we found it to be a much more suitable teleoperation method than DLS! https://t.co/NblF7ONYmc

0

3

0

1

128

@MattStr10050461

about 2 months ago

@JitendraMalikCV Add to this camera pose generalization. At different camera poses, the robot should be able to have a degree of spatial understanding to know how to complete the task. Humans can do it quite easily, so should robots.

0

0

0

0

343

@MattStr10050461

about 2 months ago

Super cool, would be curious to see this for robotics models in the real world like VLA! I feel there are some strong 3d implicit priors in these models to really dissect.

about 2 months ago

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

310

11K

2K

9K

3M

4

30

0

5

2K

@MattStr10050461

2 months ago

For years, everyone had to collect a bunch of demos in the real world when it came to tactile data. This work is a nice first step in the direction of simulating optical tactile sensors with unprecedented photorealism, reducing the visual gap to the real world with accurate physics modeling. Real2Sim won't be constrained to just vision anymore. Congrats to the team!!!

Aiden Swann @SwannAiden

2 months ago

How do you build accurate simulations for optical tactile sensors? Learn the physics with a differentiable MPM simulator. Learn the optics with a fully learned renderer. The result: SOTA tactile sensor simulation enabling zero-shot sim-to-real across a wide range of tasks. 🚀

1

6

1

0

302

0

2

0

0

83

MattStr10050461 retweeted

Aiden Swann @SwannAiden

2 months ago

How do you build accurate simulations for optical tactile sensors? Learn the physics with a differentiable MPM simulator. Learn the optics with a fully learned renderer. The result: SOTA tactile sensor simulation enabling zero-shot sim-to-real across a wide range of tasks. 🚀

1

6

1

0

302

@MattStr10050461

2 months ago

Unreal achievement for humanity. Even if robots beat it, absolutely marvelous to see someone whose average mile pace matches most people's full-out sprint speed. I'll be celebrating if I can hit their pace for even 1 mile. Getting closer and closer to a theoretical limit.

2 months ago

The sub 2 hour marathon barrier has been broken in London. Sabastian Sawe: 1:59:30. Yomif Kejelcha: 1:59:41. 4:34/mile for 26.2 miles... insane

526

41K

2K

2K

14M

0

2

0

0

54

@MattStr10050461

3 months ago

Another important point to note: they should also measure a straight line from the shoulders to the weights, watch for two-stage lifts (where a lifter uses the back and then the legs, instead of lifting in one stage), and, in general, the trajectory of the body during the lift. Solid stuff though!

0

5

0

0

1K

@MattStr10050461

3 months ago

TensorTouch is accepted to T-RO!!! We have larger and better pretrained models as well. Contact us if interested in the sensor and calibrated (deformation, normal force, and more) model!

@MattStr10050461

about 1 year ago

Happy to share a recent work I've been a part of -- TensorTouch! 🖐️ TensorTouch enables the calibration of optical tactile sensors for dense stress tensor AND deformation for dexterous manipulation. website: https://t.co/GKGkazgPzP code: https://t.co/pr25wmyYCE 1/4

3

11

3

0

898

0

5

1

0

170

@MattStr10050461

3 months ago

Great to see J-PARSE (https://t.co/FMlABGjrgL) being used in this project. IMO, it’s a rite of passage in robotics to get wrecked by a singularity. Not anymore.

Sergey Zakharov

@ZakharovSergeyN

3 months ago

Our 3D Vision team (3DGR) is releasing Raiden — a data collection toolkit for YAM robots. Built for scalable, high-quality data: supports leader–follower + SpaceMouse teleop, multi-camera setups, and modern stereo depth (incl. TRI learned stereo). https://t.co/I4vXvVPuC4

5

180

32

101

38K

0

5

0

1

1K

@MattStr10050461

3 months ago

We hope that LFG sets a new standard for pretraining 3D-aware encoders on unlabelled data. The pretraining paradigm of LFG can be extended to a 3D world model, conditioned on images and derived delta actions of a driving car, predicting the future state. The code, including pretrained models, is coming soon (within 3 weeks!). A great collaboration with @AppliedInt. And thanks to @yyfz321021 for Pi3. Nothing happens without it!

0

0

0

0

51

@MattStr10050461

3 months ago

There are a LOT of egocentric YouTube driving videos online, but no labels/annotations to learn rich spatial and semantic representations for autonomous driving. In our new @CVPR paper -- Learning to Drive is a Free Gift -- LFG (yes, you heard that right), we show that many teachers can be employed to pretrain a model with spatial, semantic, and even motion awareness. This model can jointly predict camera poses, point maps, semantics, confidence, and motion maps on current AND future frames, from only YouTube videos. LFG's encoder also outperforms SOTA multi-camera and LiDAR baselines with only a single monocular camera on the challenging NAVSIM benchmark. Find out about the free gifts below or check out the website https://t.co/hN3uweY30k 🛣️:

1

4

0

0

100

@MattStr10050461

3 months ago

It is competitive with other methods across driving quality metrics as well, with only a monocular camera.

1

0

0

0

26
