Long Le @LongLeRobot - Twitter Profile

Pinned Tweet

3 months ago

Check out my most recent project that I've been working on for more than half a year -- co-led with Tim @TimSong52005757! We showed a flexible and effective recipe for improving generalist VLAs leveraging non-robotic foundation models! The key is guidance in 3D! More details @ https://t.co/aBauZS5TiT

Kostas Daniilidis

@KostasPenn

3 months ago

VLA policies learn generalist robot behaviors from massive teleoperation datasets, hoping that the right behavior emerges. But they rarely use perception during training or inference: powerful foundation models of 3D geometry, semantics, or human motion are ignored. @TimSong52005757, @LongLeRobot, and our @GRASPlab team introduce Omniguide, based on a simple idea: Instead of retraining policies, let perception guide action at inference time. We express diverse guidance sources as attractive and repulsive energy fields in 3D space and inject their gradients into the generative process of VLA policies. This lets perception modules steer actions without retraining the policy: A modular path toward composable robot intelligence. Project: https://t.co/5Vqvz1t8TR Paper: https://t.co/IoYzKrOzKM
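The idea of injecting energy-field gradients into a generative sampling loop can be illustrated with a small toy sketch. This is not the authors' implementation: the quadratic attractive well, the hinge-style repulsive penalty, the stand-in "policy" velocity field, and every function name and parameter (`energy_grad`, `guided_sample`, `w_rep`, `scale`) are assumptions made for illustration. The action is also treated directly as a 3D point, whereas a real system would have to map actions into 3D space first.

```python
import numpy as np

def energy_grad(p, goal, obstacle, radius, w_rep=5.0):
    """Gradient of a toy energy E(p) = 0.5*|p - goal|^2
    + w_rep * 0.5*max(0, radius - |p - obstacle|)^2 (assumed form)."""
    grad = p - goal  # attractive term: descent pulls p toward the goal
    diff = p - obstacle
    d = np.linalg.norm(diff)
    if 1e-9 < d < radius:
        # repulsive term: descent pushes p away from the obstacle
        grad = grad - w_rep * (radius - d) * diff / d
    return grad

def toy_policy_velocity(x, nominal):
    """Hypothetical stand-in for a pretrained generative policy's update
    direction: it simply drifts the sample toward a nominal action."""
    return nominal - x

def guided_sample(x0, nominal, goal, obstacle, radius, steps=50, scale=1.0):
    """Euler integration of the policy update minus a scaled energy gradient.
    The policy itself is never modified; scale=0 recovers unguided sampling."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for _ in range(steps):
        v = toy_policy_velocity(x, nominal)
        x = x + dt * (v - scale * energy_grad(x, goal, obstacle, radius))
    return x

x0 = np.zeros(3)
nominal = np.array([1.0, 0.0, 0.0])   # where the unguided policy tends to go
goal = np.array([1.0, 0.5, 0.0])      # target supplied by a perception module
obstacle = np.array([5.0, 5.0, 5.0])  # far away, so it has no effect here

unguided = guided_sample(x0, nominal, goal, obstacle, radius=0.2, scale=0.0)
guided = guided_sample(x0, nominal, goal, obstacle, radius=0.2, scale=1.0)
print(np.linalg.norm(unguided - goal), np.linalg.norm(guided - goal))
```

In this toy setup the guided sample ends closer to the goal than the unguided one, and `scale` trades off how strongly the guidance overrides the policy's own update.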


Long Le @LongLeRobot

21 days ago

@RayLi234 Wow, very cool!


Long Le @LongLeRobot

27 days ago

@Huangyu58589918 @GoogleDeepMind yes!!! I’ll probably catch you in the office!


Long Le @LongLeRobot

27 days ago

Personal news: I’m joining @GoogleDeepMind NYC this summer as a Student Researcher on the Robotics team! 4 years ago, I left Google to start my PhD. Now I get to come back and work on humanoid robot learning. Full loop closure, as the SLAM nerds would say. NYC folks—say hi!

Photo: https://t.co/O3M0nhJYM7


LongLeRobot retweeted

Kevin Zakka @kevin_zakka

2 months ago

Really excited to release mjviser, a web-based MuJoCo viewer, powered by Viser. It has almost all the features of the native MuJoCo viewer, but runs in your browser. Load and simulate any MuJoCo model with a single uv command 👇 uvx mjviser <model.xml>


LongLeRobot retweeted

Jinqi Luo

@peterljq

2 months ago

Really exciting work! This is very promising for test-time controllability. Agents should perceive in order to actively guide and correct their behavior.


LongLeRobot retweeted

Dinesh Jayaraman @dineshjayaraman

3 months ago

Excited about this project: we showed how, for challenging tasks, a VLA can "get by with a little help from its friends"😉 --- powerful perception models that infer geometry and more, by constructing 3D guidance fields for diffusion and flow policies. @LongLeRobot @TimSong52005757


LongLeRobot retweeted

Chris Paxton

@chris_j_paxton

3 months ago

It was a lot of fun to see this in person. We see so much online, but there's nothing nearly as convincing as just doing a demo, first try, right in front of someone. And it's cool work too, addressing a serious shortcoming of current policies.


Long Le @LongLeRobot

3 months ago

For this project, we’ve also optimized the speed significantly and shown real-time demos to several visitors coming to Penn including @chris_j_paxton and @DJiafei. Here’s a video of the robot finishing a task in 20 secs autonomously


LongLeRobot retweeted

Robots Digest 🤖

@robotsdigest

3 months ago

Scaling VLAs with more robot data is just not enough. OmniGuide shows you can fix generalist policies at inference time. Add guidance fields in 3D space that attract toward goals and repel from obstacles, and steer the policy without retraining.


LongLeRobot retweeted

Lingjie Liu @LingjieLiu1

3 months ago

"Last-mile precision" for VLAs! Check out the cool work led by the amazing Tim @TimSong52005757 and Long @LongLeRobot!


LongLeRobot retweeted

TimSong @TimSong52005757

3 months ago

So honored to have the support of my professors and peers, including the omnipotent Long @LongLeRobot, on my first PhD project. 3D space is the bridge between the task and action space, where the guidance from foundation knowledge flows.


LongLeRobot retweeted

Jie Wang @ CVPR

@JieWang_ZJUI

3 months ago

Here is our recent project to enhance generalist policies with auxiliary information guidance! Steering your base policy, so it’s more performant and effective



LongLeRobot retweeted

Jie Wang @ CVPR

@JieWang_ZJUI

3 months ago

TAMP vs End2End, which one is better? Check out our latest research ablating these two on a tabletop pick-and-place setup. It turns out that SOTA foundation models provide a very good prior that solves this task family. Please enjoy the download-and-play TipTop from MIT folks!


LongLeRobot retweeted

Will Liang

@willjhliang

3 months ago

Introducing Tether 🪢, a fun little idea to scale data by having our robot “play” in the real world for over 24 hours, throughout the day and overnight—improving policies from zero to mastery with minimal supervision! But play is messy, with out-of-distribution scenarios that are hard to anticipate. To perform autonomous functional play in the real world, from just a handful of demos, we propose a highly robust few-shot imitation method that warps demo trajectories using visual correspondences. Then, continuously running it within a multi-task VLM-guided cycle, we generate a data stream that produces 1000+ expert-level demos. This generated data is finally funneled downstream to train imitation learning policies, which improve from zero to near-perfect success rates. We’ll be presenting Tether at #ICLR2026 in just a few weeks! But before that, deep dive with me… 🧵


LongLeRobot retweeted

Jiafei Duan@CVPR2026

@DJiafei

4 months ago

Why do generalist robotic models fail when a cup is moved just two inches to the left? It’s not a lack of motor skill, it’s an alignment problem. Today, we introduce VLS: Vision-Language Steering of Pretrained Robot Policies, a training-free framework that guides robot behavior in real time. Check out the project: https://t.co/9xE68JPLUv 👇🧵 (Watch till the end: VLS runs uncut, steering pretrained policies across long-horizon tasks.)


LongLeRobot retweeted

Edward Hu @edward_s_hu

6 months ago

Happy to announce our NeurIPS’25 paper: real-world RL of active perception behaviors! I am pretty excited about this project - I learned that real-world robot RL is actually quite straightforward. Details below:
