Rob Lee @roblee_rl - Twitter Profile

Pinned Tweet

about 1 year ago

IMLE Policy introduces a new way to train faster and more data-efficient behavior cloning policies. Will be presented at RSS2025! https://t.co/y8ZTPxwdLf 🧵⬇️

1

14

4

11

2K

Rob Lee @roblee_rl

2 days ago

perpetual cup stacking @sydekickbot

0

6

0

655

Rob Lee @roblee_rl

30 days ago

@joudanki 😂me making coffee every morning

0

1

0

24

Rob Lee @roblee_rl

about 1 month ago

@kevinmpeterson1 @JCChristopher Nice. Do you see a future where you get to deployment with the modular approach, and then train end-to-end on data from the real deployment later on? Does that data scale well in construction settings?

0

19

Rob Lee @roblee_rl

about 1 month ago

less than 1 hour of data. i love policy eval timelapses :)

4

57

4

22

6K

Rob Lee @roblee_rl

about 1 month ago

@Saketh_Vaishya it's the flexiv rizon https://t.co/WlKzeOSCH9

0

1

0

176

Rob Lee @roblee_rl

about 1 month ago

a good model trained on even a simple task with a tiny amount of data feels mesmerising, no matter how many times you see it.

15

161

13

50

23K

Rob Lee @roblee_rl

about 1 month ago

@Saketh_Vaishya it's an end-to-end learned policy!

1

0

132

roblee_rl retweeted

Chris Paxton

@chris_j_paxton

about 1 month ago

Seeing a model work like this for the first time is such a good feeling

2

70

5

14

8K

Rob Lee @roblee_rl

about 1 month ago

@sentientcar We use both! This arm is 7dof, has a larger workspace, and a higher payload, which is nice for certain applications

0

1

0

272

Rob Lee @roblee_rl

about 2 months ago

@carlosdponx @randallmbriggs @GoingBallistic5 Westwood Robotics BEAR actuators also have liquid cooling, used in their humanoids

0

40

Rob Lee @roblee_rl

about 2 months ago

@ed0henderson Or more simply, the loss might look like it has plateaued, but the model might still be tweaking the smaller precise parts of the movements.

1

0

27

Rob Lee @roblee_rl

about 2 months ago

@ed0henderson Imitation learning is weird because there's no great way to pick checkpoints other than eval perf. It's hard to pinpoint overfitting since human demos are noisy/multimodal/not iid. Especially with small datasets the val set might not be perfectly representative either.

1

0

33

Rob Lee @roblee_rl

3 months ago

@ZakharovSergeyN Awesome work!

0

2

0

470

Rob Lee @roblee_rl

3 months ago

@Goodeat258 Nice, thanks! Do you plan to release the code for the robotics experiments? Curious how you create the positive/negative samples for imitation learning (since there's only one label per conditioning)

0

386

Rob Lee @roblee_rl

5 months ago

@asimovinc This might be useful, you only need a few simple terms for nice gaits (single foot contact, airtime, simple penalties) https://t.co/buQRfjoMhx

0

80

Rob Lee @roblee_rl

5 months ago

@JieWang_ZJUI Interesting. I guess their contribution is more around training data/recipe?

1

0

156

Rob Lee @roblee_rl

6 months ago

Right, that section shows averaging the outputs of a flow policy doesn't hinder performance much. They also show a figure with very minor spread of modes. In my experiments I found similar behavior, but there are definitely states where diffusion will output multiple modes. In most cases though, you can still get a good success rate while collapsing modes, because you will often move to a state with less action ambiguity. (https://t.co/tjyxR01q7f) It's highly dependent on task and dataset though.

1

0

57
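To illustrate the point about averaging and mode collapse in the post above, here is a minimal sketch. It assumes a hypothetical one-dimensional action with two demonstrated modes (-1.0 and +1.0, for example passing an obstacle on either side); the function names and numbers are made up for illustration and do not come from the thread or the linked paper. It compares the mean of sampled actions with the nearest demonstrated mode in an ambiguous state and in an unambiguous one.

```python
import random
import statistics


def sample_actions(modes, n=1000, noise=0.05, seed=0):
    """Draw n scalar actions, each from a randomly chosen mode plus Gaussian noise.

    This stands in for sampling a generative policy; it is not a real policy.
    """
    rng = random.Random(seed)
    return [rng.choice(modes) + rng.gauss(0.0, noise) for _ in range(n)]


def distance_to_nearest_mode(action, modes):
    """How far an action is from the closest demonstrated mode."""
    return min(abs(action - m) for m in modes)


# Hypothetical states: one with two valid actions, one with a single valid action.
states = {
    "ambiguous": [-1.0, 1.0],
    "unambiguous": [1.0],
}

for name, modes in states.items():
    samples = sample_actions(modes)
    mean_action = statistics.fmean(samples)  # averaging collapses the modes
    gap = distance_to_nearest_mode(mean_action, modes)
    print(f"{name}: mean action {mean_action:+.2f}, gap to nearest mode {gap:.2f}")
```

In the ambiguous state the averaged action lands between the two modes and matches neither, while in the unambiguous state the average stays close to the single mode. That is consistent with the observation that collapsing modes can be harmless when the policy quickly reaches states with little action ambiguity.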

Rob Lee @roblee_rl

6 months ago

Great work! Really interesting paper. I'm curious about what you think about recent non-iterative generative policies (C1+C2) like https://t.co/z26zf4YI4f and https://t.co/nrjjOAf9cN. These methods are basically regression but with additional mechanisms that encourage better use of the noise space. It seems that either C1+C2 or C2+C3 can work well, but I wonder about the trade-offs.

1

2

0

547
