IMLE Policy introduces a new way to train faster and more data-efficient behavior cloning policies.
Will be presented at RSS2025!
https://t.co/y8ZTPxwdLf
🧵⬇️
@kevinmpeterson1 @JCChristopher Nice. Do you see a future where you get to deployment with the modular approach, and then train end-to-end on data from the real deployment later on? Does that data scale well in construction settings?
@ed0henderson Or more simply, the loss might look like it has plateaued, but the model might still be tweaking the smaller precise parts of the movements.
@ed0henderson Imitation learning is weird because there's no great way to pick checkpoints other than eval perf. It's hard to pinpoint overfitting since human demos are noisy/multimodal/not iid. Especially with small datasets, the val set might not be perfectly representative either.
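A minimal sketch of picking a checkpoint by eval performance rather than by loss. The names here (`select_checkpoint`, `rollout`, the checkpoint identifiers) are hypothetical, and `rollout` stands in for whatever simulator or real-robot evaluation is available:

```python
from typing import Callable, Dict, Sequence, Tuple


def select_checkpoint(
    checkpoints: Sequence[str],
    rollout: Callable[[str, int], bool],
    episodes: int = 20,
) -> Tuple[str, Dict[str, float]]:
    """Return the checkpoint with the highest rollout success rate.

    `rollout(ckpt, seed)` is an assumed user-supplied function that runs
    one evaluation episode and returns True on task success.
    """
    rates: Dict[str, float] = {}
    for ckpt in checkpoints:
        # Reuse the same seeds for every checkpoint so results are comparable.
        successes = sum(bool(rollout(ckpt, seed)) for seed in range(episodes))
        rates[ckpt] = successes / episodes
    # On ties, max() keeps the earliest checkpoint in the list.
    best = max(rates, key=rates.get)
    return best, rates
```

With few episodes the success-rate estimate is itself noisy, so small gaps between checkpoints may not mean much.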
@Goodeat258 Nice, thanks! Do you plan to release the code for the robotics experiments? Curious how you create the positive/negative samples for imitation learning (since there's only one label per conditioning)
@asimovinc This might be useful: you only need a few simple terms for nice gaits (single foot contact, airtime, simple penalties) https://t.co/buQRfjoMhx
Right, that section shows averaging the outputs of a flow policy doesn't hinder performance much. They also show a figure with very little spread across modes. In my experiments I found similar behavior, but there are definitely states where diffusion will output multiple modes. In most cases though, you can still get a good success rate while collapsing modes, because you will often move to a state with less action ambiguity. (https://t.co/tjyxR01q7f) It's highly dependent on task and dataset though.
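A toy illustration of that point, not taken from the paper: `sample_actions` is a hypothetical stand-in for a generative policy, with 1-D actions and made-up modes at -1 and +1. Averaging samples is harmless when the state has one mode and lands between the modes when it has two:

```python
import random
import statistics


def sample_actions(n, rng, modes=(-1.0, 1.0), noise=0.05):
    """Toy policy: each sample picks one mode and adds Gaussian noise."""
    return [rng.choice(modes) + rng.gauss(0.0, noise) for _ in range(n)]


def distance_to_nearest_mode(action, modes=(-1.0, 1.0)):
    """How far an action is from the closest demonstrated mode."""
    return min(abs(action - m) for m in modes)


rng = random.Random(0)
ambiguous = sample_actions(1000, rng)                  # state with two modes
unambiguous = sample_actions(1000, rng, modes=(1.0,))  # state with one mode

# Averaging collapses the samples to a single action.
avg_ambiguous = statistics.fmean(ambiguous)      # near 0, between the modes
avg_unambiguous = statistics.fmean(unambiguous)  # near 1, on the mode

print(distance_to_nearest_mode(avg_ambiguous))
print(distance_to_nearest_mode(avg_unambiguous, modes=(1.0,)))
```

Whether the in-between action actually hurts depends on the task, which is why the success rate can stay high if ambiguous states are rare or quickly left behind.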
Great work! Really interesting paper. I'm curious what you think about recent non-iterative generative policies (C1+C2) like https://t.co/z26zf4YI4f and https://t.co/nrjjOAf9cN. These methods are basically regression but with additional mechanisms that encourage better use of the noise space. It seems that either C1+C2 or C2+C3 can work well, but I wonder about the trade-offs.