🤖Low-data post-training can teach a VLA policy a new robot skill. But it also makes it too attached to the training demos.
We call this lock-in🔒: the policy can execute the post-training task, yet fails to respond to seemingly obvious prompt changes.
DeLock preserves steerability using only the policy’s own pretrained knowledge. No extra supervision needed!🚀🚀🚀
#Robotics #AI #EmbodiedAI #VLA
🤔 Can we train one VLA policy to control multi-robot teams without any explicit communication?
✨ Introducing CHORUS: a single policy for decentralized, multi-embodiment collaboration
🧵⬇️
🥳Super excited that our paper GRaD-Nav++ has received the RA-L 2025 BEST PAPER AWARD!
Huge congratulations to the amazing team🤩 @QianzhongChen, Naixiang Gao, JunEn Low, Timothy Chen, @JiankaiSun, @MacSchwager!
https://t.co/7szi1JeF8i
Can't attend #ICRA2026, but happy to share that our work has won the RA-L 2025 𝗕𝗘𝗦𝗧 𝗣𝗔𝗣𝗘𝗥 award
This work explores the alignment between language and action in drone navigation
Thanks to my amazing advisor @MacSchwager and coauthors! Thanks, IEEE RAS community! Full paper in the thread
🚀 How should LLMs sample on hard reasoning problems during post-training and inference, where direct rollouts rarely produce a correct answer?
Best-of-N (e.g., GRPO) and tree search share two limitations:
🔻 Verification signals are sparse
🔻 Candidates stay within the model's own distribution
We introduce BES: Bidirectional Evolutionary Search — a search framework that couples forward candidate evolution with backward goal decomposition.
✅ Works for both post-training and inference.
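Why sparse verification hurts Best-of-N-style post-training fits in a few lines. This is a minimal sketch, assuming a binary verifier and the common GRPO-style group-normalized advantage (reward minus group mean, divided by group std). The helper names are hypothetical, and this illustrates the limitation only; it is not BES.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its group's mean and population std.

    Hypothetical helper: a common GRPO-style advantage, not BES itself.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def prob_any_correct(p: float, n: int) -> float:
    """Chance that at least one of n independent rollouts passes the verifier."""
    return 1.0 - (1.0 - p) ** n


# A hard problem: binary verifier, 8 rollouts, none correct.
all_wrong = group_relative_advantages([0.0] * 8)
print(all_wrong)  # every advantage is 0.0, so this group yields no learning signal

# One lucky success in the group does produce a signal.
one_right = group_relative_advantages([1.0, 0.0, 0.0, 0.0])
print([round(a, 3) for a in one_right])

# With a 1% per-rollout success rate, 8 rollouts rarely contain any correct answer.
print(round(prob_any_correct(0.01, 8), 4))
```

When every candidate in a group fails, all advantages collapse to zero, and sampling more candidates from the same distribution only slowly raises the odds of a single success.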
Ever fine-tuned a VLA policy on a small demo dataset and it suddenly stops listening to new instructions?
This paper calls it lock-in. The model just repeats what it saw during training,
like always picking bread even when you say apple.
Low-data post-training quietly kills steerability
The fix?
DeLock is surprisingly simple and clever
Thanks for the thoughtful point! DeLock is not meant to replace SFT or make arbitrary unseen skills work out of the box. It aims to reduce the combinatorial burden of SFT by leveraging the pretrained backbone to connect post-trained skills with related novel instructions, so we don’t need demos for every variation.
So its effectiveness depends on both the similarity between the trained and novel tasks, and how much the VLA backbone already knows about the relevant concepts/skills.
How well do VLAs generalize to new prompts after SFT? If you've worked with them, you'll know the answer. The problem is the fine-tuning methodology, not the model. Suning has a clever and effective solution that requires no new data, just better SFT and inference methods. 👇
I am surprised that so much pretrained knowledge can be preserved with no additional data if you fine-tune a VLA properly! Check out this solid work from Suning!
Really nice work on tackling “lock-in” in VLA policies!
VLA post-training robustness is a bottleneck, and it’s great to see a method that improves adaptability without extra supervision. DeLock looks like a promising direction.🔥
Ever post-trained a VLA and watched it ignore every novel instruction?
We call this lock-in.
Prior fixes bloat datasets with foundation model labels. 🔓DeLock is different: regularized finetuning + contrastive prompts at inference.
Result: Pretraining priors preserved.
I always feel frustrated to see a fine-tuned VLA policy become useless for any other task. We need generalizable, steerable VLAs that perform well on multiple tasks (ultimately, all tasks). Check out DeLock, which elegantly solves this problem!