Stanford Multi-robot Systems Laboratory. Endowing groups of robots with the intelligence to collaborate safely and effectively with humans and each other.
Low-data post-training can teach a VLA policy a new robot skill. But it also makes it too attached to the training demos.
We call this lock-in: the policy can execute the post-training task, yet fails to respond to seemingly obvious prompt changes.
DeLock preserves steerability using only the policy's own pretrained knowledge. No extra supervision needed!
#Robotics #AI #EmbodiedAI #VLA
π, But Make It Fly
We fine-tuned π0, a VLA model pretrained entirely on manipulators, to fly a drone that picks up objects, navigates through gates, and composes both skills from language commands.
This work was done in collaboration with Johnathan Tucker, Denis Liu, @SwannAiden, @allenzren, Javier Yu, @JiankaiSun, Brandon Kim, Lachlain McGranahan, @QuanVng, and @MacSchwager
Stay tuned for the dataset and code!
Thread: GRaD-Nav++
1/9
Do you ever wish you could throw away the controller and just tell your drone what to do?
Like:
"Go through that gate, then stop over the ladder."
or, midway through the flight:
"Actually, switch tasks: fly to the monitor on the right."
[1/2] Excited to announce GRaD-Nav! We propose a new framework that integrates 3DGS and Differentiable RL to train vision-based drone navigation policies. Our method achieves efficient end2end training, zero-shot sim2real transfer, and strong in-task adaptability.
[5/5] We show in hardware experiments that LatentToM solves tasks with two decentralized arms as well as a fully centralized bi-manual policy.
Paper: https://t.co/gsUQC9dAuq
Project: https://t.co/BFATfun1zX
[1/5] Humans collaborate with each other by simulating the state of mind of their teammates, a concept called Theory of Mind (ToM). We propose LatentToM, a method to endow robots with a theory of mind in latent space for cooperative manipulation.
[4/5] LatentToM is comms flexible. Without comms, the robots rely completely on Theory of Mind for coordination. With comms, they use a single communication round to align their consensus embeddings at each policy inference.
[5/5] We embrace these findings by proposing an Action Lookup Table (ALT) policy, which equals the diffusion policy's reactivity and dexterity with a fraction of the memory footprint and inference time. And no diffusion denoising steps!
[4/5] A visual hash function indexing a memorized action lookup table gives closed-loop visual reactivity without the need for action generalization, which seems to be a powerful recipe for imitation learning with few demonstrations.
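To make the lookup-table idea concrete, here is a minimal sketch, not the ALT implementation from the paper. It assumes a toy "visual hash" that coarsely quantizes pixel intensities, and a fallback to the closest memorized key when the exact key is missing. The names `visual_hash`, `ActionLookupTable`, `memorize`, and `act` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, ...]


def visual_hash(image: Sequence[float], bins: int = 4) -> Key:
    """Hypothetical stand-in hash: quantize each intensity in [0, 1] into coarse bins."""
    return tuple(min(int(p * bins), bins - 1) for p in image)


class ActionLookupTable:
    """Toy policy: memorize demo actions keyed by a hash of the observation."""

    def __init__(self) -> None:
        self.table: Dict[Key, List[float]] = {}

    def memorize(self, image: Sequence[float], action: Sequence[float]) -> None:
        # Store the demonstrated action under the hash of its observation.
        self.table[visual_hash(image)] = list(action)

    def act(self, image: Sequence[float]) -> List[float]:
        if not self.table:
            raise ValueError("no demonstrations memorized")
        key = visual_hash(image)
        if key in self.table:
            return self.table[key]
        # Assumed fallback: pick the memorized key with the fewest differing bins.
        nearest = min(self.table, key=lambda k: sum(a != b for a, b in zip(k, key)))
        return self.table[nearest]


policy = ActionLookupTable()
policy.memorize([0.1, 0.9, 0.5], [1.0, 0.0])  # demo frame A -> action A
policy.memorize([0.9, 0.1, 0.5], [0.0, 1.0])  # demo frame B -> action B

near_a = policy.act([0.12, 0.88, 0.52])  # hashes to the same key as frame A
near_b = policy.act([0.8, 0.3, 0.5])     # no exact key, closest to frame B
```

Because the policy re-hashes the current image at every step, it reacts to what it sees in closed loop while only ever replaying memorized actions, with no denoising steps at inference.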
[3/3] This learned policy is designed to adapt at runtime to variations in drone dynamics. It outputs thrust and body rate commands and runs at 20 Hz on a commodity drone with only onboard compute and perception.
[1/3] We're excited to announce SOUS VIDE, a method to train visuomotor navigation policies for autonomous drones without data from real-world flights, using only Gaussian Splat reconstructions of the scene.
Paper: https://t.co/8J7SV3w3ef
Project: https://t.co/EzEAY4jkOg
[2/3] To do this, we train a Gaussian Splatting model of a scene and virtually "fly" the drone within it with a massive volume of motion and dynamics perturbations. This produces 100k+ image-action pairs, which then supervise the training of our learned policy.