Today, we present a step-change in robotic AI @sundayrobotics.
Introducing ACT-1: A frontier robot foundation model trained on zero robot data.
- Ultra long-horizon tasks
- Zero-shot generalization
- Advanced dexterity
🧵->
Congrats Chuning and collaborators! Really nice to see work on doing joint generative modelling over all (or at least more) of our available robot data.
On a personal note, I remember pushing for this high-level approach to be articulated in the robotics section of the foundation models paper, so a small pat on the back for past me (and maybe a cookie).
Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)
FlowMo is a SOTA image tokenizer based on diffusion autoencoders -- no convolutions, adversarial training, or distillation! Personally, I expect to see similar approaches from frontier labs soon: diffusion is much more scalable than convnet GANs.
Modern generative models of images and videos rely on tokenizers. Can we build a state-of-the-art discrete image tokenizer with a diffusion autoencoder? Yes! I'm excited to share FlowMo, with @kylehkhsu, @jcjohnss, @drfeifei, @jiajunwu_cs. A thread 🧵:
Check out our work on LLM personalization led by Anikait and Sheryl!
Personally (hehe), it's really cool to see how ideas in automated few-shot task construction I explored way back before my PhD remain applicable in the era of foundation models.
Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remains a significant challenge. Introducing FSPO, a simple framework leveraging synthetic preference data to adapt new users with meta-learning for open-ended QA! 🧵
Incredibly saddened to hear of @FelixHill84's passing. His work and talk on "Environmental drivers of systematicity and generalization in a situated agent" (https://t.co/uBxU2FNsRZ) directly inspired my first successful PhD project.
Rest in peace, Felix.
Is it possible to obtain zero-shot generalization of vision-based RL agents without data augmentation, task-centric representations, etc? Yes! by disentangling the latent space and, you might've heard about this given the latest news, associative memory i.e. Hopfield models! 🧵
[n/n] For more details, including implementation, fully-fledged background, derivations, decoded latent intervention visualizations, etc., see
code: https://t.co/DEdz8gjQYe
paper: https://t.co/jZQ9q53Jup
🚨 NEW state-of-the-art model for unsupervised disentanglement 🚨
[1/n] Tripod melds three complementary methods for disentangled representation learning that each target a separate component of an autoencoder.
[14/n] Jubayer and I will be presenting this work at ICML this Thursday, 11:30 am to 1 pm local time, Hall C 4-9 #409. Come chat about disentanglement!