I’ve seen dexterous robots before, and I’ve played with robots that showed some flickers of language following and generalization,
but I'd never seen a robot that’s really good at both! This new combination makes Pi-0.7 both highly capable and very fun to play with :)
Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot, for which we had no shirt-folding data, to fold shirts; figure out how to use an appliance with language-based coaching; and perform a wide range of dexterous tasks, all in one model!
I think this will probably be true for all of our releases, but this is my favorite by far: tool use, promptability, lots of strides in performance.
If you're building in robotics and want access, please reach out! (after fine-tuning π0.5!)
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
We equipped PI policies with memory!
And taught our robots to do long-horizon, real-world tasks such as preparing the items for a recipe, cooking a grilled cheese, and cleaning the kitchen!
General-purpose AI models are behind some of the most exciting applications we now can't live without. We envision that an analogous “physical intelligence layer” built with models like π0.6 will similarly spur a new wave of applications for the physical world.
We’ve recently begun working with a handful of companies that have deployed their robots to do real-world, useful things.
https://t.co/udVO9fV0PH
This was such a fun project to work on! The highlight for me: because of the strength of our pre-trained checkpoint, we didn’t need massive datasets. Just a few hundred demos were enough to get good policies for each of these super difficult tasks 🙂
We got our robots to wash pans, clean windows, make peanut butter sandwiches, and more!
Fine-tuning our latest model enables all of these tasks, and this has interesting implications for robotics, Moravec's paradox, and the future of large models in embodied AI.
More below!
We discovered an emergent property of VLAs like π0/π0.5/π0.6: as we scale up pre-training, the model learns to align human videos and robot data!
This gives us a simple way to leverage human videos. Once π0.5 knows how to control robots, it can naturally learn from human video.
We just released results for our newest VLA from Physical Intelligence: π*0.6. This one is trained with RL, which makes it quite a bit better: it often doubles throughput and enables real-world tasks like folding real laundry and making espresso drinks at the office.
We've added pi-05 to the openpi repo: pi05-base, pi05-droid, pi05-libero. Also added PyTorch training code!🔥
Instructions and code here: https://t.co/EOhNYfpq9B
This is an updated version of the model we showed cleaning kitchens and bedrooms in April: https://t.co/t09P0nJJFv
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇
It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
We took a robot to RSS in LA running our new Gemini Robotics On-Device VLA model. People interacted with the model with new objects and instructions in a brand new environment and the results were amazing!
Excited to announce what we've been working on: Gemini Robotics On-Device, a VLA model that runs locally and shows strong performance on 3 different robot embodiments!
We're also releasing an open source MuJoCo sim for the Aloha 2 platform, and an SDK for trusted testers to use and finetune the model.
We’re bringing powerful AI directly onto robots with Gemini Robotics On-Device. 🤖
It’s our first vision-language-action model to help make robots faster, highly efficient, and adaptable to new tasks and environments - without needing a constant internet connection. 🧵