Introducing GEN-1.
Our latest milestone in scaling robot learning.
We believe it to be the first general-purpose AI model to master simple physical tasks.
99% success rates, 3x faster speeds, and real-time adaptation to unexpected scenarios, w/ only 1 hour of robot data.
More🧵👇
@GeneralistAI Also this was a really impressive recovery we observed 🤯
The bottle started slipping, and the robot adjusted its grip to stabilize it and prevent a spill!!
In my first week at @GeneralistAI, I trained a robot to pour liquids using GEN-1 🤖💧
I wanted to challenge the robot with a non-rigid manipulation task, so liquid felt like the perfect choice. The task involved:
- unscrewing the bottle cap
- pouring liquid into espresso glasses
- rebalancing uneven pours
Best of all, the robot was able to complete the task fully autonomously 3 times in a row (out of 3)! Pour-fect 😉
Excited for the journey ahead and grateful to be building alongside such an incredible team!
Decades of hardware development led to strong, fast, and precise robot arms. The moment we can put general intelligence into these things, we can leverage the full spectrum of capabilities that they were always meant to deliver.
We’re betting on a future where robot hardware will continue to improve, and we intend to build the best models on top of the best hardware to push the frontier of capabilities and reliability.
Here’s an old video from TossingBot that I think helps make this point extra clear. Industrial-grade repeatability started out as a crutch for dumb software -- but if you pair it with the right AI models, then it becomes an advantage that is superhuman. There are factories where the same UR robot arms are still being used to precisely and repeatably build the same car parts, operating for 10 yrs straight without a single failure or shutdown.
Humanoids will get there too (among other form factors). Not yet today, but eventually. And our models will be ready to meet them when they do.
GEN-1 delicately arranges potato chips, and lifts a heavy bag of potatoes — from a gentle touch to a strong grip.
Read more about GEN-1 in our blog posts in the comments below ↓
For the avid viewer -- there’s a brief moment when the robot loses its grip on the head of a zip tie, and so it decides to use the other hand to help readjust the grip for the pull.
It’s gnarly passing by our robots every day, and catching these random glimpses of improvisational intelligence in action. Instant dopamine hit.
Today marks the end of my first full week @GeneralistAI
Last Monday, I was given a challenge: use our GEN-1 model to teach a robot a task of my choosing, using the same no-code platform our customers use.
I picked the ball-and-vase magic trick. It was one of my favorites as a kid, and it felt like the right mix of fun and surprisingly hard.
A few days later, GEN-1 pulled it off. I left Friday having watched the robot nail it 14 times in a row. What’s wild is that even 4 months ago, if you told me you could go from idea to on-robot skill in a couple of days, I probably wouldn’t have believed you.
Really excited to be building with an incredible team. Can’t wait to see what week two brings 🤖
If anyone’s creating a benchmark for frontier physical AI models, this task is a great one to add to the roster. Sensorimotor end-to-end policies must exhibit the long-term visual memory to track and reason about where the object might be.
It’s also harder to “cheat” on this task -- it can be difficult to do if you’ve got “gaps” in your model’s memory, e.g. low-frame-rate memory or coarse representations like language.
GEN-1 nails it (also on the first try with an unseen object). @BerkayAntmen was really trying hard to fool the model here.
Shout out to an excellent task from @RhodaAI.
GEN-1 plays the 🐚 shell game, trained on just 1 hr of robot data. It also generalizes to unseen objects, like @BerkayAntmen 's car keys.
Physical AI models should be capable of benchmark tasks like this one. It's interesting for all the reasons @RhodaAI calls out: it requires visual memory, and the model must track the cups from the very start, at high frame rates.
Interestingly, GEN-1 appears to exhibit a degree of "active perception." It's subtle; the hands sometimes appear to "follow" the cups, as if the model is using its own movements to help attend to where it thinks the object should be.
Read more about GEN-1 in our blog post in the comments below ↓
A week of contrasts... @sudo_robotics says why do IRL training? Generalist says why do sim?
Very cool to see some of these examples coming together. Putting a dollar into a wallet definitely feels like a tail task I would not have expected robotics to solve for another 5(?) years at least! Kudos @GeneralistAI
Every day for the past 2 weeks, we've been sharing something new from GEN-1, our latest milestone in scaling robot learning. This has never been done before.
Going from ideas to skills in days (or faster) is what physical AI models should deliver.
More coming. Stay tuned.
Read more about it in our blog post in the comments below ↓
GEN-1 still works with the lights off, and generalizes under harsh lighting conditions.
The model uses raw video pixels to make decisions, so strong lighting changes can drastically alter its input distribution. Yet performance still holds.
Why? GEN-1 was pre-trained on a massive, diverse dataset spanning many lighting conditions, from outdoor farms to warehouses and from grocery stores to dimly lit homes. It's already seen it all, and it transfers this knowledge to new tasks.
This is a glimpse of what we call Mastery, and is part of the reason these models can cross a new performance threshold.
Read more about it in our blog post in the comments below 👇
GEN-1 Repackages Toy Eggs
Other tasks GEN-1 can do:
https://t.co/j2vm6S3y5f
Read more about GEN-1, our latest foundation model for the physical world:
https://t.co/Sg2PoaRzrF
GEN-1 plays with a fidget toy.
Other tasks GEN-1 can do:
https://t.co/j2vm6S45UN
Read more about GEN-1, our latest foundation model for the physical world:
https://t.co/Sg2PoaS7hd