Danny Driess

about 2 months ago

π0.7's training recipe builds upon Knowledge Insulation https://t.co/zAqSFvqCgz and our recent memory work https://t.co/YFBdFweBHc

3 months ago

Many real-world tasks require memory to be successful. Yet, most robots don’t have any form of memory. Today, we are going to change that. We developed a system called MEM that introduces memory into VLAs on multiple scales

about 2 months ago

The most exciting aspect of modern machine learning, in my opinion, is that one can train models that just work for many tasks, without finetuning. π0.7 is a major step in that direction for robots

about 2 months ago

Our newest model, π0.7, has some interesting emergent capabilities: it can fold shirts on a new robot for which we had no shirt-folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks, all in one model!

All the power of creation right at our fingertips. Making models @bfl_ml https://t.co/xMaJiOl7Y0

about 2 months ago

and we see strong cross embodiment generalization for dexterous tasks

DannyDriess retweeted

Marcel Torné @marceltornev

3 months ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

DannyDriess retweeted

Kyle Vedder

@KyleVedder

3 months ago

This robustness allows the policy to do diverse, long-horizon tasks in unseen environments. For example, the demo kitchen was built *after* the potatoes policy was fully trained. I just wrote the high-level prompt to tell it where to go look for items, and it did the rest. https://t.co/CvkU3RA015

DannyDriess retweeted

3 months ago

We equipped PI policies with memory! And taught our robots to do long-horizon real world tasks such as preparing the items for a recipe, cooking a grilled cheese and cleaning the kitchen!

DannyDriess retweeted

Karl Pertsch

@KarlPertsch

3 months ago

This one has been a long time coming: today we’re introducing MEM, an approach for giving VLAs short-term and long-term memory. Memory is such an obvious capability, but adding it isn’t easy (most VLAs today are memory-less). A short thread on challenges, solutions, and the new capabilities MEM unlocks for us.

3 months ago

If you look at this plot here, you can see that both short- and long-term memory were important to make long-horizon tasks work well https://t.co/SAdErXLdGI

3 months ago

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

3 months ago

One aspect I am particularly excited about is that memory enables the model to adapt its strategy while solving the task, something we might call “in-context adaptation”. In this example, it is unclear from a single image whether the fridge opens from the left or the right. Hence, a model without memory (left) might repeatedly fail to open the fridge. In contrast, with memory (right), our model learns “in-context” that the fridge opens differently, and adjusts its strategy accordingly.

3 months ago

The key idea behind Multi-Scale Embodied Memory (MEM): use different modalities to represent memory at different time scales.

📹 For short-horizon memory, we developed an efficient video encoder that lets the model remember fine-grained details about its recent interactions.

📜 For long-horizon memory, we train the model to summarize events in text, allowing it to remember events for up to 15 min.

DannyDriess retweeted

3 months ago

General-purpose AI models are behind some of the most exciting applications we now can't live without. We envision that an analogous “physical intelligence layer” built with models like π0.6 will similarly spur a new wave of applications for the physical world. We’ve recently begun working with a handful of companies that have deployed their robots to do real-world, useful things. https://t.co/udVO9fV0PH

4 months ago

Project led by @verityw_ with @JagdeepBhatia8, @CatGlossop, Nikhil Mathihalli, @riadoshi21, @tangerinecoder, @KarlPertsch, @svlevine

4 months ago

Check out our latest work on steerable policies. Instead of having only language as the interface to a VLA, steerable policies follow point queries, motion traces, atomic subtasks and more, which allows us to make better use of VLMs controlling them. More in @verityw_'s thread https://t.co/4jfTjlQCB0

Will Chen @verityw_

4 months ago

How can robot policies be trained to best leverage VLMs' CoT reasoning and in-context learning for generalization?
The key is Steerable Policies: vision-language-action models that can be flexibly controlled in many ways!
https://t.co/GvcvmY0JD5
1/9 https://t.co/K86U0azgyA

4 months ago

What I like about this: If I want to explain to someone how to solve a task, I rarely use language alone. I might point at things or wave in the air, without restricting myself to only one interface to communicate my intent. This work brings this idea into VLAs.

7 months ago

The idea behind significantly improving the performance on hard real-world tasks is to train a value function, condition the model on advantages computed from the value function, and run an iterative improvement loop where the model learns from its own data. https://t.co/SuXyqIaMta

7 months ago

The base model powering π*0.6 is trained with Knowledge Insulation https://t.co/zAqSFvqCgz

7 months ago

Our model can now learn from its own experience with RL! Our new π*0.6 model can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes. More in the thread below.
