Rhoda AI

Verified account

@RhodaAI

Building at the frontier of embodied intelligence.

Palo Alto, California

Joined August 2025

16 Following

2.8K Followers

38 Posts

Pinned Tweet

3 months ago

To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)

18

208

38

105

67K

RhodaAI retweeted

7 days ago

I'll be at CVPR next week (6/3–6/7). If you’re working on or exploring opportunities in video models for robotics (research or engineering), happy to chat 🤖 We’re also hosting a Rhoda party Thursday night with many of our technical team in town. DM me for an invite 🍻

12

49

5

13

6K

about 1 month ago

How? Existing video models aren't optimized for real-time inference. Instead of fine-tuning off-the-shelf video models, we co-design inference-aware model architectures and model-aware inference optimizations from the ground up.

1

18

0

1

2K

about 1 month ago

Can a large foundation video model run as a real-time robot policy at the edge, on a single RTX 5090?
• ✅ No quantization
• ✅ No distillation
• ✅ Full denoising (all the way from noise to clean video)
We just proved it's possible. 👇🎬

11

214

32

92

36K

about 1 month ago

The future we're building toward is one where robots adapt to new tasks in seconds. At Rhoda, we tackle real-world problems through fundamental research. Full story + technical deep-dive: https://t.co/WA9oO65qzE

0

4

1

4

661

about 1 month ago

Teaching a robot a new task typically means stopping operations, collecting teleoperated demonstrations, and retraining. That process takes hours at a minimum. We wanted to know if we could collapse it to seconds — from a single human demo, on the fly, no retraining required. Early research preview: we can.

9

84

14

31

7K

about 1 month ago

How it works: we train on paired human demo and robot execution data. Because our DVA, FutureVision, has long-context visual memory built in (https://t.co/J3veqMf4Kp), we prepend the full human video into the model's context and predict robot actions closed-loop. The model watches a human do something once and understands what to do next.
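
A minimal sketch of the loop described above, assuming a generic model interface: the human demo frames are placed at the start of the context once, each new robot observation is appended, and the model predicts the next action from the whole sequence. Every name here (`run_closed_loop`, `predict_action`, `observe`, `act`) is hypothetical and chosen for illustration; this is not Rhoda's implementation or API.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # hypothetical stand-in for one video frame
Action = List[float]  # hypothetical stand-in for one robot action


def run_closed_loop(
    demo_frames: Sequence[Frame],
    predict_action: Callable[[Sequence[Frame]], Action],
    observe: Callable[[], Frame],
    act: Callable[[Action], None],
    steps: int,
) -> List[Action]:
    """Run a policy closed-loop with a human demo prepended to its context."""
    # The demo goes in first and is never dropped from the context.
    context: List[Frame] = list(demo_frames)
    actions: List[Action] = []
    for _ in range(steps):
        # Append the newest robot observation after the demo and prior history.
        context.append(observe())
        # The model sees the demo plus everything the robot has observed so far.
        action = predict_action(context)
        # Executing the action changes what the next observation will be.
        act(action)
        actions.append(action)
    return actions
```

Because the demo only ever enters through the context, switching tasks in this sketch means passing different `demo_frames`; nothing about the model itself changes.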

about 2 months ago

Here’s something we’ve never seen done before. Real-world tasks are long and ambiguous. Solving them requires visual memory and state tracking. Most robot policies only see the last few frames. Ours doesn't. We tested our DVA, FutureVision, on the perfect testbed: the shell game 🐚. The DVA nails it.

8

233

38

84

85K

1

5

0

4

2K

about 2 months ago

At Rhoda, we tackle real-world problems through fundamental research. Full story and technical deep-dive: https://t.co/Uj2kCBBkb6

0

11

0

4

2K

about 2 months ago

How? Our DVA implements robot policy as future video generation. Given the context, the model generates future videos (bottom left) predicting not just the correct cup to pick up, but even the appearance of the hidden object. Native training on long, continuous videos gives the model built-in long-context memory.

1

10

0

5

2K

2 months ago

"I don't think the world is going back to non video based pretraining." Our CEO @startupjag spoke with @bheater at @a3automate on why video is the foundation for robots that actually work in production. https://t.co/qmBWXnWEAj

0

11

0

3

1K

2 months ago

4/ At Rhoda, we solve real-world problems with fundamental research. Full story + technical deep-dive: https://t.co/7gvP2Jrlj8

1

8

0

1

985

2 months ago

1/ We are speedrunning industrial robotics. It took us just 19 days from the first day of data collection to filming a 2.5-hour continuous run of our model autonomously breaking down industrial containers, with zero human intervention. The data efficiency of our DVA model is fundamentally changing how fast we bring robots out of the lab and into the factory. Autonomous operation with 3 hours of data collection at a customer factory.

11

168

37

58

25K

2 months ago

3/ Achieving a 100% autonomous rate in a 2.5-hour continuous run means the model needs to handle all kinds of edge cases. Whether it's pulling a drifted box back into range or re-attempting a failed flip, the model self-corrects in real time.
-> The trash is out of reach. The robot must reposition the box before attempting another grab.
-> The door won't fall open. The robot recognizes a latch probably wasn't fully released and goes back to fix it.
-> The first flip fails. The robot doesn't hesitate; it goes for a second attempt.
-> The box has drifted too far to reach the latch. The robot pulls it back into range.

1

11

0

0

4K

3 months ago

At Rhoda, we solve real-world problems with fundamental research. Full story + technical deep-dive on our blog: https://t.co/Uj2kCBBkb6

0

1

0

1

670

3 months ago

Most robot demos are “golden runs”: a perfect take selected from many attempts. But real-world deployment is about continuous operation. Watch our DVA model tackle a real-world decanting task for 1.5 hours straight: uncut, with zero human intervention. 🧵👇

4

45

10

9

4K

3 months ago

Trained on just 11 hours of robot data, our model is surprisingly robust, thanks to web-scale pre-training. It doesn't just avoid errors; it handles them. If the lid tears off, it finds a new way to grip. If a bearing is stuck, it shakes the bag loose. Watch our robot navigate through these corner cases: 👇

4

10

1

1

821
