Daniel Ho @itsdanielho - Twitter Profile

Daniel Ho

@itsdanielho

3 months ago

@karansdalal thanks Karan!

Daniel Ho

@itsdanielho

3 months ago

A personal update: After two years at 1X, I’m moving on to something new. I joined 1X to solve general-purpose robotics, through the lens of evaluation. We bet on humanoid world models early in 2024. I’m proud of our work showing how the 1X World Model can solve the offline evaluation problem: judging policy quality by accurately predicting expected state and reward within the test-time distribution. We've then showcased how these same world models can leverage their understanding of robot manipulation to act as policies, generalizing far beyond the tasks in training data. Scaled deployments of robots in homes require confidence in policy performance in unknown environments, and generalization across environments and skills. To my colleagues at 1X, it's been an honor working with you all. I'm inspired by the world-class team and humanoid that we’ve assembled and continue to assemble. In California, we’ve grown from a few robotics researchers in a one-room office to a campus for large-scale manufacturing and research engineering. I’ve now joined the founding team at Project Prometheus as a member of technical staff. I've also moved up to San Francisco! Reach out if you'd like to grab coffee and chat AI in the physical world.
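The offline evaluation idea in the update above (judging a policy by predicting the states and rewards it would produce, instead of running it on a robot) can be illustrated with a toy sketch. Everything here is hypothetical and is not 1X's pipeline: the "state" is a single float, `toy_world_model` is a hand-written stand-in for a learned world model, and the reward is 1.0 whenever the predicted state lands near a target. Each candidate policy is scored by its average predicted return over imagined rollouts.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a policy maps a state to an action; a world model
# predicts (next_state, reward) for a state-action pair.
Policy = Callable[[float], float]
WorldModel = Callable[[float, float], Tuple[float, float]]


def toy_world_model(state: float, action: float) -> Tuple[float, float]:
    """Stand-in for a learned model: move by `action`, reward 1.0 near target 0.0."""
    next_state = state + action
    reward = 1.0 if abs(next_state) < 0.1 else 0.0
    return next_state, reward


def estimate_return(policy: Policy, world_model: WorldModel,
                    start_states: List[float], horizon: int) -> float:
    """Average predicted return of `policy` over imagined (not real) rollouts."""
    returns = []
    for state in start_states:
        total = 0.0
        for _ in range(horizon):
            action = policy(state)
            state, reward = world_model(state, action)
            total += reward
        returns.append(total)
    return mean(returns)


def rank_policies(policies: Dict[str, Policy], world_model: WorldModel,
                  start_states: List[float], horizon: int) -> List[Tuple[str, float]]:
    """Rank candidate policies by predicted return, best first."""
    scores = [(name, estimate_return(p, world_model, start_states, horizon))
              for name, p in policies.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = {"reach": lambda s: -s, "idle": lambda s: 0.0}
    print(rank_policies(candidates, toy_world_model, [1.0, -2.0, 0.5], horizon=3))
```

Running it prints `[('reach', 3.0), ('idle', 0.0)]`. With a learned model in place of `toy_world_model`, the ranking is only as trustworthy as the model's agreement with real rollouts, which is exactly what the offline evaluation work has to establish.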

itsdanielho retweeted

Junfan Zhu 朱俊帆 ✈️ CVPR

@junfanzhu98

4 months ago

Tuned into @itsdanielho (@1x_tech) on the @RoboPapers podcast geeking out over 1XWM. Inspiring! The "dream success first, then reverse-engineer the actions" paradigm is 🔥 and lol it applies to non-robots too! My takes↓
1️⃣ The World Model's predictions of the future were extremely close to reality, thanks to action-conditioned video generation (on precise low-level action sequences). At execution time, the Inverse Dynamics Model (IDM) back-infers actions to ensure the “dreamed perfect trajectory” can be grounded in reality. Controllable + grounded + zero-shot.
2️⃣ Egocentric large-scale mid-training is useful because diversified data expands distribution coverage. Scalable and low-cost.
3️⃣ Granular training (second-by-second). Use a VLM for caption upsampling, going from a coarse task description to a second-by-second play-by-play. Similar to Sora's fine-grained prompt engineering, but more applicable to robot control. Granularity makes the world model capture causal chains, not just spectacle.
4️⃣ Both success and failure videos are used to train the world model. Success videos reinforce correct physics; failure videos provide negative examples. This makes imagination robust: the model can generate diverse futures (including bad ones), and a value function selects the best.
5️⃣ World model evaluating world model (recursive eval) is interesting. The current 1XWM can do self-eval (model-evaluating-model): generate multiple rollout videos; use an internal value function or visual signals to estimate success probability; execute the highest-scoring trajectory. A more advanced loop may be: use WM rollouts as synthetic data to predict success rates when ablating training data; retrain/improve the WM; run offline policy optimization (Dreamer-style, with millions of dream iterations). Instead of directly learning a policy and relying on expensive real rollouts for eval, using the World Model for dream-time eval / in-simulation assessment can scale to break through the data wall and generalize exponentially.
6️⃣ The Inverse Dynamics Model (IDM) is a bridging component that translates World Model video sequences into executable low-level robot actions. It acts as the cerebellum/translator: given adjacent generated frames, it infers the action commands required to transition from frame A to frame B. The World Model generates multiple rollouts with stochastic sampling, then the IDM performs frame-to-frame inversion to recover action sequences and candidate trajectories, applying rejection sampling to discard dreams whose inferred actions violate kinematic constraints and asking the WM to regenerate. Training the IDM separately is more efficient (on smaller, precise data), while the WM is pretrained on massive data (strong generalization). This architecture enables video prior + grounded embodiment. Instead of direct end-to-end VLA action prediction, the WM + IDM "imagine-then-invert" paradigm is like "dream success first, then reverse-engineer actions", with higher visual alignment in zero-shot long-horizon tasks and easier offline evals. 👉🏻https://t.co/1VUvkAIPoY
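The "imagine-then-invert" loop from points 5 and 6 can be sketched in a few dozen lines. This is a toy illustration under loud assumptions, not 1X's code: a "frame" is just a tuple of joint positions rather than pixels, `toy_world_model` is a random walk standing in for a video generator, `inverse_dynamics` is a frame difference standing in for a learned IDM, and `value` scores distance to a goal pose. The loop samples several dreams, inverts each into actions, rejects dreams whose actions exceed a per-step joint limit, and returns the best-scoring survivor, regenerating when every sample is rejected.

```python
import random
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, ...]   # hypothetical "frame": joint positions, not pixels
Action = Tuple[float, ...]  # hypothetical action: per-joint position delta


def toy_world_model(start: Frame, horizon: int, rng: random.Random) -> List[Frame]:
    """Hypothetical stochastic 'video' model: a random walk over joint positions."""
    frames = [start]
    for _ in range(horizon):
        frames.append(tuple(q + rng.uniform(-0.3, 0.3) for q in frames[-1]))
    return frames


def inverse_dynamics(a: Frame, b: Frame) -> Action:
    """Hypothetical IDM: the command that moves frame a to frame b."""
    return tuple(qb - qa for qa, qb in zip(a, b))


def is_feasible(actions: Sequence[Action], max_step: float) -> bool:
    """Kinematic check: no joint may move more than max_step per frame."""
    return all(abs(u) <= max_step for act in actions for u in act)


def value(frames: Sequence[Frame], goal: Frame) -> float:
    """Higher is better: negative squared distance from the final frame to the goal."""
    return -sum((q - g) ** 2 for q, g in zip(frames[-1], goal))


def imagine_then_invert(start: Frame, goal: Frame, n_samples: int, horizon: int,
                        max_step: float, max_rounds: int,
                        seed: int = 0) -> Optional[List[Action]]:
    rng = random.Random(seed)
    for _ in range(max_rounds):                  # regenerate if every dream is rejected
        candidates = []
        for _ in range(n_samples):               # stochastic rollouts ("dreams")
            frames = toy_world_model(start, horizon, rng)
            actions = [inverse_dynamics(a, b) for a, b in zip(frames, frames[1:])]
            if is_feasible(actions, max_step):   # rejection sampling on kinematics
                candidates.append((value(frames, goal), actions))
        if candidates:
            return max(candidates, key=lambda c: c[0])[1]  # best-scoring survivor
    return None                                  # give up after max_rounds


if __name__ == "__main__":
    plan = imagine_then_invert((0.0, 0.0), (1.0, -1.0), n_samples=16,
                               horizon=5, max_step=0.25, max_rounds=20)
    print(plan)
```

Setting `max_step=0.0` makes every dream infeasible, so the function gives up after `max_rounds` and returns `None`. Keeping the generator, the inverter, and the scorer as separate functions mirrors the separation described above, where the IDM is trained on its own smaller, precise dataset.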

Daniel Ho

@itsdanielho

4 months ago

Check out this @RoboPapers pod for an overview of the past year of our world model research @1x_tech! We're very excited about world model architectures to achieve truly generalizable robot policies and evaluators. NEO will be able to zero-shot tasks in homes, learn rapidly with autonomy data, and predict how freshly baked models perform. This will usher in the era of home robots.

RoboPapers

@RoboPapers

4 months ago

Every home is different. That means that to build a useful home robot, we must be able to perform zero-shot generalization on a wide range of tasks. Humanoid company @1x_tech has a solution: world models. 1X Director of Evaluations @itsdanielho joins us on RoboPapers to talk about: - why world models are the future for scaling robot learning - how to use world models for robot control - what world models unlock for evaluating robot model performance - how we can hill-climb from here to general purpose robots Watch Episode #61 of RoboPapers, with @micoolcho and @chris_j_paxton, now!

Daniel Ho

@itsdanielho

4 months ago

Get ready for the @1x_tech world model RoboPapers pod drop!

RoboPapers

@RoboPapers

4 months ago

Full episode dropping soon! Geeking out with @itsdanielho on 1X World Model https://t.co/D3VO0BsAEC Co-hosted by @micoolcho @chris_j_paxton

Daniel Ho

@itsdanielho

5 months ago

@RealBrayden hopefully we can have people come interact with the model soon!

Daniel Ho

@itsdanielho

5 months ago

World-model-based policies like 1XWM, which we shared yesterday, enable preference feedback during post-training and also test-time compute, because the model generates interpretable state. It's one of the unlocks of this new type of architecture that sits below the headlines.

Jack Monas

@JackMonas

5 months ago

One of many next steps at @1x_tech: preference learning for world-model-based policies. Given a generated starting frame, we can sample multiple video rollouts from our WM and use preference feedback to steer the model toward higher-quality behavior. This lets us fix policy failures in synthetic worlds—resolving bad NEO behaviors with generated dogs before we ever meet real ones.
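One standard way to turn pairwise preferences between sampled rollouts into a training signal is the Bradley-Terry model used in RLHF reward modeling. The tweet does not say 1X uses it, so treat this as an illustrative swap-in: each rollout is summarized by a hypothetical feature vector (here `[task_success, collision]`), a linear reward is fit with plain SGD so that preferred rollouts score higher, and the learned reward then picks the best of several new rollouts.

```python
import math
from typing import Sequence

# Hypothetical rollout summary: [task_success, collision]
Features = Sequence[float]


def reward(w: Sequence[float], x: Features) -> float:
    """Linear reward model r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def bt_loss_and_grad(w, preferred, rejected):
    """Bradley-Terry negative log-likelihood that `preferred` beats `rejected`."""
    margin = reward(w, preferred) - reward(w, rejected)
    p = 1.0 / (1.0 + math.exp(-margin))  # P(preferred is chosen)
    loss = -math.log(p)
    # d(loss)/dw = -(1 - p) * (x_preferred - x_rejected)
    grad = [-(1.0 - p) * (a - b) for a, b in zip(preferred, rejected)]
    return loss, grad


def fit(pairs, dim, lr=0.5, epochs=200):
    """Plain SGD on the Bradley-Terry loss, starting from w = 0."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            _, grad = bt_loss_and_grad(w, preferred, rejected)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # (preferred, rejected) pairs, e.g. human labels on generated rollouts
    pairs = [([1.0, 0.0], [0.0, 0.0]),
             ([1.0, 0.0], [1.0, 1.0]),
             ([0.0, 0.0], [0.0, 1.0])]
    w = fit(pairs, dim=2)
    rollouts = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
    print(max(rollouts, key=lambda x: reward(w, x)))
```

The learned weights reward success and penalize collisions, so the script prints `[1.0, 0.0]`. In a real system the reward would be a learned network over generated video and would be used to steer the world model itself, not just to rerank its samples.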

Daniel Ho

@itsdanielho

5 months ago

@robotryer Preference alignment and RLHF on world models open up the opportunity to train custom models for each personality and owner.

Daniel Ho

@itsdanielho

5 months ago

@_joe_harris_ In our blog post we show side-by-side comparisons between generations and real rollouts for a bunch of tasks: https://t.co/uAGWCKShhS Next up, we will speed up model inference, minimize latency, and re-plan when conditions drift.

Daniel Ho

@itsdanielho

5 months ago

Excited to share our latest work on world models as robot policies! NEO executes novel manipulation tasks from text prompts, deriving actions from text-conditioned video generation. We found strong alignment between world model generations and real rollouts, and sufficient controllability to control NEO accurately. 1/n

1X

@1x_tech

5 months ago

NEO’s Starting to Learn on Its Own

Daniel Ho

@itsdanielho

5 months ago

@fatemi_michael thanks Michael!

itsdanielho retweeted

Peter Liu

@peterliuposts

5 months ago

One of the coolest examples we found is NEO holding up a peace sign. The WM both understands what a peace sign is and is self-aware (no hands in the starting frame), and the IDM extracts finger-level actions :)

Daniel Ho

@itsdanielho

5 months ago

@peterliuposts ✌️

Daniel Ho

@itsdanielho

5 months ago

@PlutonianGray @radbackwards thanks Kevin! excited for you to have your NEO

Daniel Ho

@itsdanielho

5 months ago

@peterliuposts ur legendary peterliuposts!

0

4

0

60

Daniel Ho

@itsdanielho

5 months ago

@christyjestin @ridcursion Because NEO’s embodiment is so close to human form, we found promising zero-shot transfer even without overlap on the task-specific data. For example, we have 98.5% pick and place and tested transfer, which worked well.

Daniel Ho

@itsdanielho

5 months ago

@PotEl0000 @btfdNOID @1x_tech Good question, you’re correct that our current world model work doesn’t solve these delayed and higher level tasks. Stay tuned for orchestration work where we solve things like this!
