Building Human-embodied Intelligence. CEO @MyoLabAI | Sr. research scientist @OpenAI @GoogleAI @AIatMeta | @berkeley_ai @UWcse #MuJoCo | Ad. Prof. @CMU_Robotic
📢Life is a sequence of bets - and I've picked my next: @MyolabAI
It's incredibly ambitious, comes with high risk, & carries unbounded potential. But it's a version of the #future I deeply believe in.
I believe:
➡️AI will align strongly with humanity - coz it maximizes its own growth & impact
➡️It will transform the world as profoundly as the internet
➡️Like the internet, it will ultimately disappear into the background of our daily lives
Most of what we see today are transient wins - short-term products riding the first waves of capability. Rather than transients, I'm betting on the signals that will endure.
Just as the cellphone became the personal gateway to the internet era, I believe the future of AI will be 𝐩𝐞𝐫𝐬𝐨𝐧𝐚𝐥𝐢𝐳𝐞𝐝, 𝐜𝐞𝐧𝐭𝐫𝐚𝐥𝐢𝐳𝐞𝐝, & 𝐝𝐞𝐞𝐩𝐥𝐲 𝐡𝐮𝐦𝐚𝐧-𝐜𝐞𝐧𝐭𝐫𝐢𝐜. The interface (the #canvas) of this era is still waiting to be defined.
With MyoLab, I'm placing my bet on the 𝐥𝐢𝐟𝐞𝐥𝐢𝐤𝐞 𝐡𝐮𝐦𝐚𝐧 𝐝𝐢𝐠𝐢𝐭𝐚𝐥 𝐭𝐰𝐢𝐧 as that interface.
We've assembled a world-class team with the conviction and grit to make this future real. We're building a new kind of AI: embodied, personal, and lifelike. Most already believe lifelike digital twins are inevitable. We're just accelerating the timeline.
Today, we're releasing an early research preview of the first instantiation of #HumanEmbodiedIntelligence at https://t.co/LxNB3aCEhC
We'd love for you to try it and share your feedback.
𝐓𝐡𝐢𝐬 𝐢𝐬 𝐦𝐲 𝐛𝐞𝐭. 𝐖𝐡𝐚𝐭'𝐬 𝐲𝐨𝐮𝐫𝐬?
All forms of intelligence co-emerged with a body, except AI
We're building a #future where AI evolves as your lifelike digital twin to assist your needs across health, sports, daily life, creativity, & beyond...
https://t.co/QL3o9YxZYz ➡️ Preview your first #HumanEmbodiedAI
@MyoSuite is rapidly becoming the frontier for the #ML community developing the next era of Reinforcement Learning.
MyoSuite captures human morphology in functional tasks. Beyond progress in algorithms, solutions to these tasks have the potential to make a significant impact on life.
Flow policies are a powerful policy class for continuous-control RL: they represent expressive, multi-modal action distributions, and they train by simple supervised regression: sample improved target actions, then distill.
But in online MaxEnt-RL, one question decides everything:
🎯 Where should the supervision come from?
The usual answer is global importance sampling: sample from the policy, reweight by Q-values, distill. It works only when the proposal can reach high-value regions. In high-dimensional action spaces, that overlap disappears: the proposal misses target-relevant actions, importance weights collapse, and supervision goes sparse.
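To make the weight-collapse point concrete, here is a minimal toy sketch (not the paper's setup). It assumes a standard-normal proposal standing in for the policy and a made-up quadratic Q-function peaked at the all-ones action; the function names `global_is_weights` and `effective_sample_size` are hypothetical. It prints the effective sample size (ESS) of the softmax(Q) importance weights as the action dimension grows.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; between 1 and len(weights)."""
    return 1.0 / np.sum(weights ** 2)

def global_is_weights(dim, n_samples=256, temperature=1.0, seed=0):
    """Sample actions from a toy proposal and weight them by softmax(Q / temperature)."""
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((n_samples, dim))   # assumption: toy policy proposal
    q = -0.5 * np.sum((actions - 1.0) ** 2, axis=1)   # assumption: toy Q, peaked at all-ones
    logits = q / temperature
    weights = np.exp(logits - logits.max())           # subtract max for numerical stability
    return weights / weights.sum()

for dim in (1, 8, 64):
    ess = effective_sample_size(global_is_weights(dim))
    print(f"action dim {dim:3d}: ESS = {ess:6.1f} of 256")
```

In this toy, most samples carry useful weight in one dimension, while in 64 dimensions a handful of samples dominate, which is the sparse-supervision failure described above.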
We introduce FLAG: Flow Policy MaxEnt-RL by Latent-Augmented Guidance ⬇️
🔷 Localize improvement: condition both the proposal and the target on the same flow latent z, so importance sampling happens in a shared local region with real overlap, exactly where improvement occurs.
🔷 No BPTT: update the flow by distilling onto improved action labels, never by differentiating through the flow ODE.
🔷 Principled: a latent-augmented z-MDP with proven Q-function consistency: optimizing the local region is the same problem as the original MDP.
🔷 Provable: a conditional monotonic-improvement guarantee, SAC-style.
🔷 Scales: MuJoCo → DMC Dog → MyoSuite at low GPU cost, robust even at N = 2 importance samples.
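One way to picture the first two bullets is the toy sketch below. It is an illustration under loud assumptions, not the FLAG algorithm: the critic `q_value`, the stand-in flow output `0.5 * z`, the Gaussian perturbation around it, and the helper names `local_label` and `regression_pair` are all hypothetical. It draws N = 2 candidates near the action tied to one latent z, resamples one by softmax(Q) weight, and builds a straight-line regression target, so a model could be fit by plain supervised regression with no gradient through an ODE solver.

```python
import numpy as np

def q_value(actions):
    """Hypothetical critic: peaked at the all-ones action."""
    return -0.5 * np.sum((actions - 1.0) ** 2, axis=-1)

def local_label(base_action, rng, n_samples=2, noise=0.3, temperature=0.1):
    """Resample one of a few candidates drawn near base_action, by softmax(Q / temperature)."""
    candidates = base_action + noise * rng.standard_normal((n_samples, base_action.size))
    logits = q_value(candidates) / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    label = candidates[rng.choice(n_samples, p=weights)]
    return label, candidates, weights

def regression_pair(z, label, t):
    """Point on the straight path from z to label, plus the velocity to regress onto."""
    x_t = (1.0 - t) * z + t * label
    return x_t, label - z

rng = np.random.default_rng(0)
z = rng.standard_normal(4)        # flow latent
base_action = 0.5 * z             # assumption: stand-in for the current flow's output at z
label, candidates, weights = local_label(base_action, rng)
x_t, velocity = regression_pair(z, label, t=0.5)
print(weights, velocity)
```

Because proposal and label both live near the same z-conditioned action, even two candidates overlap with the region where improvement happens, in contrast to sampling globally.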
Website: https://t.co/HIbvXQNYL0
arXiv: https://t.co/dRkVzzoBuB
💻 Code: https://t.co/l4JQFlhdj5
Fifteen years ago, I had the privilege of working with Rosen Diankov at CMU on his PhD thesis. The capstone was IKFast: for a generation of roboticists, the definition of what analytical inverse kinematics could be.
Today, I'm excited to release the next chapter: ssik.
1/
Shout out to a new pre-trained vision model for robotics that comes close to, and in places outperforms, prev works from our group - R3M (w @SurajNair_1), VIP (w @JasonMa2020), VC1 (w @aravindr93), etc.
Are you still running your robot policies on vision encoders trained purely on static images?
Nowadays, the standard practice in robot learning is to plug in powerful vision models like CLIP, SigLIP, or DINOv2. This inherits a quiet, convenient assumption: "Let mainstream computer vision handle perception, and the downstream policy will figure out the dynamics."
But let's be real for a moment. Is this truly the best we can do?
We introduce DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation. ⬇️
🔷 Dynamics upstream: we push motion understanding into perception.
🔷 Tri-modal-dynamics supervision: image transitions × language × 3D flow, fused via simplex-volume alignment (260K trajectories from robot & human video)
🔷 Transfers everywhere: a visual backbone for diverse policies (MLP, Diffusion Policy, VLA)
🔷 +22.5% over the strongest baselines (DINOv2, SigLIP) under real-world OOD
🔷 Open-Source & easy to use
Website: https://t.co/I3uKpAZ975
Paper: https://t.co/jHAweJBreK
💻 Code: https://t.co/yUueJ1xxJL
🤗 Hugging Face: https://t.co/jqLzJFHvMI
The main criticism of simulations is their difficulty (and gap) in capturing real-world diversity. Progress in generative simulation is speeding up fast 💪
The focus needs to move from photorealism to physics & forces -- the language of the physical world.
We are back again :) After three weeks of quiet building.
Introducing Genesis World 1.0, our latest simulation platform, the second release in our full-stack suite. Open-sourced.
Robotics is still bottlenecked by the 1× speed of the physical world. Every model, checkpoint, and data recipe eventually needs to be tested on physical hardware, slowly, expensively, and with limited coverage.
One hour in reality can become 100 days in simulation. That is how robotics model iteration moves from a wall-clock bottleneck to a compute problem.
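As a quick back-of-the-envelope check on that ratio: 100 simulated days per wall-clock hour is an aggregate throughput of 100 × 24 = 2400× real time. The sketch below only does that arithmetic; the helper names and the per-environment real-time factor of 10 are illustrative assumptions, not numbers from the release.

```python
import math

def aggregate_speedup(sim_days, real_hours=1.0):
    """Simulated hours produced per wall-clock hour."""
    return sim_days * 24.0 / real_hours

def envs_needed(speedup, per_env_realtime_factor):
    """Parallel environments required if each one runs at the given multiple of real time."""
    return math.ceil(speedup / per_env_realtime_factor)

speedup = aggregate_speedup(100)      # 100 days of experience in 1 hour
print(speedup)                        # 2400.0
print(envs_needed(speedup, 10.0))     # 240, assuming each env runs at 10x real time
```

The split between parallelism and per-environment speed is a free choice here; the point is only that the wall-clock bottleneck turns into a compute budget.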
To make this work, simulation has to be both fast and trustworthy.
Over the past year, we rebuilt the entire stack: a GPU-accelerated cross-platform compiler, penetration-free multi-physics contact solvers, unified rigid and deformable physics, and a photo-realistic renderer purpose-built for physical AI applications.
We built Nyx, a high-performance path-traced rendering engine for robotics applications.
Genesis World 1.0 achieves near-real-time performance with our latest penetration-free IPC solver, supporting various types of deformables in addition to rigid bodies. It supports contact-rich, dexterous manipulation simulation across different embodiments: Unitree, Sharpa, Wuji, Genesis hand, and various types of grippers.
Under the hood is Quadrants, our effort in pushing forward cross-platform GPU-accelerated computation. Quadrants started as a fork of Taichi, and we rebuilt most of the critical parts for optimizing simulation workloads, giving 10x faster launch time and up to 4.6x runtime performance compared to the initial Genesis release.
Together, they bring us to an unprecedentedly low sim-to-real gap, enabling zero-shot real-to-sim model evaluation and much faster iteration of GENE.
All available today.
Genesis World 1.0: https://t.co/aknCM3eqws
Quadrants: https://t.co/uXqPNI4cb6
Nyx: https://t.co/R8j0djqGnV
Today, Human Archive is announcing our $8.2M seed round to model human embodied intelligence.
Despite decades of research, we still barely understand ourselves. Our goal is to learn how humans interact with the world, and over the past 6 months, our team's made enormous progress toward that alongside leading AI labs.
learn more @TechCrunch
https://t.co/faLhyVBjl1
This is evoBOT, a robot helper developed by Germany's Fraunhofer Institute for Material Flow and Logistics.
It can grasp and carry goods to support cargo workers in transporting packages.
evoBOT can also move smoothly across uneven terrain, including bumpy surfaces and sloping ground.
Spent last week benchmarking policy speedup methods. Then we just collected faster data and it beat all baselines...
Obvious in hindsight, but it turns out the first step to speeding up your policy is … collect faster data.
@abhishekunique7 The observation distribution is quite different between sim and real. I'm curious why it's reasonable to freeze the encoder, reward, and value function on the prior distribution?
Aren't we introducing inconsistencies?
@wenlong_huang And keeping the dynamics and policy distribution well aligned with each other.
(A good dynamics model paired with a policy exploring states unhinged is a disaster)
@macdonaldncode Imagine getting an intern and not being able to talk to him/her. It would become very hard to train them.
Humans have always aligned with the changing workflows. This is our superpower. And this time will be no different.
text2motion often suffers from physically inconsistent motion. However, the whole body controllers for G1 have gotten so robust that it's finally becoming possible to connect the two.
Next frontier - task driven contextual motion generation & real time execution - can provide a great interface for training robots on the job.
Voice-driven, real-time arbitrary action generation
Using external voice commands, G1 is directly controlled to generate a wide range of actions in real time.
This video was recorded in a single take, with on-site audio recording.
Because the actions are autonomously generated by AI in real time, there may be slight latency, and the smoothness of the movements may be somewhat reduced.
SCALING ISN'T EVERYTHING
Another tiny model breaking the rule.
- trained on less than 1/1000th of the data
- can be trained in a single day with <1000 USD
Human knowledge can be compressed & retrieved far more tightly than LLMs do today.
Whole body controllers - effective with contact rich behaviors - are the unsung HEROs of robotics🦸‍♂️
Without them, all we will have is a bunch of overpowered pincers picking & placing tiny objects on the table.
(a bit harsh but true)
You can't lift a fridge with just your hands. Your whole body needs to conform to its shape, and bear the load between your arms and torso.
Here, @BostonDynamics' Atlas uses proprioception to manage the whole-body interaction and adapt to a shifting 100+ lb load. Enabling this type of high performance manipulation is exactly why we walked away from what was arguably the world's best implementation of MPC for humanoids, and shifted entirely to RL without looking back.
This level of whole-body controls is a fundamental building block of physical intelligence and key to the value proposition of humanoids.
More technical details in:
Blog: https://t.co/oIRjVfh7jJ
Behind the scenes video: https://t.co/LgaImMAyhX