Humanoid robotics is hitting a data wall. Teleop and mocap took us far, but they don’t scale to every object, terrain, and behavior.
We’re releasing GRAIL: https://t.co/LxTKtMPtw0 — a fully digital pipeline for generating loco-manipulation data before the robot moves. 🧵(1/8)
How do you teach a humanoid to assist another person in close contact? 🤖
The hard part: the two bodies are physically coupled — helper & helped continuously shape each other's motion.
Neither can be solved alone.
Meet AssistMimic, our multi-agent RL framework👇🧵 #CVPR2026
Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents.
Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold.
Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.
What is missing to bring real-time motion research into AAA games and real-world robotics?
We present MotionBricks, a step toward bridging this gap with two key components:
- a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability.
- a unified smart primitive interface for locomotion and object/scene interaction, with fine-grained control over generated behaviors.
Webpage: https://t.co/aJE5skUuWD
Code: https://t.co/r56D3TJ8CW
Paper: https://t.co/CtOHXnHZMv (ACM TOG / SIGGRAPH 2026)
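A quick back-of-the-envelope on the two speed figures above, as a hedged sketch: the post does not say how they were measured, so this assumes 15,000 FPS is aggregate throughput and 2 ms is the latency of a single request. The function names and the 60 FPS game loop are illustrative assumptions, not part of MotionBricks.

```python
# Assumption: 15,000 FPS is aggregate throughput; 2 ms is per-request latency.
THROUGHPUT_FPS = 15_000
LATENCY_MS = 2.0


def ms_per_frame(throughput_fps: float) -> float:
    """Average time per generated frame implied by a throughput figure."""
    return 1000.0 / throughput_fps


def frames_per_latency_window(throughput_fps: float, latency_ms: float) -> float:
    """Frames produced system-wide during one latency window."""
    return throughput_fps * latency_ms / 1000.0


def frame_budget_share(latency_ms: float, game_fps: float) -> float:
    """Fraction of one game frame's time budget taken by a single request."""
    return latency_ms / (1000.0 / game_fps)


per_frame = ms_per_frame(THROUGHPUT_FPS)                          # about 0.067 ms
in_window = frames_per_latency_window(THROUGHPUT_FPS, LATENCY_MS)  # 30 frames
share_at_60 = frame_budget_share(LATENCY_MS, 60.0)                 # about 12%
```

Under that reading, the two numbers only fit together if roughly 30 frames are generated per 2 ms window, for example through batching across characters or multi-frame outputs, and a single request would take about 12% of a 60 FPS frame budget.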
Can we build a standalone, modular, and reusable naturalness reward for training motor controllers?
#SMP is a step toward that vision. Once SMP has been trained on a motion dataset, the priors can be reused to train new controllers to perform diverse tasks while adhering to the behaviors in the dataset, without the original dataset or retraining.
🔥 Excited to share our latest work, SMP: Score-Matching Motion Priors, accepted to @siggraph
Webpage: https://t.co/Pz4yFAg1wo
Code: https://t.co/rZPp5b5GPD
Paper: https://t.co/K0z1oQkdFZ
Video: https://t.co/gPkyQCqNWz
I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.”
Technically, most (if not all) robotics papers are convex combinations of existing ideas.
I still deeply appreciate A+B+C papers—especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition—not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C, it suddenly works—and teaches us why
- System-level & "interface" insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation—only under this view does A+B+C make sense.
Maybe worth keeping these in mind when reviewing the next A+B+C paper : )
Introducing Ψ₀ (https://t.co/qqH1PiIJS8) — an open foundation model for universal humanoid loco-manipulation.
🏆 Outperforms GR00T N1.6 by 40%+ in overall success rate
📉 Uses only ~10% of the pre-training data
📦 Fully open-source: model, data, code, and deployment pipeline
1/10
Need high-quality motion for humanoid robots or digital humans?
Meet Kimodo: our new diffusion model trained on 700 hours of optical mocap data for easy, controllable, and high-fidelity motion generation. @NVIDIAAI
https://t.co/aVbQB0WQLA
A nice little quality-of-life update: MimicKit now supports video logging. You can monitor the agent's behaviors during training on WandB and TensorBoard:
https://t.co/6imXPPEwXl
We also added an implementation of Lipschitz-Constrained Policies for training smooth controllers.
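For anyone curious what a Lipschitz-style smoothness penalty looks like, here is a rough sketch of the general idea, not MimicKit's code. It assumes the penalty is the squared norm of the gradient of the policy's log-likelihood with respect to the state, and it uses a toy linear-Gaussian policy so the gradient can be written by hand. All names (`policy_mean`, `gradient_penalty`, the toy weights) are hypothetical.

```python
def policy_mean(W, b, s):
    """Mean action of a toy linear policy: W s + b."""
    return [sum(w * x for w, x in zip(row, s)) + bi for row, bi in zip(W, b)]


def log_prob(W, b, sigma, s, a):
    """Log-density of a under N(W s + b, sigma^2 I), up to a constant."""
    mu = policy_mean(W, b, s)
    return -0.5 * sum(((ai - mi) / sigma) ** 2 for ai, mi in zip(a, mu))


def grad_log_prob_wrt_state(W, b, sigma, s, a):
    """Analytic gradient of log_prob with respect to s: W^T (a - mu) / sigma^2."""
    mu = policy_mean(W, b, s)
    resid = [(ai - mi) / sigma ** 2 for ai, mi in zip(a, mu)]
    return [sum(W[i][j] * resid[i] for i in range(len(W))) for j in range(len(s))]


def gradient_penalty(W, b, sigma, states, actions):
    """Batch mean of the squared gradient norm; large values mean the
    policy's likelihood changes sharply under small state changes."""
    total = 0.0
    for s, a in zip(states, actions):
        g = grad_log_prob_wrt_state(W, b, sigma, s, a)
        total += sum(gj * gj for gj in g)
    return total / len(states)


# Toy data (illustrative values only).
W = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]
b = [0.1, -0.2]
states = [[0.3, -0.1, 0.8], [1.0, 0.0, -0.5]]
actions = [[0.2, 0.4], [-0.3, 0.9]]
penalty = gradient_penalty(W, b, 0.5, states, actions)
# In training, a weighted copy of `penalty` would be added to the policy loss.
```

With a neural network policy the gradient would come from automatic differentiation rather than a hand-derived formula; the toy policy just keeps the sketch dependency-free.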
Peeling a potato is trivial -- until you try to make a robot do it with a knife.
This is actually one of the hardest problems in manipulation: it is contact-rich and force-sensitive, and success is subjective.
We taught a robot arm to peel with >90% success :D
https://t.co/h8tzCAt6Mo
Real-world loco-manipulation demands more than replaying fixed reference motions.
We argue that true autonomy requires two capabilities:
1️⃣ flexibly leveraging whatever signals are available — dense references, partial cues, state estimates, or egocentric perception
2️⃣ remaining capable when any of these signals are missing or unreliable
We introduce ULTRA — an all-in-one controller for unified humanoid loco-manipulation 🤖
It supports:
• general reference tracking
• sparse goal following
• execution with motion capture
• execution with egocentric perception
🔗 Project page:
https://t.co/Ce9RHvryPC
Unitree Spring Festival Gala Robots: a Full Release of Additional Details 🥳
Dozens of G1 robots achieved the world’s first fully autonomous humanoid robot cluster Kung Fu performance (with quick movement), pushing motion limits and setting multiple world firsts! H2 made striking appearances at both the Beijing main venue and the Yiwu sub-venue, clad in the Monkey King’s heavy armor and riding a “somersault cloud” played by B2W quadruped robot dogs, delivering New Year blessings from the clouds.
🚀 Introducing CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation!
Current humanoids face a trade-off: they are either Agile & Stiff OR Slow & Soft.
CHIP breaks this barrier. We enable on-the-fly switching between Compliant (wiping 🧼, collaborative holding 📦) and Stiff (lifting dumbbells 🏋️, opening doors 🚪💪) behaviors—all while maintaining agile skills like running! 🏃💨
Website: https://t.co/6itMzJz7ct
Join me for a deep dive on how CHIP enables adaptive control for complex tasks. 🧵↓
Can we bridge the Sim-to-Real gap in complex manipulation without explicit system ID? 🤖
Presenting Contact-Aware Neural Dynamics — a diffusion-based framework that grounds simulation with real-world touch.
Implicit Alignment: No tedious parameter tuning.
Tactile-Driven: Captures non-smooth contact events.
Consistent: Stable predictions in contact-rich tasks.
Introducing HumanX, a full-stack framework that compiles human video into generalizable, real-world interaction skills 🏀⚽️🥊📦 for humanoids, without task-specific rewards.
Paper: https://t.co/TJR620zPGE
Page: https://t.co/zbwrHbN3Na
#humanoid #ai #hkust #robotics #sports
Today we're introducing Helix 02
Dancing robots are trivial; the hard part is intelligent control
This is our most powerful model to date - able to work across complex tasks & long time horizons
https://t.co/cExYWUUoDp
How far can we push the limit of in-hand manipulation dexterity?
Introducing our work on motion capture: DexterCap & DexterHand!
DexterCap: A high-fidelity motion capture system for intricate in-hand manipulation motion.
DexterHand: A dataset featuring true in-hand dexterity, reorientation, finger gaiting, and even manipulating a Rubik's Cube like a speedcuber! 🧩
- Project Page: https://t.co/9EASACaY6p
- Experience it via our online interactive visualization: https://t.co/IHCkbHBuuq
#Animation #CharacterAnimation #MotionCapture #Graphics #EmbodiedAI #DexterousManipulation
At @nvidia, we built ProtoMotions to help us, and researchers worldwide, innovate quickly without compromising on applicability.
We're proud to announce ProtoMotions3 -- our biggest release yet!
🧵👇