Harsh

Verified account

@HSlifelearner

building agents and poses @nvidia | Robograd @CMU_Robotics | Investing in startups

San Francisco, CA

Joined June 2009

2.8K Following

959 Followers

1.1K Posts

Pinned Tweet

2 months ago

Check out the full suite for mocap-to-robotics pretraining. SOMA has anatomically correct joint definitions and much more detailed mesh keypoints than MHR/SMPL. It is foundational for all body-pose downstream tasks. More on its capabilities soon.

3 months ago

#NVIDIA just released a whole ecosystem for human(oid) motion and robot learning from human data. 🚀🦾

Data, as we all know, is the key to scaling AI models. To accelerate the field of Embodied AI, we have open-sourced a full stack of models and tools to capture, generate, retarget, and simulate human(oid) motion data at scale, along with a massive high-quality dataset and a standard human skeletal representation, SOMA, so that they all communicate seamlessly with each other. The entire suite is available under the Apache 2.0 license.

1️⃣ SOMA: A universal interface to unify all parametric human body models (SOMA-shape, SMPL, MHR, etc.) into a standard skeletal representation, eliminating the need for custom adapters or model-specific retargeting. 🔗 https://t.co/Xrg672T7Nu
2️⃣ Kimodo: High-fidelity, controllable text-to-motion generation for both humans and humanoid robots. 🔗 https://t.co/2cQKAPfvEU
3️⃣ GEM: A global human pose estimation method from in-the-wild videos, natively compatible with SOMA. 🔗 https://t.co/pV0043jwcO
4️⃣ Bones-SEED: A massive dataset of 150k+ motions in SOMA format, including data already retargeted for the Unitree G1, created with our partners at Bones Studio. 🔗 https://t.co/wxfyZ7S9TJ 🔗 https://t.co/oM5rIMdRi8
5️⃣ SOMA Retargeter: A dedicated tool for seamless motion retargeting from the SOMA skeleton to the Unitree G1. 🔗 https://t.co/jg4DUjWcnw
6️⃣ ProtoMotions: Our high-performance simulation framework for training digital human(oid)s via RL, now with native SOMA support. 🔗 https://t.co/K1zsGEdl5S

This is just the beginning, and we have much more in the pipeline. Excited to see what the community builds next!

#NVIDIA #GTC #GTC2026 #Robotics #EmbodiedAI #PhysicalAI @NVIDIAAI


about 11 hours ago

@olsenbdnr @xai who reviews them


about 12 hours ago

@yinghui_he_ Hey, welcome @yinghui_he_.


HSlifelearner retweeted

Min-Hung (Steve) Chen

1 day ago

🚀 4D-RGPT is a #CVPR2026 Highlight from @NVIDIA!
🌌 Amid #Cosmos3 + #PhysicalAI momentum, we tackle:
🎥 region-level 4D video understanding
🎯 regions + 📏 depth + 🌀 motion + ⏱️ time
🖼️ Main poster + 5 workshops in Denver
📍Jun 7, 11:45–1:45, ExHall F #225
📦 Code, model weights & R4D-Bench are out 👇
@CVPR @NVIDIAAI


2 days ago

@Yuchenj_UW @bcherny Damn super jealous


HSlifelearner retweeted

3 days ago

From a young age, I have always wanted to be the exit liquidity for shareholders of artificial intelligence companies


HSlifelearner retweeted

2 days ago

buying into the anthropic IPO at $1T valuation would obviously be an incredible deal, 22x multiple on ARR, huge room to grow, countless markets untapped, mythos as of yet unmonetized. kind of thing people dump whole retirement portfolios into. which is why it'll be $3T


HSlifelearner retweeted

Xuning Yang @xuningy

3 days ago

🎉 We added 2 SOTA WAMs to the RoboLab Leaderboard 🎉

Current leaders on RoboLab-120 (specific instr.):
🥇Cosmos3-Nano-Policy (39.7%)
🥈π0.5 (28.1%)
🥉DreamZero (28.1%)

→ See full results at: https://t.co/Le8jykn5jo

→ All policy clients available at: https://t.co/wQH4Py6zJ8 https://t.co/PMg9l74zBU


HSlifelearner retweeted

Ashkan Mirzaei @ashmrz10

3 days ago

I’m excited to share what our team has been building at @NVIDIAAI since I joined: Cosmos 3, an omnimodal world model for Physical AI. Project: https://t.co/HTCR8JSzdW HF: https://t.co/19p3c6pfZ0 Code: https://t.co/G6fuUOWFNk


HSlifelearner retweeted

2 days ago

Introducing NVIDIA Cosmos 3

We released NVIDIA Cosmos 3 last night.

And today, seeing it take the top spots across 8+ open model leaderboards feels surreal. We spent months working towards this moment.

Here’s the breakdown:

The Leaderboard Wins

World Reasoning
🏆 #1 open model on VANTAGE-Bench for vision AI
🏆 #1 overall on Traffic Anomaly Reasoning (TAR)

World Generation
🏆 #1 open model on Artificial Analysis Image-to-Video leaderboard
🏆 #1 open model on Artificial Analysis Text-to-Image leaderboard
🏆 #1 open model on PAI-Bench for physical AI synthetic data generation
🏆 #1 open model on Physics-IQ, which measures accuracy on physical laws
🏆 #1 open model on R-Bench for world generation quality

World Action
🏆 #1 on RoboArena for specialized policy
🏆 #1 on RoboLab for action generation

But the leaderboards are only part of the story. The real story is why we built Cosmos 3 in the first place.

The Problem

Training robots and autonomous systems in the real world is painfully hard.

Robots need to try the same thing numerous times before they succeed reliably. Self-driving cars need rare edge cases that may never happen naturally. Smart machines need to understand physics, motion, contact, failure, and surprise.

And real-world data is slow, expensive, and sometimes dangerous to collect. At some point, the answer cannot just be “collect more data.”

You can’t collect your way out of an infinite physical world. You have to generate it.

That… was the question behind Cosmos: Can one model understand the physical world deeply enough to reason about it, simulate it, and generate actions inside it?

What We Built

Cosmos 3 is the first omni-model for physical AI. It can understand and generate across: language · images · video · audio · action sequences

It is not just a VLM.

Not just a video generator.

Not just a robot policy model.

It is all of them, in one single model.

That matters because physical AI has been fragmented for a long time. Cosmos 3 is our attempt to collapse that fragmentation.

Depending on how you configure the inputs and outputs, the same model can act as a vision-language model, a video/world generator, a world simulator, or a world-action model.

No separate architecture required.

The Architecture

Under the hood, Cosmos 3 uses a dual-tower Mixture-of-Transformers architecture.

One tower is autoregressive for reasoning. It handles next-token prediction for language and discrete understanding.

The other tower is diffusion-based, for generation. It denoises images, video, audio, and action trajectories.

Two towers. Dual-stream joint attention. One shared world representation.
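The dual-tower idea can be illustrated with a toy Mixture-of-Transformers attention step. This sketch assumes, without confirmation from the post, that each tower owns its own Q/K/V projections while one softmax attention runs over the concatenated token stream from both towers. The tower names, 2-dimensional weights, and the omission of causal masking and diffusion timesteps are simplifications for illustration, not Cosmos 3's actual design.

```python
import math

D = 2  # hypothetical toy hidden size

def matvec(m, v):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed: each tower has its own Q/K/V weights (toy values, not learned).
TOWERS = {
    "ar": {    # autoregressive "reasoning" tower
        "q": [[1.0, 0.0], [0.0, 1.0]],
        "k": [[1.0, 0.0], [0.0, 1.0]],
        "v": [[1.0, 0.0], [0.0, 1.0]],
    },
    "diff": {  # diffusion "generation" tower
        "q": [[0.5, 0.0], [0.0, 0.5]],
        "k": [[0.0, 1.0], [1.0, 0.0]],
        "v": [[2.0, 0.0], [0.0, 2.0]],
    },
}

def joint_attention(tokens):
    """tokens: list of (tower_name, vector); returns one output vector per token.

    Q/K/V use tower-specific weights, then a single (unmasked) softmax
    attention mixes information across tokens from both towers.
    """
    qs, ks, vs = [], [], []
    for tower, x in tokens:
        w = TOWERS[tower]
        qs.append(matvec(w["q"], x))
        ks.append(matvec(w["k"], x))
        vs.append(matvec(w["v"], x))
    scale = 1.0 / math.sqrt(D)
    outputs = []
    for q in qs:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in ks]
        weights = softmax(scores)
        outputs.append([sum(wt * v[d] for wt, v in zip(weights, vs)) for d in range(D)])
    return outputs

seq = [("ar", [1.0, 0.0]), ("ar", [0.0, 1.0]), ("diff", [1.0, 1.0])]
out = joint_attention(seq)
print(out)
```

Because the attention is joint, changing a diffusion-tower token changes the outputs of the reasoning-tower tokens too, which is the "one shared world representation" the post describes, at toy scale.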

Each modality gets its own tools: visual encoders, video VAEs, audio VAEs, and action projectors that can map different embodiments into a unified action space.
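As a rough illustration of action projectors that map different embodiments into a unified action space, the sketch below gives each embodiment its own linear projector into a shared vector size. The embodiment names, dimensions, and deterministic stand-in weights are hypothetical; a real projector would be learned and is not necessarily linear.

```python
SHARED_DIM = 4  # hypothetical size of the unified action space

def make_projector(in_dim, out_dim, seed):
    """Deterministic toy weights standing in for a learned linear projector."""
    return [[((seed + i * in_dim + j) % 5 - 2) / 10.0 for j in range(in_dim)]
            for i in range(out_dim)]

# Hypothetical embodiments with different native action sizes.
PROJECTORS = {
    "arm_7dof": make_projector(7, SHARED_DIM, seed=1),
    "gripper_2dof": make_projector(2, SHARED_DIM, seed=2),
}

def to_unified(embodiment, action):
    """Project one embodiment-specific action vector into the shared space."""
    w = PROJECTORS[embodiment]
    if len(action) != len(w[0]):
        raise ValueError(f"{embodiment} expects {len(w[0])} values, got {len(action)}")
    return [sum(wij * aj for wij, aj in zip(row, action)) for row in w]

u = to_unified("arm_7dof", [0.1] * 7)
g = to_unified("gripper_2dof", [1.0, 0.0])
print(u, g)
```

Both embodiments end up as vectors of the same length, so a single model can consume or emit them without embodiment-specific heads further down the stack.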

Action is a first-class modality in Cosmos 3.

That’s what makes it more than a video model. It doesn’t just predict and generate what the world might look like. It can connect reasoning and world modeling to physically grounded action.

Why This Matters

One of the most interesting findings from the ablation work is that training action domains together creates positive transfer.

That means adding more embodiments does not just add more use cases. It can actually make the model better.

This is the heart of why omnimodal training matters.

A shared world representation is not just convenient. It can make each individual task stronger. That’s the part that feels like the beginning of something much bigger.

The part I’m most excited about is that Cosmos 3 is fully open.

Developers get the models, scripts, optimization, inference endpoints, post-training recipes, datasets, and benchmarks.

Everything is available under the Linux Foundation’s OpenMDW 1.1 License.

You can use Cosmos 3 out of the box. You can use the VLM, world model, or world-action pieces separately.

You can post-train it for your own domain, embodiment, or accuracy target.

That’s what makes this feel different.

Cosmos 3 is not just a model release. It is the foundation for building intelligence for autonomous machines.

For me, Cosmos 3 feels like a step toward a world where physical AI development becomes much more scalable and accessible, for a new age of developers and agents.

That’s what we built Cosmos 3 for. I cannot wait to see what you build with it.

Download Models on Hugging Face
https://t.co/LAZoVygeim

Customize Models on GitHub
https://t.co/ZVQBNdqXDD

Read the Tech Blog to Learn More
https://t.co/Hn6Op9YeG1


HSlifelearner retweeted

3 days ago

It all starts with the @NVIDIARTXSpark Superchip. RTX Spark reinvents the personal computer for agents, creating and gaming. Learn more → https://t.co/AD9xcE63ww


HSlifelearner retweeted

3 days ago

Cosmos 3 is a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. It has incredible capabilities and is ranked as the number one open-source Text2Image and Image2Video model by Artificial Analysis, and as the number one robot policy model by RoboLab and RoboArena. Try it out. model: https://t.co/LAZoVygeim code: https://t.co/ZVQBNdqXDD website: https://t.co/lC9KfkAWcj paper: https://t.co/mUgQ8gqnCb


HSlifelearner retweeted

3 days ago

please don't take the advice that you should stay at a company long and "not hop around" for your first jobs. it's absolutely braindead to decide on a long-term bet with zero datapoints on what a good team looks like, and long before you have priced yourself into the market


HSlifelearner retweeted

4 days ago

Robotics is still data-starved. Collecting high-quality robot demonstrations remains brutally slow and expensive.

Introducing COBALT: a cloud-native teleoperation platform designed for large-scale robot learning. We are democratizing data collection by leveraging the hardware everyone already owns: the smartphone. All you need is to download an app (today)!

Read on for more!


4 days ago

Credit where it's due: rfdetr is a really good people-detection model, and it's commercially available.


HSlifelearner retweeted

@trajectorylabs

5 days ago

🏹5 Days of Trajectory.

Day 3 - An Open Source Training Stack for Continual Learning

Building the platform for continual learning requires both partnering with pioneering AI companies, as we showed on Day 2 with Harvey, and working toward frontier research, which we are highlighting today.

Continual learning means models that improve hourly from real production use. But with the size of frontier models, this becomes quite difficult. A Qwen-397b would need to spin up and tear down repeatedly across six GPU nodes, and that's valuable time gone.

Our contribution is Continual LoRA (C-LoRA): many lightweight adapters running at once on one shared base model. Our insight centers on where the parallelism lives: instead of splitting one giant job across nodes, we load-balance many small jobs over a single base.
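To make "many lightweight adapters running at once on one shared base model" concrete, here is a minimal pure-Python sketch of multi-tenant LoRA, assuming the standard LoRA formulation y = Wx + (alpha / r) * B(Ax). The tenant names, sizes, and weights are hypothetical, and the job-level load balancing the post describes is not modeled; this is not the C-LoRA or SkyRL implementation.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Frozen base weight (d_out x d_in), loaded once and shared by every tenant.
BASE_W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

class LoRAAdapter:
    """Low-rank update (alpha / r) * B @ A; only A and B belong to a tenant."""

    def __init__(self, a, b, alpha):
        self.a = a                   # r x d_in
        self.b = b                   # d_out x r
        self.scale = alpha / len(a)  # alpha / r

    def delta(self, x):
        return matvec(self.b, matvec(self.a, x))

def batched_forward(requests, adapters):
    """Serve a mixed batch: every row uses BASE_W, then adds its own adapter's delta."""
    # Base projection for all rows (a real system would run this as one batched GEMM).
    base_out = [matvec(BASE_W, x) for _, x in requests]
    outputs = []
    for (adapter_id, x), base in zip(requests, base_out):
        ad = adapters[adapter_id]
        outputs.append([base_i + ad.scale * d_i for base_i, d_i in zip(base, ad.delta(x))])
    return outputs

adapters = {
    "tenant_a": LoRAAdapter(a=[[1.0, 0.0, 0.0]], b=[[0.0], [1.0], [0.0]], alpha=1.0),
    "tenant_b": LoRAAdapter(a=[[0.0, 0.0, 1.0]], b=[[1.0], [0.0], [0.0]], alpha=2.0),
}
x = [1.0, 2.0, 3.0]
outs = batched_forward([("tenant_a", x), ("tenant_b", x)], adapters)
print(outs)
```

The same input yields different outputs per tenant while the base weights are stored once and never modified, which is why many small RL jobs can share one large base instead of each spinning one up.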

The result: 2.81x experiment throughput over single-tenant training, with no regression on rewards.

We built this together with @anyscalecompute and @NovaSkyAI, with generous support from @GoogleCloud and @GoogleStartups. We've open-sourced it on SkyRL as one of the first multi-LoRA RL training platforms, so that every team can get to continual learning faster.

We’re very excited to see what you build; please reach out!


HSlifelearner retweeted

6 days ago

today @CS153Systems, the students got to hear from @LiamFedus and @ekindogus about their search for a room temperature superconductor at @periodiclabs the kids will remember this one for the rest of their lives


HSlifelearner retweeted

6 days ago

We’ve just released the #Alpamayo Chain-of-Causation (CoC) Autolabeling Pipeline, a feature that has been highly requested by the community!

The pipeline automatically derives:
🔹 Meta-actions: high-level categorical descriptions of ego motion
🔹 Chain-of-causation labels: causal links between scene factors and the ego vehicle’s intended behavior

Autolabeling pipeline: https://t.co/2mrnj47WzK
Learn more about the Alpamayo open platform: https://t.co/P0nuqkwBab

We’re excited to see what the community builds with it, and we hope this tool will help accelerate research in the rapidly growing area of #reasoning models for #Physical #AI.

@NVIDIADRIVE @NVIDIAAI


HSlifelearner retweeted

@chris_j_paxton

7 days ago

Get your chores done for free if you're okay with the data being used to train robots


HSlifelearner retweeted

6 days ago

We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.


7 days ago

@benitoz @NaderLikeLadder Agreed always cooking @KranenKyle and @NaderLikeLadder
