Jing Zhang

@vinjn

Building AI Infra in NVIDIA.

California

Joined April 2009

1.6K Following

606 Followers

1.6K Posts

vinjn retweeted

Elon Musk

@elonmusk

15 days ago

Tesla FSD 14.3.4 rolling out now

77K

10M

Jing Zhang @vinjn

18 days ago

@m0d8ye PPA...?

580

Jing Zhang @vinjn

25 days ago

It is not just a VLM. Not just a video generator. Not just a robot policy model. It is all of them, in one single model.

Ming-Yu Liu

@liu_mingyu

25 days ago

Introducing NVIDIA Cosmos 3

We released NVIDIA Cosmos 3 last night.

And today, seeing it take the top spots across 8+ open model leaderboards feels surreal. We spent months working towards this moment.

Here’s the breakdown:

The Leaderboard Wins

World Reasoning
🏆 #1 open model on VANTAGE-Bench for vision AI
🏆 #1 overall on Traffic Anomaly Reasoning (TAR)

World Generation
🏆 #1 open model on Artificial Analysis Image-to-Video leaderboard
🏆 #1 open model on Artificial Analysis Text-to-Image leaderboard
🏆 #1 open model on PAI-Bench for physical AI synthetic data generation
🏆 #1 open model on Physics-IQ, which measures accuracy on physical laws
🏆 #1 open model on R-Bench for world generation quality

World Action
🏆 #1 on RoboArena for specialized policy
🏆 #1 on RoboLab for action generation

But the leaderboards are only part of the story. The real story is why we built Cosmos 3 in the first place.

The Problem

Training robots and autonomous systems in the real world is painfully hard.

Robots need to try the same thing numerous times before they succeed reliably. Self-driving cars need rare edge cases that may never happen naturally. Smart machines need to understand physics, motion, contact, failure, and surprise.

And real-world data is slow, expensive, and sometimes dangerous to collect. At some point, the answer cannot just be “collect more data.”

You can’t collect your way out of an infinite physical world. You have to generate it.

That… was the question behind Cosmos: Can one model understand the physical world deeply enough to reason about it, simulate it, and generate actions inside it?

What We Built

Cosmos 3 is the first omni-model for physical AI. It can understand and generate across: language · images · video · audio · action sequences

It is not just a VLM.

Not just a video generator.

Not just a robot policy model.

It is all of them, in one single model.

That matters because physical AI has been fragmented for a long time. Cosmos 3 is our attempt to collapse that fragmentation.

Depending on how you configure the inputs and outputs, the same model can act as a vision-language model, a video/world generator, a world simulator, or a world-action model.

No separate architecture required.

The Architecture

Under the hood, Cosmos 3 uses a dual-tower Mixture-of-Transformers architecture.

One tower is autoregressive for reasoning. It handles next-token prediction for language and discrete understanding.

The other tower is diffusion-based, for generation. It denoises images, video, audio, and action trajectories.

Two towers. Dual-stream joint attention. One shared world representation.

Each modality gets its own tools: visual encoders, video VAEs, audio VAEs, and action projectors that can map different embodiments into a unified action space.
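As a rough illustration of the dual-stream joint attention mentioned above, here is a minimal pure-Python sketch. It is not Cosmos 3's implementation: the function names, dimensions, and random weights are all assumptions, and causal masking, multiple heads, and the diffusion denoising loop are left out. It only shows the general idea that each tower keeps its own projection weights while attention runs over the combined token sequence.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(reason_tokens, gen_tokens, reason_w, gen_w):
    """Single-head joint attention over two token streams.

    Each stream projects queries, keys, and values with its own weights
    (hypothetical stand-ins for the two towers). Attention then runs over
    the concatenated sequence, so either stream can read from the other.
    """
    q_r, k_r, v_r = (matmul(reason_tokens, w) for w in reason_w)
    q_g, k_g, v_g = (matmul(gen_tokens, w) for w in gen_w)
    q, k, v = q_r + q_g, k_r + k_g, v_r + v_g  # concatenate along the sequence axis
    scale = math.sqrt(len(q[0]))
    weights = [
        softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        for qi in q
    ]
    out = matmul(weights, v)
    n_r = len(reason_tokens)
    return out[:n_r], out[n_r:], weights  # split the result back into the two streams

def random_matrix(rows, cols, rng):
    """Small random matrix, scaled so projections stay in a sane range."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(cols) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim = 8
reason_tokens = random_matrix(3, dim, rng)  # stand-in for text tokens
gen_tokens = random_matrix(4, dim, rng)     # stand-in for noisy video latents
reason_w = [random_matrix(dim, dim, rng) for _ in range(3)]  # Q, K, V for stream 1
gen_w = [random_matrix(dim, dim, rng) for _ in range(3)]     # Q, K, V for stream 2

reason_out, gen_out, weights = joint_attention(reason_tokens, gen_tokens, reason_w, gen_w)
print(len(reason_out), len(gen_out), len(weights[0]))
```

Every row of `weights` spans all seven tokens, which is how, in this toy version, a reasoning token can attend to generation tokens and vice versa.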

Action is a first-class modality in Cosmos 3.

That’s what makes it more than a video model. It doesn’t just predict and generate what the world might look like. It can connect reasoning and world modeling to physically grounded action.

Why This Matters

One of the most interesting findings from the ablation work is that training action domains together creates positive transfer.

That means adding more embodiments does not just add more use cases. It can actually make the model better.

This is the heart of why omnimodal training matters.

A shared world representation is not just convenient. It can make each individual task stronger. That’s the part that feels like the beginning of something much bigger.

The part I’m most excited about is that Cosmos 3 is fully open.

Developers get the models, scripts, optimization, inference endpoints, post-training recipes, datasets, and benchmarks.

Everything is available under the Linux Foundation’s OpenMDW 1.1 License.

You can use Cosmos 3 out of the box. You can use the VLM, world model, or world-action pieces separately.

You can post-train it for your own domain, embodiment, or accuracy target.

That’s what makes this feel different.

Cosmos 3 is not just a model release. It is the foundation for building intelligence for autonomous machines.

For me, Cosmos 3 feels like a step toward a world where physical AI development becomes much more scalable and accessible - to a new age of developers and agents.

That’s what we built Cosmos 3 for. I cannot wait to see what you build with it.

Download Models on Hugging Face
https://t.co/LAZoVygeim

Customize Models on GitHub
https://t.co/ZVQBNdqXDD

Read the Tech Blog to Learn More
https://t.co/Hn6Op9YeG1

451

199

66K

Jing Zhang @vinjn

25 days ago

Proud to share the latest drop from NVIDIA Cosmos Lab - Cosmos 3.

Artificial Analysis

@ArtificialAnlys

25 days ago

NVIDIA's Cosmos 3 lands at #1 among open weights models in both Text to Image and Image to Video on the Artificial Analysis Leaderboards!

Cosmos 3 is a family of omnimodal world models for Physical AI from @nvidia, unifying language, image, video, audio and action in a single Mixture-of-Transformers architecture that pairs an autoregressive reasoner with a diffusion generator.

The family comes in four variants: base Nano (16B: 8B reasoner tower + 8B generator tower) and Super (64B: 32B reasoner tower + 32B generator tower) models, with the Super model also having Text2Image and Image2Video fine-tuned variants, which are the versions listed in the Artificial Analysis Arena Leaderboards.
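To put the tower sizes above in perspective, the following back-of-the-envelope sketch adds up the parameter counts and estimates weight memory. The 2 bytes per parameter figure is an assumption (bf16 or fp16 storage). The post does not state a precision, and the estimate ignores activations, caches, and the encoders and VAEs.

```python
# Tower sizes as stated in the post: (reasoner params, generator params).
VARIANTS = {
    "Nano": (8_000_000_000, 8_000_000_000),
    "Super": (32_000_000_000, 32_000_000_000),
}

BYTES_PER_PARAM = 2  # assumption: 16-bit weights (bf16/fp16)

def weight_gb(reasoner, generator, bytes_per_param=BYTES_PER_PARAM):
    """Rough weight-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return (reasoner + generator) * bytes_per_param / 1e9

for name, (reasoner, generator) in VARIANTS.items():
    total_b = (reasoner + generator) // 1_000_000_000
    print(f"{name}: {total_b}B params, about {weight_gb(reasoner, generator):.0f} GB of weights")
```

Under that assumption the weights alone come to about 32 GB for Nano and 128 GB for Super.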

Cosmos3-Super-Text2Image (agentic) runs through an agentic prompt-upsampling harness, and takes the #1 open weights spot in Text to Image, surpassing HiDream-O1-Image-Dev-2604, Alibaba's Qwen Image Max 2512 and Black Forest Labs' FLUX.2 [dev].

Cosmos3-Super-Image2Video takes #1 open weights in Image to Video (No Audio), ahead of Lightricks' LTX-2, and Alibaba's Wan 2.2 A14B.

Cosmos 3 generators take structured JSON prompts rather than plain text, so prompt upsampling is needed to reproduce these results. This upsampling can be handled by an external harness or by the model's own reasoner branch, so it can also run self-contained.
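The prompt-upsampling step described above might look roughly like the sketch below. The JSON field names (`source_prompt`, `subject`, `camera`, `lighting`) and the `toy_expander` function are hypothetical: the post does not give the real schema, and in practice the expansion would come from an external harness or the model's reasoner rather than a hard-coded function.

```python
import json

def upsample_prompt(plain_text, expand):
    """Turn a plain-text prompt into a structured JSON prompt.

    `expand` is any callable that returns a dict of scene details. It stands
    in for the external harness or the reasoner branch mentioned in the post.
    """
    details = expand(plain_text)
    structured = {"source_prompt": plain_text, **details}
    return json.dumps(structured, indent=2, sort_keys=True)

def toy_expander(text):
    """Hypothetical expander with made-up fields, for illustration only."""
    return {
        "subject": text,
        "camera": "static wide shot",
        "lighting": "overcast daylight",
    }

prompt_json = upsample_prompt("a forklift crossing a warehouse aisle", toy_expander)
print(prompt_json)
```

Keeping the expander as a plain callable means the same wrapper works whether the details come from a separate model or from the generator's own reasoner.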

Cosmos 3 is fully open under the OpenMDW 1.1 license, shipping with weights, code, curated datasets and fine-tuning recipes available on @huggingface. First-party and third-party APIs are expected over the next few weeks, with pricing to follow.

See the thread below for example generations and a link to try Cosmos 3 in our arena 🧵

335

39K

336

Jing Zhang @vinjn

26 days ago

@m0d8ye Indian neo cloud https://t.co/RC9ZIerr2B

Jing Zhang @vinjn

about 2 months ago

@m0d8ye https://t.co/EKWxEhfo0u

281

Jing Zhang @vinjn

about 2 months ago

@m0d8ye Leaving my mark here before this blows up

Jing Zhang @vinjn

about 2 months ago

@OpenAIDevs https://t.co/8VGmkaHgNA

Jing Zhang @vinjn

about 2 months ago

Sharing codex pet - DaoDun https://t.co/8VGmkaHgNA

127

vinjn retweeted

Qwen

@Alibaba_Qwen

4 months ago

Big thanks to Day 0 support! Developers can start building today for free on nim and finetune with NeMo recipe: https://t.co/NOR3OIteq8

100

12K

vinjn retweeted

internet archiva

@internetarchiva

5 months ago

Rare footage of Hans Zimmer composing the Interstellar score

606

vinjn retweeted

Dan Hollick

@DanHollick

7 months ago

If you've ever seen someone tweet some cool shader and thought "I don't really even know what a shader is and at this point I'm too afraid to ask" - I've written something just for you. https://t.co/0ez5xz5vCP

156

684

713K

vinjn retweeted

とよふく🎍Toyofuku

@Yeq6X

8 months ago

When I used the abstract base-body LoRA I made for humans together with this ridge-line LoRA, I was able to abstract a cat

712

163

296

126K

vinjn retweeted

Cernovich

@Cernovich

6 months ago

Scott Adams, facing death, shows us how to live.

Someone recommended “How to Fail at Almost Everything and Still Win Big” by Scott Adams. I had burned out on mainstream books, but picked it up, and was hooked. He had put into words a way of living, similar to one I had found, except his approach was systemic and analytical. Better than my own slapdash notes. Outside of religious texts, Adams was and is as close to a “guide to life,” as you’ll ever find. And even if you’re religious, you still live in this world, and would be wise to learn how to navigate it.

Scott is closing in on the end of his life, and even now he is creating new beginnings.

I’d better write this now, I won’t be able to when it’s too late.

After losing Charlie Kirk, a lot of us are wondering how we can possibly write another obituary. While there’s much to complain about the internet and social media, those mediums expanded the sizes of our communities, our influences, and indeed our families. Too often we find new ways to hate people, instead of finding new people to love.

Scott Adams comes up in conversation at every social event I host. “How is Scott Adams doing? Will he make it?” We all talk about streams we watched and lessons learned. It’s a memorial except he’s still alive. Scott would love to hear that, which is why I have said so repeatedly. I’ve lost too many people, via death or fallings-out, to leave feelings unexpressed.

He’s been a surrogate father figure and mentor to millions of people.

Scott Adams is not liked, he is loved.

People don’t “like” Scott Adams, they aren’t “a fan of his.” They love this man. And I do as well. I’m still living in denial of his fate. We all are.

We’d been making a film about the meaning of life, and while Scott Adams had been in both of our other films, we hadn’t booked him for Meaning yet. Then we found out he was going to take the ride of assisted suicide. Foolishly, we had assumed he’d always be around. Nobody ever dies, right? Your dad will be there to take your call the next time you phone home. Your friends aren’t going anywhere. That’s how we too often live. We could book Scott later.

We reached out and he graciously agreed to be interviewed. We all knew it was going to be our last interview together. Scott and I are both efficient with our time. When a moment is over, it’s time to go do something else. Obligations call. The crew pushed this one as long as we could.

After the interview wrapped up and the gear was packed and it was time to go, there was an awkward pause. I broke it.

“Scott, we love you.” He said thank you. “No, Scott, we love you, I mean it, we all do. We love you.”

None of us broke down crying, not that there would have been any shame in that, but we no doubt all soon will.

Well then, what is the lesson of Scott Adams?

On a practical level, the lesson of Scott Adams is the power of showing up. Nobody works harder or on a more regular schedule. You can set your clock by Scott’s show. Too many of us wait for the muse of inspiration or the jolt of information to force us into action. Work every day, maybe in obscurity and without tangible benefits for years. Eventually you’ll hit your mark and go beyond.

Scott plugged away with his streams from a small account (after a huge career via Dilbert) and soon became must-watch, and then transcended his role to become something much more.

On a spiritual level, we might ask, why do we love Scott? It’s not because he’s so smart (he is). There is no shortage of intelligent, clever, Machiavellian, and rich people with podcasts. When one of them dies, what is lost? All of that Ego and desire for adoration, and does anybody even care? When those people fall while living, who will be there?

Scott is loved because he’s devoted his life to service to humanity. “What is the meaning of life,” is the question we ask every interviewee, and Scott’s answer, “Be useful to humanity.”

Despite pain, sickness, and inevitable death, Scott is doing his daily streams, serving his country and all of humankind until his end.

He’s a light to the world and a mirror for all of us.

What exactly are we doing with the gift of life given to us by God? (Scott believes in the Simulation, but I believe God evens this all out in the Judgment.) Are we doing enough for others? Are we doing anything for others?

Like everyone else, I’m capable of throwing myself a pity party. Sometimes when life is going too well, and I don’t have real problems, I invent some. That’s where the Ego brings you, recursively worshipping itself, and when that fails, tormenting itself, as each path leads to its own attention.

May all of us live more like Scott Adams, and may God bless his immortal soul when he passes.

P.S. I ran this article through Grok for typos. The original version had “immoral” soul where I meant it to read “immortal.” I think Scott would have had a great laugh had that typo been left in.

31K

vinjn retweeted

Tal @eiopa

7 months ago

I created this app for NeurIPS, mostly driven by my own desires to not feel lost 🙂 https://t.co/029Pbmq3lh It’s small but mighty! 🧵

vinjn retweeted

宝玉

@dotey

7 months ago

Gemini knows your location and current date, so you can ask Gemini to get the location and date by itself, e.g.

----
City name: {get my location from my profile}
Date: {get current date}
----

------ full prompt -------

Present a clear, 45° top-down view of a vertical (9:16) isometric miniature 3D cartoon scene, highlighting iconic landmarks centered in the composition to showcase precise and delicate modeling.

The scene features soft, refined textures with realistic PBR materials and gentle, lifelike lighting and shadow effects. Weather elements are creatively integrated into the urban architecture, establishing a dynamic interaction between the city's landscape and atmospheric conditions, creating an immersive weather ambiance.

Use a clean, unified composition with minimalistic aesthetics and a soft, solid-colored background that highlights the main content. The overall visual style is fresh and soothing.

Display a prominent weather icon at the top-center, with the date (x-small text) and temperature range (medium text) beneath it. The city name (large text) is positioned directly above the weather icon. The weather information has no background and can subtly overlap with the buildings. The text should match the input city's native language.

Please retrieve current weather conditions for the specified city before rendering.

City name: {get my location from my profile}
Date: {get current date}
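If you would rather pin the city and date yourself than let the model look them up, the two slots at the end of the prompt can be filled locally. This is a small sketch under that assumption. `fill_slots` is a hypothetical helper, not part of any Gemini API, and it falls back to the original placeholders when a value is missing.

```python
from datetime import date

# Placeholders exactly as they appear in the prompt above.
CITY_PLACEHOLDER = "{get my location from my profile}"
DATE_PLACEHOLDER = "{get current date}"

def fill_slots(city=None, day=None):
    """Build the 'City name / Date' footer of the prompt.

    A missing value keeps the original placeholder, so the model is still
    asked to look that piece up by itself.
    """
    city_value = city if city else CITY_PLACEHOLDER
    date_value = day.isoformat() if day else DATE_PLACEHOLDER
    return f"City name: {city_value}\nDate: {date_value}"

print(fill_slots("Tokyo", date(2025, 1, 2)))
print(fill_slots())
```

The first call prints a footer with explicit values. The second reproduces the placeholder version from the post.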

33K

vinjn retweeted

NVIDIA AI Developer

@NVIDIAAIDev

7 months ago

Have you heard what we’ve been cooking? 🧑‍🍳

We’re serving up step-by-step recipes for post-training, inference, data curation, and more in our Cosmos Cookbook.

📖 Guided video augmentations for realistic transformations
📖 Domain adaptation and synthetic data augmentation for autonomous vehicle research
📖 Sim2Real data augmentation for robotics navigation

Read our blog to learn more ➡️ https://t.co/4GTSH4E5d9
Start cooking ➡️ https://t.co/CVJYppJdgy

Jing Zhang @vinjn

7 months ago

@LufzzLiz @bigerchicken Please share

vinjn retweeted

Bingyi Kang

@bingyikang

7 months ago

After a year of team work, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀

Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video.

In pursuit of minimal modeling, DA3 reveals two key insights:
💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture.
✨ A single depth-ray representation is enough. No complex 3D tasks.

Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series.

The core team members, aside from me: @HaotongLin, Sili Chen, Jun Hao Liew, @donydchen.

👇(1/n) #DepthAnything3

492

515K

vinjn retweeted

Ashok Elluswamy

@aelluswamy

8 months ago

Full video of the ICCV '25 presentation

293

204K
