This is THE moment of Physical AI!
We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀
- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.
- It is not just a VLM, a video generator, an audio-visual generative model, or a physics simulator / world-action model. It can understand images and videos; generate images, videos, and audio; simulate future worlds; predict actions; and generate robot policies, enabling models to truly begin to “touch the world.”
- Cosmos 3 is the #1 open-weight model across many benchmarks for reasoning, text-to-image (T2I), image-to-video (I2V), and robot policy.
Huge thanks to every teammate who fought side by side on this journey—from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate.
The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act”—and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community.
Welcome to the era of Physical AI.
Hugging Face: https://t.co/QW5h5pIWWM
Project Website: https://t.co/Jppa0gkn16
Code: https://t.co/aJgaLm5BaG