We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video.
It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵
Google Omni can recognize objects from an image. I'm not sure exactly how it does it, but this is not simply a video model.
The input is one image per group of objects, and I ask Google Omni to work out what the objects are and create a short video about them (so, for the cars, a single image with no information about their make or model).
Again: this model has to be treated more like Nano Banana and not like Veo.
> Amateur vertical phone video, 9:16 aspect ratio. Continuous, unbroken handheld shot of a fluffy tabby cat sitting on a sunny windowsill, looking out into a leafy garden. The cat's tail twitches slowly, and its ears rotate slightly toward ambient noises. Sunbeams illuminate dust motes in the air. Sound design: Gentle breeze, distant bird chirps, quiet mechanical purring. No dialogue.
Nano Banana for video is here 🍌🎥
Gemini Omni is our new AI model that makes creating and editing videos as easy as having a conversation.
Here’s how it works ↓ https://t.co/j7NvPK3Uj8
Gemini Omni Flash:
> a recording from a capsule on the london eye, a jerky zoom into something in the distance and then refocusing (with a bit of back and forth) (no timestamp or dialog)
Note the world knowledge of London’s landscape, and the way the video is gently moving like the capsules do.
Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video
You can even give it your own videos & iterate on your ideas:
By now, you've probably heard about Gemini Omni, our new model designed to create anything from any input, starting with video.
But... what's the big deal?
Let’s break it down 🧵👇
You went 🍌🍌 for Nano Banana. Now, meet Nano Banana Pro.
It’s SOTA for image generation + editing with more advanced world knowledge, text rendering, precision + controls. Built on Gemini 3, it’s really good at complex infographics - much like how engineers see the world:)
Our Science team @GoogleDeepMind is relentless. Hot on the heels of AlphaEvolve (algorithms) in May, AlphaGenome (biology) in June, and our IMO gold medal model (maths) last week, we're ready for a new launch. Watch this space. The acceleration is real.