Made by Grok Imagine, this is movie-trailer quality.
Multimodal is for all of humanity: it is how humans perceive the world, connect with one another, and communicate ideas. Grok Imagine is built for that reality: not just to create stunning visuals, but to make imagination truly useful for everyone. It won’t be long before multimodal becomes indispensable.
When it debuted, Nano Banana redefined what's possible with image generation models and pushed the boundaries of people's imagination.
Today, we're excited to introduce Grok-Imagine-Image: a new model that's both faster and better than Nano Banana.
Through this journey, we've built many of the essential building blocks needed to unlock the next generation of models and to keep fueling the growth and prosperity of the visual AI community.
Stay tuned... something incredible is coming very soon! But today, hello world, grok-imagine-image!
At 4:00 today, stop by the #CVPR2025 Google booth where Ting Liu will demo a model for video creation by demonstration that can generate physically plausible video that continues naturally given a context scene. Find sample videos at https://t.co/VmfjfuxDgR
We are delighted to share our latest work: Video Creation by Demonstration (https://t.co/7TQOWA9tND)! See our interesting results here: https://t.co/vjLBnwmYbF
Introducing our latest work Video Creation by Demonstration, a novel video creation experience.
Paper: https://t.co/YZFCLKj5aM
Project: https://t.co/o9inp7qScE
Huggingface: https://t.co/Lg5h7kvr70
Happy to share our recent work "Epsilon-VAE", an effective autoencoder that turns single-step decoding into a multi-step probabilistic process. Please check our paper for more detailed results!
arXiv page: https://t.co/TcZf6FzyX6
Super excited to be featured by Google AI! We are also happy to share that our VideoPrism paper has been accepted by ICML 2024. Looking forward to meeting you guys in Vienna!
Paper: https://t.co/to6je51VsR
Blog: https://t.co/fOgROlqN4b
Introducing Long Zhao, a Senior Research Scientist at Google, who worked to build VideoPrism: A Foundational Visual Encoder for Video Understanding.
Read the blog to explore innovations in video understanding tasks and more → https://t.co/MnfeIMAohS
Our team will present our paper "Unified Visual Relationship Detection with Vision and Language Models" (https://t.co/Vtqy2I3PLi) at #ICCV2023 in Paris next week.
Please join our poster session on Wednesday (Oct. 4th, 2023) 02:30 PM-04:30 PM to learn more!
📢 Our #SMART101 challenge is now open! 🎉 Join the brightest minds in multimodal reasoning and cognitive models of intelligence to drive AI progress. 🚀 Don't miss out! Challenge closes on Sept. 1. Winning teams will receive prizes! 🏆 https://t.co/asTC5oscJh
#VLAR #ICCV2023 #AI
The Visual Transformer has helped advance many core computer vision applications, e.g., image classification, but training can be inefficient and models lack interpretable designs. Learn how the Nested Hierarchical Transformer addresses these challenges → https://t.co/JGYUJzW7BL