Gemini 3.5 Flash is amazing!
- Performs better than 3.1 Pro on coding & agentic tasks
- 4x faster than other frontier models
- 12x faster in @antigravity - 800 tokens/sec!
- Often at less than half the cost
And Pro to come…
Try it in @antigravity, @GeminiApp & more - enjoy!
Why do we need Ego2Web?
Today’s benchmarks split into two worlds:
• Egocentric video → real-world perception & reasoning
• Web agents → perception + action, but only in digital environments
But real tasks require both seeing something and acting on it. Existing benchmarks miss this connection.
Introducing Ego2Web from Google DeepMind and UNC Chapel Hill, accepted to #CVPR2026.
AI agents can browse the web. But can they act based on what you see? Existing benchmarks focus only on web interaction while ignoring the real world.
Ego2Web bridges egocentric video perception and web execution, enabling agents that can see through first-person video, understand real-world context, and take actions on the web grounded in the egocentric video.
This opens a path toward AI assistants that operate seamlessly across physical and digital environments. We hope Ego2Web serves as an important step for building more capable, perception-driven agents.
🧵👇
These major improvements in accuracy of rendered text are part of why the Nano Banana Pro model is such an upgrade over our earlier Nano Banana model (e.g. error rate goes from 56% for Nano Banana, aka Gemini 2.5 Flash Image, to 8% for Nano Banana Pro, aka Gemini 3 Pro Image). That's bananas! 🍌🍌🍌🍌
Sad to hear about Bill Atkinson's passing.
When I was a kid and we got our first original 128k Mac, I absorbed the programming documentation and marveled at the amazing breadth of functionality crammed into the QuickDraw library that he wrote.
https://t.co/kF8DLfFZGc
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts.
Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning.
And we see promising results when we increase inference-time computation!
I don’t feel offended, because this is not the truth. I find it funny that @NeurIPSConf allowed such an absurd keynote to be presented to all the brilliant Chinese scholars
Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥
We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through @LabsDotGoogle. → https://t.co/zMJQwON4Gx
@NeurIPSConf Someone confronted her on the spot, and she even dared to reply, "Maybe there is one, maybe they are common, who knows what. I hope it was an outlier." wtf @RosalindPicard resign from @MIT or prepare to get fired!
Screen user interfaces (UIs) and infographics facilitate rich interactive user experiences. ScreenAI is a vision-language model that achieves state-of-the-art results on UI and infographics-based tasks. Read more and check out the open-sourced datasets → https://t.co/Zijuq268qC
Important paper. They found that “Transformers' performance will rapidly decay with increased task complexity”
That’s why LLMs are very powerful, but also very limited.
They investigated the limits of Transformers/LLMs across three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem.
https://t.co/SRgXlQAUUY
Do you remember the "help me write an email" example Sundar presented at #GoogleIO weeks ago? We just released Rewrite PaLM 2, an instruction-tuned LLM that powers the amazing "help me (re)write" features for Gmail and Workspace. Check out our paper: https://t.co/QF6xtfMSWL
We love to refine our writing with LLMs. Thanks to "RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting" they are going to get so much better at it. Congratulations to the team on this awesome release!!!
https://t.co/U8PyluaZOJ