Rohan Doshi

Verified account

@RohanLikesAI

gemini multimodal understanding, product @ deepmind. views are my own.

Joined April 2020

208 Following

1.4K Followers

91 Posts

Pinned Tweet

7 months ago

🚀 We just launched Gemini 3 Pro — the strongest multimodal understanding model ever built. I lead product for Gemini’s multimodal vision capabilities, and I want to share more about the massive wins we are seeing across document, screen, spatial, and video understanding. 🧵

8

45

5

9

9K

RohanLikesAI retweeted

3 days ago

PSA: Gemini 3.5 Flash's multimodal understanding is seriously underrated. Beats Gemini 3.1 Pro. 3x faster, half the cost. Great work by @roboflow.

15

261

10

29

14K

RohanLikesAI retweeted

Logan Kilpatrick

@OfficialLoganK

28 days ago

Gemini 3.5 Flash outperforms 3.1 Pro on many vision use cases (like the below Roboflow eval) while being ~6x faster on average 🤯 Gemini multimodal understanding for the win.

126

1K

50

73

64K

29 days ago

gemini 3.5 flash is the engine that powers agents across all google products. would love to hear what y'all think

Logan Kilpatrick

@OfficialLoganK

29 days ago

Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.

286

2K

99

133

477K

1

6

0

0

688

about 2 months ago

Had fun speaking at Cloud Next 2026 as a panelist for "How DeepMind Makes Modeling Decisions" ✨ DM me if you want to keep chatting about the future of frontier models and multimodal agents 🤖

0

1

0

0

200

2 months ago

Gemini's agentic vision really shines in our latest SOTA robotics model for things like reading instruments, estimating proportions, and counting 🤖

Logan Kilpatrick

@OfficialLoganK

2 months ago

Introducing Gemini Robotics ER 1.6, our new SOTA robotics model 🤖 which excels at visual and spatial reasoning, now available via the Gemini API!

73

2K

180

289

118K

0

9

2

0

1K

RohanLikesAI retweeted

5 months ago

filesystem + code sandbox combo eats another modality. remember when o3 destroyed at geoguessr? gemini agentic vision will find location on any street photo you take faster than Liam Neeson can get back his daughter

21

155

12

99

24K

4 months ago

An awesome deep dive on how to leverage Gemini 3 Agentic Vision today

Google AI Developers

4 months ago

Gemini 3 Flash now uses an agentic "think-act-observe" loop to solve complex visual tasks 🤖 @GoogleDeepMind engineer @ptruiz_dev demonstrates how the model runs Python code automatically to zoom and inspect items, annotate images, and re-visualize data into charts.

54

2K

194

685

107K

2

49

1

15

3K

4 months ago

Back at Harvard Business School last week speaking on frontier AI + agents 🤖

As a ’23 alum, it was energizing to be back - this time teaching from the other side of the classroom

My AI agent workshop was completely packed, with 100+ students - signal on how much Harvard is leaning into AI

The students’ agency, raw IQ, and curiosity left me wildly optimistic about the next wave of AI builders 🚀

Grateful to Profs. Jeffrey Bussgang & Allison Mnookin and the Launching Tech Ventures team for the invite 🙏🏼

0

15

2

2

1K

5 months ago

@hololux Very cool!

1

1

0

0

51

5 months ago

what are y'all building with Gemini Agentic Vision??

Omar Sanseviero

5 months ago

Introducing Agentic Vision with Gemini 3! 👀🔥 Gemini can now write and execute code to zoom, annotate, inspect, and plot directly with vision input, all while leveraging its advanced reasoning capabilities

74

2K

143

667

167K

5

33

3

9

5K

5 months ago

@danielpearson Let’s chat! DM me!

0

0

0

0

19

5 months ago

@RejaullahmdMd Select Gemini 3 Flash as the model, turn on the code execution model, and upload an image!

1

2

0

0

67

5 months ago

🚀 Excited to officially launch 👁 Agentic Vision via Gemini 3 Flash. Gemini can run code execution on image uploads to zoom, analyze, and annotate:

🔍 Zoom: 5-10% quality win across vision benchmarks
🧮 Analyze: do image math with code (e.g. calculate the tip for a receipt)
✏️ Annotate: Draw arrows or bounding boxes to answer questions

Try via the Gemini API (AI Studio / Vertex) or via the Gemini App (rolling out to Thinking mode today).

Learn more→ https://t.co/JZtFU7fR05
Demo: https://t.co/cXR9v1vwJo

cc: @IoanaBica95 @anastasija56572 @jalayrac @bcaine @eisenjulian @weichengkuo @phillip_lippe @xf1280 @tulseedoshi @BiboXu @OfficialLoganK

Google AI Developers

5 months ago

Try 👁 Agentic Vision with Gemini 3 Flash in @GoogleAIStudio or Vertex AI. This new capability enables the model to effectively use code and reasoning to improve performance for common vision tasks. See Agentic Vision in action: https://t.co/z0k9VG1YmQ

23

856

113

333

175K

8

232

27

92

29K

5 months ago

@Jake_Joseph @AdMachineAI Very cool - would love to chat and see if we can help

1

1

0

0

90

5 months ago

@1littlecoder @shresbm both! AIS docs: https://t.co/cgjYtbEnST

0

0

0

0

26

6 months ago

@marcosmarf27 @OfficialLoganK our team should be able to help debug things: feel free to respond to my DM or email me at [email protected].

1

3

0

0

33

6 months ago

@marcosmarf27 @OfficialLoganK can you help me better understand your entire pipeline? what are "vision resources"? Are you using another upstream system to do OCR? And are you feeding the OCR text and the PDF images into Gemini from there?

0

1

0

0

22

6 months ago

@deedydas @deedydas 👋 glad you’re a fan of the launch (I’m the Gemini multimodal vision PM) - feel free to DM if you have any feedback for the team on doc understanding

0

10

0

2

1K

RohanLikesAI retweeted

6 months ago

Gemini 3 Flash is insane at OCR. It parses this extremely hard to read handwritten letter by Richard Feynman perfectly. It can do ~300 of these for $1. What's crazy is Feynman addresses General Donald J. Kutyna as "Katyna" which Gemini gets. There is no "Meeting Katyna", the first part of the letter, in all of Google search!

64

2K

160

603

178K

6 months ago

@matidotlol @googleaidevs @OfficialLoganK cc: @jalayrac @bcaine

0

1

0

0

90

6 months ago

@matidotlol @googleaidevs @OfficialLoganK Gemini Vision PM here! Let's chat (I'll DM you). can you share some queries+PDF examples. I'll have our team debug what's going on

1

2

0

1

99
