crazy to see a wild research idea I dreamt about for a long time land in this Gemini 2.5 Flash! it's been amazing making this happen with @YiTayML and @quocleix, and the rest of the team!!!
Breaking from Arena: @GoogleDeepMind's new Gemini-2.5-Flash climbs to #2 overall in chat, a major jump from its April release (#5 → #2)!
Highlights:
- Top-2 across major categories (Hard, Coding, Math)
- #3 in WebDev Arena, #2 in Vision Arena
- New model at the cost-performance Pareto frontier
AI progress continues to accelerate. Congrats again @GoogleDeepMind!
Happy to share that the @GoogleDeepMind Gemini team is starting a new research team in Singapore!
This new team will be focused on advanced reasoning, LLM/RL and improving bleeding-edge SOTA models such as Gemini, Gemini Deep Think and beyond.
This team will be led by yours truly and reports up to Quoc Le (@quocleix)'s broader team in Mountain View, which was recently at the center of both the IMO gold medal and ICPC gold medal breakthroughs with Gemini Deep Think, amongst many other significant Gemini advancements.
We're starting out with a very small but intensely capable force because talent density is key over anything else in the LLM era. Over the past few months, we have gone around and gathered the best of the best talent (in the region and beyond) and I'm confident we'll have a super cracked team very soon.
If you are interested in joining and have made truly exceptional contributions in any domain or area (engineering and/or research, etc.), please contact me.
This is quite an exciting time, with the Gemini / GenAI team at Google DeepMind leading the charge at the frontier. This is also the best opportunity to be on the critical path to AGI from the sunny island of Singapore.
Many thanks to @quocleix, @JeffDean, @benoitschilling, @EugenieRives and @demishassabis for their leadership support of this team.
Wonderful and fun image generated by Nano Banana
The momentum this week has been great, but the research wheel keeps turning.
Staring at experimental plots this Friday night that honestly look too good to be true. Love this stage of drastic research. The energy right now! So much fun building w/ @vqctran @AtharvaParulek7
Excited to see this go out and see it used beyond IMO -- congrats to the team!! Happy to have contributed some research to this model with @YiTayML and @HuaixiuZheng :D
We're bringing a version of Deep Think that achieved gold-medal status at IMO to Ultra subscribers in the @Geminiapp (+ the official version is now in the hands of mathematicians).
Toggle it on when reasoning through complex scientific literature, tackling a coding problem that requires careful consideration of time complexities - or anything else @DemisHassabis considers a fun Friday night:)
Great job Yi and team!!! So amazing to see that a general purpose text-only Gemini model nails an IMO gold medal! Super proud to contribute a small part to this via some cool research we did with @YiTayML and @vinh!
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad.
It solved 5 out of 6 exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here's how:
Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet.
Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the Pareto frontier of cost and speed.
even deeper BTS: semantic ids in DSI were thought of and implemented almost completely from the hospital since I was in poor health back then (kidney failure, all better now!) -- sometimes doing interesting research is a great distraction from other more difficult things going on
Oh we're doing more fun BTS here? I accidentally invented generative retrieval as a side project because my manager at Google at that time was very into IR/retrieval but IR was kind of a sunset/boring field so I kind of wanted to make it cool again.
After DSI was born, there were a couple of follow-ups and we worked with @madiator and others to apply it on a bunch of things like production recommender systems and whatnot. I don't follow much anymore but I heard this has been a pretty successful direction in the industry overall today.
DSI remains one of the most absurd pieces of work that I've done in my career and one of my favourites. While I still get so many requests today to advise in this direction because of this pioneering contribution, it's kind of funny when I get asked by random people if I have heard about "generative retrieval". Sometimes I say no just for the lols.
I've always joked that I don't think I'll get a test-of-time award anymore in my life given we have stopped publishing altogether, but to me DSI might be the closest work I have to stand a chance.
PS: @vqctran, who worked with me for a long time and still does today at GDM, invented the semantic ID approach in the DSI paper. He's great so you should follow him.
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)!
Tested under codename "nebula", Gemini 2.5 Pro ranked #1 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn!
Massive congrats to @GoogleDeepMind for this incredible Arena milestone!
More highlights in thread
Take BIG-Bench Hard but make it EVEN HARDER!!
Check out this cool new benchmark that really shows how much further our models still have to go on general reasoning!
Is BIG-Bench Hard too easy for your LLM?
We just unleashed BIG-Bench EXTRA Hard (BBEH)!
Every task, harder! Every model, humbled! (Poem Credit: Gemini 2.0 Flash)
Massive headroom for progress across various areas in general reasoning