⚡ Excited to announce Gemini 3.1 Flash-Lite! We’ve set a new standard for efficiency and capability to give developers our fastest, most cost-effective Gemini 3 model yet.
We engineered this model with thinking levels, allowing it to handle high-volume queries instantly, while scaling up its reasoning for complex edge cases.
By the numbers:
⏱️ 2.5X faster time-to-first-token than 2.5 Flash while being significantly higher quality
📉 $0.25 per 1M input tokens
📊 1432 Elo on LMArena & 86.9% on GPQA Diamond
Thrilled to see what developers build with this kind of speed and quality at scale. Available now in Google AI Studio and Vertex AI.
https://t.co/Weal73Juh8
Introducing SPECS (SPECulative test time Scaling), a test-time scaling (TTS) algorithm with pareto-frontier latency/accuracy trade-off.
Scaling test-time compute improves LLM reasoning but imposes a latency overhead.
Prior work optimizes TTS accuracy as a function of FLOPS, we propose to further reduce latency by addressing the memory bottleneck of LLM inference through speculative drafts. See a breakdown of the method below.
(1/n) 🧵 👇
I will be at NeurIPS from today to Dec. 7th. Excited to meet old and new friends at the conference. Happy to chat about anything related to LLM efficiency, RL, and differential privacy. #NeurIPS2025
At the Wednesday noon session (11 AM – 2PM), I will be presenting our spotlight work: Private Set Union with Multiple Contributions (#1314), where we establish fundamental limits on the utility of discovering set unions privately, and how we can bypass the limit by leveraging a prediction.
Joint work with awesome collaborators at Google Research: Travis Dick, Haim Kaplan, Alex Kulesza, Uri Stemmer, and @th33rtha.
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything.
It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prompt.
Let me elaborate on what I mean by that and a cheaper way of doing it offline.
[Today 11 am poster E-2804 #ICML2025] Inference-time compute have been instrumental to recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check our poster and discuss with @ananthbshankar, @abeirami, @jacobeisenstein, and myself.
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.
Can we align our model to better suit a given inference-time procedure?
We answer this affirmatively, check out the thread below.
@GaoZhaolin Nice work! We used multiple offline roll-outs for reward calibration when studying inference-aware RLHF. We had the observation that it helped for vanilla RLHF as well. Might be of interest.
https://t.co/hKUszWsx5z
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.
Can we align our model to better suit a given inference-time procedure?
We answer this affirmatively, check out the thread below.
@abeirami Congratulations on all the amazing achievements. I am super grateful for the opportunity to be part of the journey and learn from you. Looking forward to your amazing achievements to come.
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.
Can we align our model to better suit a given inference-time procedure?
We answer this affirmatively, check out the thread below.
Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025!
📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!
│
🗓️ Deadline: May 19, 2025