Samira Daruki

@SamiraDaruki

Learning and Training Gemini ♊, PreTraining 🤝 RL PostTraining, Science of Scaling, Model Design, Compute 🤝 Intelligence 🤝 Efficiency, Science 🤝 Infra.

San Francisco, CA

Joined May 2011

978 Following

260 Followers

276 Posts

Pinned Tweet

Samira Daruki @SamiraDaruki

over 2 years ago

What a year @GoogleAI (Dec 2022-Dec 2023) 🚀 Working with an amazing team all over the globe has been a highlight; impressed with how Gemini was built as a startup within Google. It has been a unique, rewarding experience with tons of learning along the journey. Another step forward in AI.

Jeff Dean

@JeffDean

over 2 years ago

I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks, including 10 of 12 popular text and reasoning benchmarks, 9 of 9 image understanding benchmarks, 6 of 6 video understanding benchmarks, and 5 of 5 speech recognition and speech translation benchmarks. Gemini Ultra is the first model to achieve human-expert performance on MMLU across 57 subjects with a score above 90%. It also achieves a new state-of-the-art score of 62.4% on the new MMMU multimodal reasoning benchmark, outperforming the previous best model by more than 5 percentage points. Gemini was built by an awesome team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google, and is one of the largest science and engineering efforts we’ve ever undertaken. As one of the two overall technical leads of the Gemini effort, along with my colleague @OriolVinyalsML, I am incredibly proud of the whole team, and we’re so excited to be sharing our work with you today! There’s quite a lot of different material about Gemini available, starting with: Main blog post: https://t.co/NzSycJl7aE 60-page technical report authored by the Gemini Team: https://t.co/CEdMRyYSLo In this thread, I’ll walk you through some of the highlights.

239

13K

SamiraDaruki retweeted

Lilian Weng

@lilianweng

9 days ago

A super long overdue (3+ years?) post on scaling laws. Compute is expensive. Scaling laws are a way to help us reason about the optimal compute allocation between data and model size before committing to a large run. The post covers what scaling laws predict, how compute-optimal allocation works, why Kaplan et al. and Chinchilla disagree, and how data limits + fitting details make extrapolation tricky. https://t.co/HP26eJvjHB

573

420K

Samira Daruki @SamiraDaruki

9 days ago

@poolio All the best with the next journey @poolio! And we should catch up sometime :)

395

SamiraDaruki retweeted

Nicole Hao

@NicoleHao256201

12 days ago

Hear me out on a SF coffee chat idea: Do a coffee chat with me, but whoever brings up AI, LLMs, AGI, ASI, frontier models, GPUs, semiconductors, NVIDIA, OpenAI, Anthropic, DeepMind, Meta AI, xAI, Claude, ChatGPT, Gemini, Grok, Llama, Qwen, DeepSeek, Mistral, agents, agentic workflows, prompts, prompt engineering, tokens, inference, CUDA, H100s, H200s, B200s, A100s, TPUs, Cerebras, Groq, ASICs, wafer-scale chips, TSMC, ASML, EUV, HBM, DDR5, data centers, servers, power constraints, nuclear energy for AI, cloud computing, Kubernetes, Docker, vector DBs, embeddings, RAG, context windows, long-context models, Transformers, state space models, Mamba, MoE, sparse attention, KV cache, quantization, distillation, RLHF, RLAIF, DPO, PPO, constitutional AI, synthetic data, data flywheels, post-training, pre-training, test-time compute, chain-of-thought, reasoning models, o1, o3, evals, benchmarks, SWE-bench, MMLU, GPQA, ARC-AGI, Humanity’s Last Exam, multimodality, VLMs, diffusion models, world models, robotics foundation models, autonomous driving, self-play, tool use, computer use, browser agents, AI coding, Cursor, Copilot, Codex, Devin, Claude Code, AI video generation, Sora, Veo, Runway, Midjourney, Stable Diffusion, scaling laws, Bitter Lesson, interpretability, mechanistic interpretability, alignment, AI safety, model collapse, hallucination, jailbreaks, prompt injection, AI wrappers, AI-native SaaS, AI replacing PMs, AI replacing engineers, AI fundraising, foundation model economics, inference margins, AI capex, NVIDIA market cap, tech stocks, IPOs, Silicon Valley, SF founder mode, SpaceX, data centers on the moon or Mars, Elon Musk, Jensen Huang, or Sam Altman… buys coffee for the other person. I don’t think it’s possible to make it through the whole conversation without saying any of these in the Bay Area, but worth experimenting

154

21K

Samira Daruki @SamiraDaruki

17 days ago

@NoamShazeer Many Congrats Noam! Discussion sessions with you were a major highlight of the time on Gemini pretraining.

249

SamiraDaruki retweeted

Kefan XIAO @KevinKiao

17 days ago

Crazy!

Samira Daruki @SamiraDaruki

17 days ago

@dustinvtran Congrats @dustinvtran! Super excited about what you and the Elorian team will build in the MultiModal Reasoning space!

412

SamiraDaruki retweeted

Nikita Bier

@nikitabier

20 days ago

Maybe it wasn’t a good idea to get a hotel in Times Square, during the World Cup and the NBA Finals.

705

310

258

432K

SamiraDaruki retweeted

Barstool Sports

@barstoolsports

20 days ago

EVERYONE IN NYC IS SINGING EMPIRE STATE OF MIND

785

161K

18K

Samira Daruki @SamiraDaruki

21 days ago

@_arohan_ At this point, we need to write a comprehensive guidebook on training spikes: different types, root causes, and mitigation methods for each.

118

Samira Daruki @SamiraDaruki

about 1 month ago

@RohunSaxena @eliebakouch Congrats @RohunSaxena! Really nice technical report :)

Samira Daruki @SamiraDaruki

about 1 month ago

@iamgrigorev + a bit science of scaling?

188

SamiraDaruki retweeted

koray kavukcuoglu

@koraykv

about 2 months ago

Today at Google I/O, we introduced Gemini 3.5 Flash! It has become an integral part of our daily research cycle and works with all the tools we have at Google. We used a team of agents in Antigravity 2.0 to recreate the original AlphaZero research paper and build a playable version. They coded the reinforcement learning pipeline in JAX/Flax, trained a ResNet model from scratch via self-play on multi-TPU pods, and shipped a full-stack web app so you can play against it, from just 2 prompts. Here’s what else makes 3.5 Flash special 🧵

116

567

123

98K

SamiraDaruki retweeted

Pushmeet Kohli

@pushmeet

about 2 months ago

The results of the research happening in my team @GoogleDeepMind have convinced me that the next era of scientific discovery will be aided by AI agents acting as force multipliers for human ingenuity. That’s why I’m proud to introduce Gemini for Science - a collection of experimental science tools designed to support researchers at every stage of the research process. The tools include: 1️⃣ Literature Insights, built with Google NotebookLM, searches millions of scientific papers to synthesize findings and generate artifacts including data tables, slides, reports, and more. 2️⃣ Hypothesis Generation, built with Co-Scientist, simulates the scientific method via a multi-agent "idea tournament" to generate, debate, and rigorously evaluate research hypotheses. 3️⃣ Computational Discovery, built with AlphaEvolve and ERA, is an agentic engine that generates and scores thousands of code variations in parallel, allowing researchers to test modeling approaches in fields like epidemiology in a fraction of the usual time. Read more: https://t.co/l8XIg8iXCN Register for access here: https://t.co/V3YS15mRUS

248

680

159K

SamiraDaruki retweeted

Oriol Vinyals

@OriolVinyalsML

about 2 months ago

1/ Today at Google I/O, we’re launching Gemini 3.5 Flash ⚡️⚡️⚡️! Our mission was clear: bring frontier-level intelligence with unprecedented speed. 3.5 Flash delivers drastic intelligence (beating 3.1 Pro on almost every benchmark), at Flash speeds. 🧵

126

11K

SamiraDaruki retweeted

🇺🇦 Dzmitry Bahdanau @DBahdanau

about 2 months ago

in 3-4 years companies will be hiring INSANELY expensive consultants to unscrew their Mythos-created spaghetti critical infrastructure, which was 99.9% autonomous, until the 0.1% catastrophe hit. Don't underestimate humans. We are amazing!

198

48K

SamiraDaruki retweeted

finbarr

@finbarrtimbers

about 2 months ago

There’s so many very reasonable, logical critiques of LLMs that sound totally plausible until you get Opus to debug a nasty ML problem

343

127K

SamiraDaruki retweeted

Demis Hassabis

@demishassabis

about 2 months ago

We've come a long way in that 10 years. Korean billboards in 2016 vs 2026

699

58K

Samira Daruki @SamiraDaruki

about 2 months ago

@archanfel_anoth All the best Juntang for the next chapter!

580

Samira Daruki @SamiraDaruki

2 months ago

@Yihe__Deng Wish you all the best with the next adventure Yihe!

358

SamiraDaruki retweeted

Dwarkesh Patel

@dwarkesh_sp

2 months ago

Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It’s a bit technical, but I encourage you to hang in there - it’s really worth it. There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography

152

592
