Check out our new paper “Human-Timescale Adaptation in an Open-Ended Task Space” 📄
https://t.co/njE3pkMheM
So happy to share what we’ve been up to the last year. I really loved working on this project to create adaptive agents with these amazing collaborators💛
I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans.
https://t.co/2r0plRI7mN
See the thread for more details 👇 [1/N]
Really proud to work with this team of awesome people on the mission to create a tutor for every learner and an assistant for every teacher! All done with participation and responsibility at its core.
We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life. 🤝
Watch it in action, with two parts - each was captured in a single take, in real time. ↓ #GoogleIO
Google presents Genie
Generative Interactive Environments
introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
I am really excited to reveal what @GoogleDeepMind's Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.
We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we’re proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini https://t.co/0aVehuXila
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length
Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long context capabilities, supporting millions of tokens of multimodal input. The multimodal capabilities of the model means you can interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more.
Gemini 1.5 was built by an amazing team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google. @OriolVinyals (my co-technical lead for the project) and I are incredibly proud of the whole team, and we’re so excited to be sharing this work and what long context and in-context learning can mean for you today!
There’s lots of material about this, some of which are linked to below.
Main blog post:
https://t.co/QAsDKXBdao
Technical report:
“Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context”
https://t.co/CTzTHNDCdo
Videos of interactions with the model that highlight its long context abilities:
Understanding the three.js codebase: https://t.co/yq7d6OSD6c
Analyzing a 45 minute Buster Keaton movie: https://t.co/adyMgDYHoK
Apollo 11 transcript interaction: https://t.co/Pqvq3Eac1R
Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on these blogs:
Google for Developers blog:
https://t.co/x73Vun0kVS
Google Cloud blog:
https://t.co/OlaTW6PYGn
We’ll also introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model.
Early testers can try the 1 million token context window at no cost during the testing period. We’re excited to see what developer’s creativity unlocks with a very long context window.
Let me walk you through the capabilities of the model and what I’m excited about!
Everyone from @GoogleDeepMind's Open-Endedness Team and almost the entire @UCL_DARK Lab are going to be at @NeurIPSConf 2023 next week. You will find most of us at the @ALOEworkshop on Friday. Come and say hi!
https://t.co/gqzKHLcIAw
Exciting times, welcome Gemini (and MMLU>90)! State-of-the-art on 30 out of 32 benchmarks across text, coding, audio, images, and video, with a single model 🤯
Co-leading Gemini has been my most exciting endeavor, fueled by a very ambitious goal. And that is just the beginning! A long 🐍 post about our Gemini journey & state of the field.
The biggest challenges in LLMs are far from trivial or obvious. Evaluation and data stand out to me. We've moved beyond the simpler "Have we won in Go/Chess/StarCraft?" to “Is this answer accurate and fair? Is this conversation good? Does this complex piece of text prove the theorem?” Exciting potential coupled with monumental challenges.
The field is less ripe further down the model pipeline. Pretraining is relatively well understood. Instruction tuning and RLHF, less so. In AlphaGo and AlphaStar we spent 5% of compute in pre-training and the rest in the very important RL phase, where the model learns from its successes or failures. In LLMs, we spend most of our time on pretraining. I believe there’s huge potential to be untapped. Cakes with lots of cherries, please 🎂
@Google has demonstrated its ability to move fast. It has been an absolute blast to see the energy from my colleagues and the support received. A “random” highlight is coauthoring our tech report with a co-founder. Another is coleading with @JeffDean. But beyond individuals, Gemini is about teamwork: it is important to recognize the collective effort behind such achievements. Picture a room full of brilliant people, and avoid attributing success solely to one person.
On a personal note, recently I celebrated my 10 year anniversary at Google, and it’s been 8 years since @quocleix and I co-authored “A Neural Conversational Model”, which gave us a glimpse of what was, has, and is yet to come. Back then, that line of work received a lot of skepticism. Lessons learned: whatever your passion is, push for it!
Zooming back out, there’s lots of change in our field, and the stakes couldn’t be higher. Excited for what’s to come from Gemini, but humbled by the responsibility to “get it right”. 2024 will be drastic. Welcome Gemini!
https://t.co/X4ZLpcUiiL
Gemini 1.0 is here!
Very grateful to have been one small part of the massive team working towards advancing the state of the art for generally capable models.
Here's to a bright future, to continuing to advance intelligence, in the quest towards AGI.
https://t.co/kq3MzKdHqB
We’re excited to announce 𝗚𝗲𝗺𝗶𝗻𝗶: @Google’s largest and most capable AI model.
Built to be natively multimodal, it can understand and operate across text, code, audio, image and video - and achieves state-of-the-art performance across many tasks. 🧵 https://t.co/mwHZTDTBuG
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano
Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU.
https://t.co/yCsjfQKO9F
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2
I'm super excited to share AlphaZeroᵈᵇ, a team of diverse #AlphaZero agents that collaborate to solve #Chess puzzles and demonstrate increased creativity. Check out our paper to learn more!
https://t.co/NveOuYmF6y
A quick 🧵(1/n)
Introducing AdA: an AI agent that can adapt to solve new problems as quickly as humans.💡
In a 3D virtual world, it learned to solve tasks within minutes such as combining objects in novel ways and cooperating with others.
Find out more at #ICML2023:
📍 Room 313
⌚ 17.30 HST