Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params!
https://t.co/FDyBXyLzAV
My co-authors have already posted about our amazing results, so here's a 🧵 on how we got there!
The Training team @OpenAI is hiring researchers in London 🚀
Our twin missions are to train better LLMs, and serve them more cheaply
Get in touch if you are excited to collaborate on architecture design, reliable scaling, and faster optimization
Excited to share #AlphaGenome, the start of our journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6
We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs.
It's surprising how much one can delve into, and how beautiful it can become.
With (and only thanks to) the amazing Alexandre and @BachFrancis
https://t.co/z7reli3BpY
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:
→ LLMs are limited in creativity since they learn to predict the next token
→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds!
🚀🚀🚀
Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec, including overheads like tokenization, prefill, safety filters etc.
Excited to share that our paper "Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero" is now out in PNAS!
With @weballergy, @banburismus_, @demishassabis, @ulrichpaquet, @_beenkim 🎉
📄 https://t.co/WTEPob2Q2Y
Our new paper sheds light on the process of knowledge acquisition in language models, with implications for
- data curricula
- the challenges of learning new knowledge when fine-tuning
- the emergence of hallucinations.
Nicolas did a great job on the project! See his thread👇
Large language models store vast amounts of knowledge, but how exactly do they learn it?
Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs!
https://t.co/WhuJ4atTc6
Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit.
Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍
Find out more → https://t.co/n2aYoeJXqn
Great contribution from Meta to the research community with a very easy-to-read codebase for LLM development: https://t.co/2astnovtY4
@sohamde_ and @SamuelMLSmith have implemented Hawk as well, which seems to perform comparably to Mamba.
We have an opening for a PhD intern working closely with (among others) me, Arwen Bradley, David Berthelot, on scientific aspects of diffusion & generative models. 1/
We’re presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬
It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 https://t.co/lx35RvplFr
@champydaku The data efficiency comes primarily from better tuning. We did a lot of work to establish hyperparameter scaling rules for Griffin so we can scale efficiently - we might write this up at some point.
We compare different capabilities in the Griffin paper: https://t.co/FDyBXyLzAV
Two months back, we released a 9B RecurrentGemma model, one of the strongest SSM-based language models out there, trained on 2T tokens!
I finally updated arXiv with some of our results: https://t.co/OACi24CT7w
Link to weights and code for our models in thread!
A new blog post explaining the Gemma architecture!
This time it's RecurrentGemma: https://t.co/vntCr8gWOf
This is the Gemma model that is based not on the Transformer architecture but on recurrent neural networks!
Is this the return of RNNs?
#gemmaverse
Both pre-trained and instruction-tuned models are here:
https://t.co/pFKM9ApOaC
https://t.co/i83bpKnra3
Code here: https://t.co/VskJw7eo9Z
And ofc, we have our 2B version of RecurrentGemma as well, released earlier this year!
https://t.co/Rua9VXfmSc
https://t.co/vmGC0aMEy5
Are small models still undertrained?
We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model.
Distillation is the future of LLMs with the growing availability of large and efficient open models!
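For readers new to distillation, here is a minimal sketch of the general idea: a student model is trained to match a teacher's output distribution over the vocabulary, not just the one-hot next token. This is a generic textbook-style loss, not the recipe used for the model above; the function names, the temperature parameter, and the toy logits are all assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) for a single token position.

    Hypothetical helper: the loss is zero when the student reproduces
    the teacher's distribution exactly and grows as the two diverge.
    """
    p = softmax(teacher_logits, temperature)  # teacher probabilities
    q = softmax(student_logits, temperature)  # student probabilities
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)


# Toy logits over a 3-token vocabulary (made-up values).
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(student, teacher))
```

In practice this per-token loss would be averaged over every position in a batch and minimized with respect to the student's parameters.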
It was fun to moderate this discussion with a great group of panelists. Lots of interesting points made on how to approach the next gen of seq modelling architectures. Thanks for the invite @caglarml, @orvieto_antonio, Razvan and others!
I am absolutely thrilled to announce the release of Gemma 2! Today, we're releasing both pre-trained-only and fully post-trained 9B and 27B models. The full technical report is here: https://t.co/QIYalQ3jaB and it's live *right now* on https://t.co/XoiJYticj3.
Welcome RecurrentGemma 9B 🔥
> Same performance as Gemma with more than 25% lower latency and 6-7x higher tokens/sec ⚡
> Base (9B) and Instruct (9B-IT) models released.
> MMLU - 60.5, CommonSenseQA 73.2, AGIEval 39.3 - pretty strong base model to fine-tune further.
> Based on the Griffin Architecture
> Achieves faster inference with long sequences by replacing global attention with local attention and linear recurrences.
> Available in Transformers! 🤗
Massive Kudos to Google for continuing open research on alternative architectures! GG!
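To make the "local attention and linear recurrences" point concrete, here is a rough sketch of why both pieces keep per-token inference cost bounded on long sequences. It is a simplified illustration, not Griffin's actual layers: the fixed scalar decay, the function names, and the window size are assumptions chosen for readability.

```python
def linear_recurrence(xs, decay):
    """Run h_t = decay * h_{t-1} + (1 - decay) * x_t elementwise.

    The state h has a fixed size, so each new token costs the same
    amount of work no matter how long the sequence already is.
    """
    h = [0.0] * len(xs[0])
    states = []
    for x in xs:
        h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
        states.append(h)
    return states


def local_attention_window(t, window):
    """Positions that token t may attend to under causal sliding-window attention.

    Global attention would return range(0, t + 1); here the list never
    holds more than `window` entries, so the cache stays bounded.
    """
    return list(range(max(0, t - window + 1), t + 1))


print(linear_recurrence([[1.0], [1.0]], decay=0.5))
print(local_attention_window(5, window=3))
```

With global attention, the key/value cache and the per-token work grow with sequence length; with a fixed-size recurrent state and a bounded attention window, neither does.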