Soham De @sohamde_ - Twitter Profile

Pinned Tweet

over 2 years ago

Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params! https://t.co/FDyBXyLzAV My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!

12

304

64

188

49K

sohamde_ retweeted

Samuel L Smith @SamuelMLSmith

8 months ago

The Training team @OpenAI is hiring researchers in London 🚀 Our twin missions are to train better LLMs, and serve them more cheaply. Get in touch if you are excited to collaborate on architecture design, reliable scaling, and faster optimization.

11

490

38

242

91K

sohamde_ retweeted

Jun Cheng @s6juncheng

12 months ago

Excited to share #AlphaGenome, a start of our AlphaGenome named journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6

14

908

207

434

87K

sohamde_ retweeted

Antonio Orvieto @orvieto_antonio

about 1 year ago

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. It's surprising how much one can delve into, and how beautiful it can become. With (and only thanks to) the amazing Alexandre and @BachFrancis https://t.co/z7reli3BpY

2

169

42

82

11K

sohamde_ retweeted

Vaishnavh Nagarajan @_vaishnavh

about 1 year ago

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵

1

166

42

112

29K

sohamde_ retweeted

Brendan O'Donoghue @bodonoghue85

about 1 year ago

Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec, including overheads like tokenization, prefill, safety filters etc.

93

3K

250

1K

578K

sohamde_ retweeted

Lisa Schut @miouantoinette

about 1 year ago

Excited to share that our paper "Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero" is now out in PNAS! With @weballergy, @banburismus_, @demishassabis, @ulrichpaquet, @_beenkim 🎉 📄 https://t.co/WTEPob2Q2Y

16

435

72

268

101K

Soham De @sohamde_

about 1 year ago

Our new paper sheds light on the process of knowledge acquisition in language models, with implications for data curricula, the challenges of learning new knowledge when fine-tuning, and the emergence of hallucinations. Nicolas did a great job on the project! See his thread👇

Nicolas Zucchet @NicolasZucchet

about 1 year ago

Large language models store vast amounts of knowledge, but how exactly do they learn it? Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs! https://t.co/WhuJ4atTc6

6

178

33

134

24K

1

36

6

12

4K

sohamde_ retweeted

Google DeepMind @GoogleDeepMind

over 1 year ago

Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit. Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍 Find out more → https://t.co/n2aYoeJXqn

27

940

210

392

408K

sohamde_ retweeted

Caglar Gulcehre @caglarml

over 1 year ago

Great contribution from Meta to the research community with a very easy-to-read codebase for LLM development: https://t.co/2astnovtY4 @sohamde_ and @SamuelMLSmith have implemented Hawk as well, which seems to have a performance comparable to Mamba.

2

132

19

77

14K

sohamde_ retweeted

Preetum Nakkiran @PreetumNakkiran

over 1 year ago

We have an opening for a PhD intern working closely with (among others) me, Arwen Bradley, David Berthelot, on scientific aspects of diffusion & generative models. 1/

4

205

37

199

48K

sohamde_ retweeted

Google DeepMind @GoogleDeepMind

almost 2 years ago

We’re presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 https://t.co/lx35RvplFr

66

3K

787

651

1M

Soham De @sohamde_

almost 2 years ago

@champydaku The data efficiency comes primarily from better tuning. We did a lot of work to establish hyperparameter scaling rules for Griffin so we can scale efficiently - we might write this up at some point. We compare different capabilities in the Griffin paper: https://t.co/FDyBXyLzAV

0

2

0

85

Soham De @sohamde_

almost 2 years ago

Two months back, we released a 9B RecurrentGemma model, one of the strongest SSM-based language models out there, trained on 2T tokens! I finally updated arXiv with some of our results: https://t.co/OACi24CT7w Link to weights and code for our models in thread!

5

225

30

102

24K

sohamde_ retweeted

Gus (🤖🧠+🐍+🥑🗣️) @gusthema

almost 2 years ago

A new blog post explaining a Gemma architecture! This time it is RecurrentGemma: https://t.co/vntCr8gWOf This is the Gemma model that is based not on the Transformer architecture but on a Recurrent Neural Network! Is this the return of RNNs? #gemmaverse

0

14

2

9

1K

Soham De @sohamde_

almost 2 years ago

Both pre-trained and instruction-tuned models are here: https://t.co/pFKM9ApOaC https://t.co/i83bpKnra3 Code here: https://t.co/VskJw7eo9Z And ofc, we have our 2B version of RecurrentGemma as well, released earlier this year! https://t.co/Rua9VXfmSc https://t.co/vmGC0aMEy5

0

7

3

1

725

sohamde_ retweeted

Armand Joulin @armandjoulin

almost 2 years ago

Are small models still undertrained? We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model. Distillation is the future of LLMs with the growing availability of large and efficient open models!

10

366

39

96

63K

Soham De @sohamde_

almost 2 years ago

It was fun to moderate this discussion with a great group of panelists. Lots of interesting points made on how to approach the next gen of seq modelling architectures. Thanks for the invite @caglarml @orvieto_antonio Razvan and others!

Caglar Gulcehre @caglarml

almost 2 years ago

The panel discussion at the NGSM workshop is going full steam ahead with a great line-up of panelists, moderated by @sohamde_ ...

0

11

0

2K

0

12

0

1

1K

sohamde_ retweeted

Caglar Gulcehre @caglarml

almost 2 years ago

@sohamde_ is presenting on SSM architectures and RNNs at the NGSM workshop in Strauss 3 #ICML2024.

0

9

1

425

sohamde_ retweeted

Surya Bhupatiraju @suryabhupa

almost 2 years ago

I am absolutely thrilled to announce the release of Gemma 2! Today, we're releasing both pre-trained-only and fully post-trained 9B and 27B models. The full technical report is here: https://t.co/QIYalQ3jaB and it's live *right now* on https://t.co/XoiJYticj3.

21

228

47

50

26K

sohamde_ retweeted

Vaibhav (VB) Srivastav @reach_vb

about 2 years ago

Welcome RecurrentGemma 9B 🔥
> Same performance as Gemma with more than 25% lower latency and 6-7x higher tokens/sec ⚡
> Base (9B) and Instruct (9B-IT) models released.
> MMLU - 60.5, CommonSenseQA 73.2, AGIEval 39.3 - pretty strong base model to fine-tune further.
> Based on the Griffin Architecture
> Achieves faster inference with long sequences by replacing global attention with local attention and linear recurrences.
> Available in Transformers! 🤗
Massive kudos to Google for continuing open research into alternative architectures! GG!

8

211

45

111

38K
