Dan Busbridge @danbusbridge - Twitter Profile

Pinned Tweet

over 1 year ago

Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering: "If I want a small, capable model, should I distill from a more powerful model, or train from scratch?" Our distillation scaling law shows, well, it's complicated... 🧵 https://t.co/b1uuyJwzRF

Dan Busbridge

@danbusbridge

5 days ago

Very excited to have spotted and photographed a Bittern at @WWTWelney today.

Dan Busbridge

@danbusbridge

11 months ago

Uncertainty methods and correctness metrics often share "mutual bias" (systematic errors from a common confounder like response length), skewing LLM evaluations. New paper from my colleagues shows that "LM-as-a-judge" evaluation is more robust and human-aligned. Important work - check it out! https://t.co/3ed0lFUQlK

Andrea Santilli @teelinsan

11 months ago

Uncertainty quantification (UQ) is key for safe, reliable LLMs... but are we evaluating it correctly? 🚨 Our ACL2025 paper finds a hidden flaw: if both UQ methods and correctness metrics are biased by the same factor (e.g., response length), evaluations get systematically skewed https://t.co/wi59eGDLP7

Dan Busbridge

@danbusbridge

11 months ago

@jxbz @phillip_isola @tmjlarge @yangliux1 @minyoung_huh @hyojinbahng Awesome work @jxbz and Laker! Sent an email in case you're around for coffee, would love to discuss more.

Dan Busbridge

@danbusbridge

11 months ago

Happening now in East Exhibition Hall E-2310, with @AmitisShidani1, looking forward to discussing our work!

Dan Busbridge

@danbusbridge

11 months ago

Data mixtures are crucial for achieving strong pre-trained models. Loved collaborating on this project led by @PierreAblin and @MustafaShukor1 tackling data mixing ratios through the lens of scaling laws. Check out @MustafaShukor1's 🧵.

Mustafa Shukor @MustafaShukor1

11 months ago

We propose new scaling laws that predict the optimal data mixture, for pretraining LLMs, native multimodal models and large vision encoders! Only running small-scale experiments is needed, and we can then extrapolate to large-scale ones. These laws allow 1/n 🧵 https://t.co/ISSAo9Ymp2

Dan Busbridge

@danbusbridge

11 months ago

@feijianghan Great to hear it was useful, thanks @feijianghan!

Dan Busbridge

@danbusbridge

11 months ago

@mciccone_AI Thanks @mciccone_AI !

Dan Busbridge

@danbusbridge

11 months ago

@DrZeeshanZia Thanks for coming @DrZeeshanZia, great to hear the talk was useful!

Dan Busbridge

@danbusbridge

11 months ago

Happening in 30 minutes in West Ballroom A - looking forward to sharing our work on Distillation Scaling Laws!

Dan Busbridge

@danbusbridge

11 months ago

Excited to be heading to Vancouver for #ICML2025 next week! I'll be giving a deep dive on Distillation Scaling Laws at the expo — exploring when and how small models can match the performance of large ones. 📍 Sunday, July 13, 5pm, West Ballroom A 🔗 https://t.co/yNd5eZByHR

Dan Busbridge

@danbusbridge

11 months ago

@AmitisShidani1 @samira_abnar @harshays_ @alaa_nouby @AggieInCA @LouisBAlgue @PierreAblin Here's an Apple@ICML guide with all our talks, posters, and booth events: 🔗 https://t.co/fEkTYVZIo1 Come say hi if you're around, always happy to chat. Looking forward to a week of great research, and catching up with familiar faces (and meeting new ones too).

Dan Busbridge

@danbusbridge

11 months ago

Excited to be heading to Vancouver for #ICML2025 next week! I'll be giving a deep dive on Distillation Scaling Laws at the expo — exploring when and how small models can match the performance of large ones. 📍 Sunday, July 13, 5pm, West Ballroom A 🔗 https://t.co/yNd5eZByHR

Dan Busbridge

@danbusbridge

11 months ago

@AmitisShidani1 @samira_abnar @harshays_ @alaa_nouby @AggieInCA and Scaling Laws for Forgetting and Fine-Tuning (E-2708) with @LouisBAlgue, David Grangier, Eleonora Gualdoni, Marco Cuturi, and @PierreAblin 🔗 https://t.co/c8xqFTf3ZE

danbusbridge retweeted

Jason Ramapuram @jramapuram

about 1 year ago

Stop by poster #596 at 10A-1230P tomorrow (Fri 25 April) at #ICLR2025 to hear more about Sigmoid Attention!

We just pushed 8 trajectory checkpoints each for two 7B LLMs for Sigmoid Attention and a 1:1 Softmax Attention (trained with a deterministic dataloader for 1T tokens):

- Sigmoid: gs://axlearn-public/experiments/gala-7B-sigmoid-hybridnorm-alibi-sprp-2024-12-03-1002/checkpoints/
- Softmax: gs://axlearn-public/experiments/gala-7B-hybridnorm-alibi-sprp-2024-12-02-1445/checkpoints/

Inference code at https://t.co/b0cp49qvAv

Dan Busbridge

@danbusbridge

about 1 year ago

I’ve been curious about how early vs late-fusion multimodal approaches compare in controlled conditions. Great to see this studied in depth. Turns out, optimal late fusion has a higher params-to-data ratio, and performance between early and late fusion is similar. Brilliant work from @MustafaShukor1 and team! Check it out: https://t.co/ySGPN5Ss19

Mustafa Shukor @MustafaShukor1

about 1 year ago

We release a large scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs.
- How sparsity (MoEs) can play a detrimental role in handling heterogeneous modalities? 🧵 https://t.co/677ZM4kHbm
