David Page

@dcpage3

Machine learning researcher

Joined April 2018

1K Following

2.4K Followers

181 Posts

Pinned Tweet

David Page @dcpage3

almost 7 years ago

Ever wanted to train CIFAR10 to 94% in 26 SECONDS on a single-GPU?! In the final post of our ResNet series, we open a bag of tricks and drive training time ever closer to zero... Colab: https://t.co/GwNFQAmFT7 Blog: https://t.co/5PcluNHXa2

365

266

dcpage3 retweeted

Walter Goodwin

@goodwin_ml

24 days ago

I am thrilled to share the news that Fractile's mission to build chips and systems that unlock the next generation of AI scaling has been bolstered, with a $220M funding round led by Accel, Factorial Funds, and Founders Fund, alongside some incredible backers old and new. AI inference is driving the defining infrastructure buildout of the 21st Century. We've written a bit about where we think capabilities must go, and how Fractile is working to bring this about: https://t.co/L3VTjJjLuL It has been a privilege working on one of the hardest but most rewarding technical challenges of our time for over three years, with the most brilliant, kind and driven people I could have ever hoped to work alongside. We are still just getting started. There is a lot to be done to deliver on our goals, but we are grateful to have the support of so many people in chasing these down every minute of every day. Thanks, all, for being part of the Fractile mission! 🚀 @fractile_ai

178

104K

David Page @dcpage3

5 months ago

@badlogicgames My git repo 🙈 https://t.co/3TjDXuLwt2

David Page @dcpage3

about 2 years ago

@jeremyphoward @hi_tysam @kellerjordan0 @karpathy Impressive, well done! Hi Jeremy! Still lurk here occasionally and hopefully start blogging again soon..

207

David Page @dcpage3

about 4 years ago

@samgd @jeremyphoward @nanopore Yep not me this time! Looks like fascinating work from Giuseppe though

David Page @dcpage3

about 5 years ago

@JoaquinAlori Thank you!

David Page @dcpage3

over 6 years ago

The paper that introduced Batch Norm https://t.co/vkT0LioKHc combines clear intuition with compelling experiments (14x speedup on ImageNet!!) So why has 'internal covariate shift' remained controversial to this day? Thread 👇

313

323

dcpage3 retweeted

Zeyuan Allen-Zhu, Sc.D.

@ZeyuanAllenZhu

over 5 years ago

Excited to announce our new work, a unified theory towards explaining 3 black magics in deep learning: (1) ensemble, (2) knowledge distillation, and (3) self-distillation. An accessible blog post is below.

258

David Page @dcpage3

over 5 years ago

@bozavlado @iiSeymour CTC_CRF extends flipflop to output scores for multiple (six) consecutive bases not just two. Output layer is mostly orthogonal to choice of RNN/CNN encoder so CNN improvements are very welcome! More details coming soon..

dcpage3 retweeted

Chris Seymour

@iiSeymour

over 5 years ago

Big accuracy update coming in the next version of Bonito 🚀 v0.3.0 combines everything we have learned with structured and unstructured approaches - @dcpage3, Tim and myself are working hard on the finishing touches this week - watch this space 👀

David Page @dcpage3

over 5 years ago

@TMVector @nanopore Thanks Jonny, that’s very kind!

David Page @dcpage3

over 5 years ago

First day of new job @nanopore where I get to apply ML to a bunch of fun science and engineering problems. Pretty excited!

David Page @dcpage3

over 5 years ago

@achacond @nanopore Thanks, looking forward to it!

David Page @dcpage3

over 5 years ago

@Sisseljuul @nanopore Thank you!

dcpage3 retweeted

Chris Seymour

@iiSeymour

almost 6 years ago

Bonito v0.2.2
- SAM output
- Sequence and alignment tsv summaries
- Grab bag of training improvements from @dcpage3
https://t.co/aBTzr5RV93

dcpage3 retweeted

Alex Thiery @alexxthiery

almost 6 years ago

Preparing a short course on neural nets can be fun. Below is one of the fast Resnets by @dcpage3 on CIFAR10. Would have been nice to track a UMAP-like representation of some internal layer, but have not found a reasonably fast/stable way to do so. Any idea? @NikolayOskolkov

David Page @dcpage3

about 6 years ago

Undertraining a large model is a good way to speed things up on toy problems https://t.co/Bu9Np93B7t but it was far from clear this should extend to large scale.

David Page @dcpage3

about 6 years ago

Great study of training efficiency at large scale + nice results on compression for inference!

Eric Wallace

@Eric_Wallace_

over 6 years ago

Not everyone can afford to train huge neural models. So, we typically *reduce* model size to train/test faster. However, you should actually *increase* model size to speed up training and inference for transformers. Why? [1/6] 👇 https://t.co/GcjytCEmox https://t.co/HatYO5GfhP

346

231

David Page @dcpage3

over 6 years ago

Simple setup + attention to details -> sota self-supervised reps!
LARS -> large batches -> no need for memory bank of -ve examples
Random crops + color aug (to prevent hist cheating) -> no need for special arch
Projn head for contrastive loss -> hidden reps preserve info

Ting Chen @tingchenai

over 6 years ago

Introducing SimCLR: a Simple framework for Contrastive Learning of Representations. SimCLR advances previous SOTA in self-supervised and semi-supervised learning on ImageNet by 7-10% (see next). https://t.co/X5CXud0VwL Joint work with @skornblith @mo_norouzi @geoffreyhinton.

967

292

197

dcpage3 retweeted

Chris Seymour

@iiSeymour

over 6 years ago

Blitzing fast CTC decoding https://t.co/gwiJhrBSpu

David Page @dcpage3

over 6 years ago

@Buntworthy It's fixed now. Thanks for letting me know!

dcpage3 retweeted

Jeremy Howard

@jeremyphoward

over 6 years ago

@ylecun @viglovikov @timetravellertt @kaggle The problem though with "you can always add those tricks to get the numbers up" is that *very* often I see papers that don't do data aug, or don't tune hyper-params, etc, then claim their new idea helps. But then I find it's actually just a poor proxy for the things they skipped
