Stein's unbiased risk estimate (SURE) is an almost magical formula that gives an unbiased estimate of the mean squared error of a denoiser (used, for example, in denoising score matching) using only the noisy observation y and the Gaussian noise level, without requiring the clean data x. https://t.co/NwmXkmRC15
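A minimal sketch of the SURE identity, under assumptions that are not from the linked post: i.i.d. Gaussian noise with known standard deviation `sigma`, and a hypothetical linear shrinkage denoiser f(y) = a*y, chosen only because its divergence is n*a in closed form. The estimate is -n*sigma^2 + ||f(y) - y||^2 + 2*sigma^2*div f(y), and it never touches x.

```python
import random


def sure_linear_shrinkage(y, a, sigma):
    """SURE for the (hypothetical) denoiser f(y) = a * y; uses only y and sigma."""
    n = len(y)
    residual = sum(((a - 1.0) * yi) ** 2 for yi in y)  # ||f(y) - y||^2
    divergence = n * a                                  # sum_i d f_i / d y_i
    return -n * sigma ** 2 + residual + 2.0 * sigma ** 2 * divergence


def true_squared_error(x, y, a):
    """Oracle squared error ||f(y) - x||^2; needs the clean x."""
    return sum((a * yi - xi) ** 2 for xi, yi in zip(x, y))


def demo(n=50000, a=0.5, sigma=1.0, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]         # clean signal (assumed)
    y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]  # noisy observation
    return sure_linear_shrinkage(y, a, sigma), true_squared_error(x, y, a)


if __name__ == "__main__":
    estimate, oracle = demo()
    print(f"SURE: {estimate:.1f}   oracle squared error: {oracle:.1f}")
```

SURE is unbiased rather than exact, so the two numbers agree only up to sampling fluctuation; for large n the relative gap is small.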
🎓Kullback-Leibler divergence between densities of an exponential family = reverse Bregman divergence wrt the cumulant function
🎉Kullback-Leibler divergence between non-normalized densities = reverse Bregman divergence wrt the partition function
👉 https://t.co/bmN5jDvwza
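A quick numerical check of the exponential-family identity, sketched for the Poisson family as an assumed example (not taken from the linked post): natural parameter θ = log λ, cumulant function F(θ) = exp(θ). The claim is KL(p_θ1 : p_θ2) = B_F(θ2 : θ1), with the argument order swapped, which is what "reverse" refers to.

```python
import math


def kl_poisson(lam1, lam2, terms=200):
    """KL(Poisson(lam1) : Poisson(lam2)) by direct summation over the pmf."""
    total = 0.0
    for k in range(terms):
        log_p1 = k * math.log(lam1) - lam1 - math.lgamma(k + 1)
        log_p2 = k * math.log(lam2) - lam2 - math.lgamma(k + 1)
        total += math.exp(log_p1) * (log_p1 - log_p2)
    return total


def bregman(F, grad_F, a, b):
    """Bregman divergence B_F(a : b) = F(a) - F(b) - (a - b) * F'(b)."""
    return F(a) - F(b) - (a - b) * grad_F(b)


def reverse_bregman_poisson(lam1, lam2):
    """B_F(theta2 : theta1) with F = exp, the Poisson cumulant function."""
    theta1, theta2 = math.log(lam1), math.log(lam2)
    return bregman(math.exp, math.exp, theta2, theta1)


if __name__ == "__main__":
    print(kl_poisson(3.0, 5.0), reverse_bregman_poisson(3.0, 5.0))
```

Swapping the arguments of `bregman` gives KL in the other direction, so the order matters.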
New blog post! Some thoughts about diffusion distillation. Actually, quite a lot of thoughts 🤭 Please share your thoughts as well!
https://t.co/JZyRsjC25v
MICo: Improved representations via sampling-based state similarity for MDPs
Our #NeurIPS2021 paper introduces a new loss that improves your RL agents!
📜Paper: https://t.co/MXN0t4ytha
💻Blog: https://t.co/pw3B1D8XpF
🐍Code: https://t.co/XbLefgQAhh
1/🧵
How do I do research with my mentors effectively?
I get this question frequently in my open office hours. I am still learning as well, but I hope sharing my ✌💰 may be helpful to some.
Key idea ➡️ **Help them help you!**
How? Check out the thread 🧵
Unsure which arch to use for your deep ensemble? Why settle for one? Neural Ensemble Search constructs ensembles with varying network archs
Paper: https://t.co/EImmbXUKD9
Code: https://t.co/1uhTwm0oWG
Work by @ShehZaidi @ZelaArber Thomas Elsken @cholmesuk @FrankRHutter @yeewhye
Sorry if somebody did this one before - but the field is growing so fast, there is no way I can keep track of it!
While making this I constantly felt like somebody is scooping me - or was I already scooped?
#ComputerVision #CVPR2021 #TypesofPaper
Check out what we’ve been working on for the last few months: We decouple the model size and the compute cost in a Vision Transformer backbone by using Sparse MoE layers. These have been popularised in NLP, and they are fantastic for Vision too! https://t.co/AdDwJKvkBk
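A toy sketch of why sparse MoE layers decouple model size from compute, assuming a generic top-k softmax router (this is an illustration, not the authors' implementation; `CountingExpert`, `sparse_moe_layer` and `experts_run` are hypothetical names). Adding experts adds parameters, but each token still runs only k of them.

```python
import math
import random


class CountingExpert:
    """Hypothetical expert: one dense linear map that counts how often it runs."""

    def __init__(self, rng, dim):
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.calls = 0

    def __call__(self, token):
        self.calls += 1
        return [sum(w * t for w, t in zip(row, token)) for row in self.weights]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe_layer(token, router, experts, k):
    """Route one token to its top-k experts and mix their outputs by gate weight."""
    logits = [sum(w * t for w, t in zip(row, token)) for row in router]
    gates = softmax(logits)
    top_k = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:k]
    output = [0.0] * len(token)
    for i in top_k:  # only the selected experts are evaluated
        expert_out = experts[i](token)
        output = [o + gates[i] * v for o, v in zip(output, expert_out)]
    return output, top_k


def experts_run(num_experts, k, dim=8, seed=0):
    """Return (number of expert evaluations, output size) for one token."""
    rng = random.Random(seed)
    experts = [CountingExpert(rng, dim) for _ in range(num_experts)]
    router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
    token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    output, _ = sparse_moe_layer(token, router, experts, k)
    return sum(e.calls for e in experts), len(output)


if __name__ == "__main__":
    print(experts_run(4, 2), experts_run(32, 2))
```

With k fixed at 2, going from 4 to 32 experts multiplies the expert parameters by 8 while the number of expert evaluations per token stays at 2; only the small router grows with the expert count.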
[3/3] Towards big vision
While dense models are still the norm, sparse MoE layers can work well too!
Large Vision-MoEs (15B params) can be trained to high performance relatively efficiently, and can even prioritize amongst patches (see duck).
https://t.co/0WHyJTlGG5
...