Super excited to start as an AI Research Intern at @IFM_MBZUAI in Sunnyvale next week (and to get back to the Bay after 7 years)!
Working with @ssahoo_ and lots of other great people, hope to help train some large open-source models!
(If you're in the Bay hit me up!)
Our paper "Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training" got accepted to ICML! See you in Seoul 🥳 🇰🇷
More below:
Sun, Apr 26 (MM Intelligence, Room 204[?]): PUMA: We show that adapting the forward pass of MDMs at training time to the inference-time trajectories can speed up pretraining by up to 2.3x. w/ @Jaeyeon_Kim_0 @elmelis @ShamKakade6 @sitanch
I'm super excited to present three of our papers at ICLR in Rio! 🇧🇷
If you're around, come say hi, or let me know if you'd like to grab a coffee :) (We'd have to try to find one that's not entirely made of sugar though; still searching.)
Posters in the thread:
Fri, Apr 24, 3.15pm (P3-#818): 🪃 Boomerang Distillation: You can now create a family of LLMs without any additional training from a single student-teacher pair! w/ @SaraKangaslahti @nihalcanrun @marco_fumero @FrancescoLocat8 @elmelis
Our Data-Centric ML group is at ICLR 🇧🇷 this week. I couldn't make it this year 😰, but @SaraKangaslahti, @JonathanGeuter, and @rach_it_ are there. Find them, say hi. Quick rundown:
🚨🚨🚨 Now you can stop training your masked diffusion models "for the worst".
We propose PUMA (Progressive UnMAsking), a simple modification of the forward masking process that speeds up masked diffusion training.
@Uber trying to reach out here, as your customer support keeps ending chats and hasn't been much help. I got charged for an Uber Eats order that contained the wrong items, and I'm trying to get a refund, but your customer support refuses. Can you please help out? Thanks
New preprint alert:
We study 🪃 Boomerang Distillation 🪃, a surprising phenomenon that allows generating a family of pre-trained LLMs of intermediate sizes from a single teacher-student pair, no extra training required!
🧵
If you're at #AISTATS2025, check out the presentation by Jonathan Geuter, in collaboration with @Clement_Bonet_, @Korba_Anna and @elmelis.
'DDEQs: Distributional Deep Equilibrium Models through Wasserstein Gradient Flows'
https://t.co/Aac1kwe2lC
#AI #statistics #ML