Scalability is a key factor limiting the use of Graph Neural Networks (GNNs) on large graphs. With @RWaleffe, @JasonMohoney, and Shiv, we introduce Marius++ (https://t.co/w4Hots8T7d), a system for *out-of-core* GNN mini-batch training over billion-scale graphs. (1/5)
Announcing a deadline extension for the ATTRIB workshop! Submissions are now due September 25th, with an option to submit by October 4th if at least one author of the paper volunteers to serve as an emergency reviewer. More info here: https://t.co/a2TEiG1FLP
An 8B-parameter hybrid SSM model trained on 3.5T tokens gets better accuracy than an 8B transformer trained on the same 3.5T-token dataset:
* 7% attention, the rest is Mamba2
* MMLU improves from 50% to 53.6%
* Training efficiency is the same
* Inference is much cheaper
https://t.co/x62otbC5uN
Aurora is an AI foundation model that flexibly achieves state-of-the-art 5-day air-pollution forecasts, 10-day global weather forecasts, and other forecasts. More importantly, it shows evidence of good scaling properties and of adapting to new atmospheric tasks. Excited to see it scale up. https://t.co/Swgbn0q6Sp
@BlackHC @RWaleffe @vmageirakos The methods are related, but one key point of our work is adjusting the learning rate schedule. Also, our focus was on measuring the true compute costs of prior "practical" methods. That said, in theory, selection can outperform full-dataset training: https://t.co/HAQpZrjHXF
Data pruning to reduce pretraining costs is hot, but fancy pruning can take as long to select the data as it would take to train on all of it! Patrik, @Rwaleffe, and @vmageirakos's work at #ICLR2024 tomorrow shows how a simple, low-cost tweak to random sampling outperforms trendy methods!
Not convinced about using random sampling for data pruning? Think twice! In our recent work, we introduce Repeated Sampling of Random Subsets: https://t.co/jk2dWHpocl, where we sample a fresh random subset of the data at every epoch of training instead of only once at the beginning!
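A minimal sketch of the per-epoch resampling idea, assuming plain index-level sampling without replacement within each epoch. The function names `static_random_subset` and `repeated_random_subsets` are hypothetical, and this is an illustration rather than the paper's implementation. The baseline fixes one random subset up front. The repeated variant draws a new subset every epoch, so each epoch costs the same number of examples but training gradually covers more of the dataset.

```python
import random
from typing import Iterator, List


def static_random_subset(n: int, ratio: float, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Baseline (hypothetical name): pick one random subset once, then reuse it every epoch."""
    rng = random.Random(seed)
    k = max(1, int(n * ratio))
    subset = rng.sample(range(n), k)  # chosen a single time, before training
    for _ in range(epochs):
        order = subset[:]
        rng.shuffle(order)  # only the order changes between epochs
        yield order


def repeated_random_subsets(n: int, ratio: float, epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Per-epoch resampling (hypothetical name): draw a fresh random subset at every epoch."""
    rng = random.Random(seed)
    k = max(1, int(n * ratio))
    for _ in range(epochs):
        # sample() returns k distinct indices in random order, so no extra shuffle is needed
        yield rng.sample(range(n), k)


if __name__ == "__main__":
    static = list(static_random_subset(1000, 0.1, 10))
    resampled = list(repeated_random_subsets(1000, 0.1, 10))
    print("distinct examples seen (static):", len(set().union(*static)))
    print("distinct examples seen (resampled):", len(set().union(*resampled)))
```

With the same per-epoch budget of 100 out of 1,000 examples, the static subset only ever touches those 100 examples across all ten epochs, while per-epoch resampling touches many more of them.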
You can find all our comparisons against 30+ importance-based data pruning and selection methods in our paper: https://t.co/a9JnrBkKuI It turns out that sophisticated pruning might be a mirage for pre-training...
New job posting: we're looking for senior ML engineers in Model Evaluation and Understanding. If you are at #ICLR2024, come talk to our group at the Microsoft booth tomorrow at 9:30 am. Link for application: https://t.co/iYw4BBxAhz