Are curious about how to resize any pretrained model on demand?
Then you'll love our paper "FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment", accepted at ICML 2026 (Spotlight)
Joint work with @sam_hrvth, @stevelaskaridis, @mciccone_AI
🧵1/7
@gabriberton@georgiagkioxari@taiyasaki I'm looking forward to reading about (2): I can see valid points for/against this.
Another point is distinguishing closed-source from unreproducible articles, which is actually a broader concern.
@sam_hrvth@stevelaskaridis@mciccone_AI Results, finally!
FlexRank exhibits much more graceful degradation as the param budgets reduces.
Results on commonsense tasks of lm-eval-harness (LLMs) and on ImageNet-1K (ViTs).
More results and ablations in the paper 📑
📄 Project Page: https://t.co/NWeW1J8mg7
Are curious about how to resize any pretrained model on demand?
Then you'll love our paper "FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment", accepted at ICML 2026 (Spotlight)
Joint work with @sam_hrvth, @stevelaskaridis, @mciccone_AI
🧵1/7
@sam_hrvth@stevelaskaridis@mciccone_AI How does layerwise rank reduction translate into real inference savings?
Standard SVD adds parameter/FLOP overhead, requiring aggressive rank reduction to yield benefits.
We propose a novel factorization that guarantees savings at any budget, without aggressive rank reduction.
@sam_hrvth@stevelaskaridis@mciccone_AI 🧐Noticed the nestedness constrain above?
‼️This is a key passage: we prove for a 1-NN that violating nestedness degrades the Pareto Front.
This serves as insight that the same constraint should be imposed globally on DNNs.
@sam_hrvth@stevelaskaridis@mciccone_AI We now have layers which can be cut by reducing their rank: how to spend any global budget across layers?
While the problem is combinatorial, we can find an approximate Pareto Front in polynomial time.
A subset of these submodels is then optimized with KD.
@sam_hrvth@stevelaskaridis@mciccone_AI The first step is initializing an elastic rank space without retraining the model from scratch.
This is achieved by decomposing each pretrained layer via a DataSVD, i.e. an SVD aligned with data directions.
⚠️IMPORTANT: we show that this is good only as initialization
@sam_hrvth@stevelaskaridis@mciccone_AI Pretrained models are fixed monoliths, but in practice runtime budgets vary beyond the coarsely grained released variants.
Ideally, we'd like a family of shared-weight, pareto-optimal models, i.e. models lying on the optimal performance/cost frontier.
🚀 Excited to be at #NeurIPS2025 this week!
I’ll be presenting our work on distributed and federated optimization. You'll find me on 6th Dec:
- OPT for ML: 20A 10-11 am
- Reliable ML: 2, 1:30 2:15 pm
If you're working on learning at scale, come find me at — happy to chat 🤝
Do you feel FL research is stuck with methods that do not work well in realistic scenarios? 🤔
🫵We got you!
Introducing 🚀Generalized Heavy-Ball Momentum (GHBM)🚀, accepted at TMLR:
the FL algorithm with both SOTA theoretical guarantees and much better empirical results.
🧵1/9
🚀 Excited to be at #NeurIPS2025 this week!
I’ll be presenting our work on distributed and federated optimization. You'll find me on 6th Dec:
- OPT for ML: 20A 10-11 am
- Reliable ML: 2, 1:30 2:15 pm
If you're working on learning at scale, come find me at — happy to chat 🤝
🔥 Our paper SANSA is a #NeurIPS2025 Spotlight!
We turn #SAM2 into a semantic few-shot segmenter for objects and parts, fully promptable (mask · point · box · scribble); only 10M trainable parameters and 5× faster than competitors.
Code, models & demo
https://t.co/bdfUd1YnlG
👇
@AtaeiMe@SametOymac That should be classified as irresponsible reviewing, and it should be easy to prove that mentioned papers were allucinated, as well as published after the submission deadline. Just for curiosity, what is the comment for the "1" score?