Our new optimizer AMUSE: Muon + Schedule-Free + time-varying SF momentum.
No LR schedule needed, and it beats tuned scheduled baselines.
Two concurrent works converging on similar ideas:
• ScheduleFree+ (@aaron_defazio): SF-AdamW + time-varying SF momentum
• SF-NorMuon (@jlylekim)
🚨New Optimizer Paper
AMUSE: Anytime MUon with Stable gradient Evaluation
AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger pretraining at 124M / 720M / 1B scale
• Strong ImageNet / ViT fine-tuning performance
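For readers new to Muon: its core step orthogonalizes the momentum matrix of each 2-D weight before applying it. Below is a minimal NumPy sketch, assuming the textbook cubic Newton–Schulz iteration. The public Muon implementation uses a tuned higher-order polynomial and omits nothing as simple as this, e.g. it adds bfloat16 arithmetic and shape-dependent scaling. `newton_schulz_orthogonalize` and `muon_step` are hypothetical names, and this is not the AMUSE update.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 50, eps: float = 1e-7) -> np.ndarray:
    """Approximate the orthogonal polar factor U @ V^T of g = U S V^T.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    which drives every nonzero singular value toward 1 once the input
    is scaled so that all singular values are at most 1.
    """
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling => singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


def muon_step(w: np.ndarray, grad: np.ndarray, momentum: np.ndarray,
              lr: float = 0.02, beta: float = 0.95):
    """One simplified Muon-style update for a 2-D weight (hypothetical sketch)."""
    momentum = beta * momentum + grad               # heavy-ball momentum buffer
    update = newton_schulz_orthogonalize(momentum)  # orthogonalized direction
    return w - lr * update, momentum
```

Because the update is (approximately) an orthogonal matrix, every singular direction of the momentum receives the same step size, independent of the gradient's scale.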
It's an honor to receive the Best Student Paper Award at #ALT2026 (37th Algorithmic Learning Theory)! 🏆
Huge thanks to my amazing collaborators Boyao, @Collapsar0000, @Tianyu0628, @MinhakSong, and @nsfzyzz!
Had a great time at the Fields Institute in Toronto. 🇨🇦 Looking forward to attending ALT again next time! ✨
Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.*
With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
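The classical threshold behind "edge of stability": on a quadratic with sharpness (top Hessian eigenvalue) λ, gradient descent with step size η multiplies the error by (1 − ηλ) each step, so it converges only when λ < 2/η. A minimal sketch on a one-dimensional quadratic; `run_gd_on_quadratic` is a hypothetical helper and does not implement central flows.

```python
def run_gd_on_quadratic(sharpness: float, lr: float, w0: float = 1.0, steps: int = 200) -> float:
    """Run gradient descent on f(w) = 0.5 * sharpness * w**2 and return the final iterate."""
    w = w0
    for _ in range(steps):
        grad = sharpness * w  # f'(w)
        w -= lr * grad        # equivalent to w <- (1 - lr * sharpness) * w
    return w


lr = 0.1  # stability threshold is 2 / lr = 20
print(abs(run_gd_on_quadratic(19.0, lr)))  # sharpness below threshold: shrinks toward 0
print(abs(run_gd_on_quadratic(21.0, lr)))  # sharpness above threshold: blows up
```

In real networks trained with full-batch GD, the sharpness tends to rise until it hovers around 2/η instead of diverging, which is the regime this classical picture does not explain.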
Schedule-Free methods, which forgo cosine/linear schedules by averaging iterates and computing gradients at interpolated points, yield smoother training curves. Why they work so well has remained unclear; this paper explains the phenomenon through the lens of the river-valley loss landscape.
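The two ingredients named above, iterate averaging and gradients taken at an interpolated point, fit in a few lines. A minimal NumPy sketch of Schedule-Free SGD as described in "The Road Less Scheduled" (Defazio et al., 2024), assuming a constant step size, β = 0.9, and uniform averaging weights c = 1/(t+1). It omits the warmup and weighted averaging of the reference implementation; the quadratic test problem and the name `schedule_free_sgd` are illustrative. AMUSE and ScheduleFree+ modify this recipe, e.g. with a time-varying β.

```python
import numpy as np


def schedule_free_sgd(grad_fn, w0, lr=0.1, beta=0.9, steps=2000):
    """Schedule-Free SGD with a constant learning rate (no schedule).

    z: base SGD sequence, updated with gradients taken at y.
    x: running average of z; this is the point you evaluate/return.
    y: interpolation between z and x where the gradient is computed.
    """
    z = np.array(w0, dtype=float)
    x = z.copy()
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x  # gradient-evaluation point
        z = z - lr * grad_fn(y)          # plain SGD step on z
        c = 1.0 / (t + 1)                # uniform averaging weight
        x = (1.0 - c) * x + c * z        # x = mean of all z iterates so far
    return x


# Illustrative problem: f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w0 = np.array([3.0, -2.0])
x_final = schedule_free_sgd(lambda w: w, w0)
print(np.linalg.norm(x_final))
```

Setting beta = 0 reduces this to plain SGD on z with Polyak averaging; beta close to 1 evaluates gradients near the averaged point x.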
PPO vs. DPO? 🤔
Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward.
Paper: https://t.co/qNWwWhQQpA
Led by @smellycat_ZZZ and @MinhakSong
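For context, the standard setup both pipelines share (as introduced in the DPO paper, Rafailov et al., 2023) is a KL-regularized objective whose optimal policy is tied to the reward in closed form, which is why representing π* versus representing r can matter. Here β is the KL coefficient, π_ref the reference policy, Z(x) the partition function, and σ the logistic function; these are the textbook definitions, not taken from the linked paper.

```latex
\max_{\pi}\; \mathbb{E}_{x\sim\mathcal{D}}\Big[\mathbb{E}_{y\sim\pi(\cdot\mid x)}\, r(x,y) \;-\; \beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\Big]

\pi^*(y\mid x) = \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\,\exp\!\big(r(x,y)/\beta\big)
\quad\Longleftrightarrow\quad
r(x,y) = \beta\log\frac{\pi^*(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} + \beta\log Z(x)

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta) = -\,\mathbb{E}_{(x,y_w,y_l)}\Big[\log\sigma\Big(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\Big)\Big]
```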
RLHF vs. DPO under reward and/or policy model misspecification: when does each method succeed?
Our new paper provides a fine-grained theoretical comparison.
📄 https://t.co/vdpAiQHu5l
Two-stage RLHF or one-stage DPO: Which one is better for learning from preferences?
Equal under strong assumptions, but representation differences break the tie. Our paper reveals their fine-grained performance gaps under various conditions.
paper: https://t.co/B3OD6YRAts
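To make "two-stage vs. one-stage" concrete: stage one of RLHF fits a reward model with a Bradley–Terry loss on preference pairs (stage two then runs RL, e.g. PPO, against that reward), while DPO applies the same logistic loss directly to the implicit reward β·log(π_θ/π_ref). A minimal sketch on precomputed scores and sequence log-probabilities; the function names and toy numbers are hypothetical, and no model or RL loop is shown.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Stage 1 of two-stage RLHF: reward-model loss on one preference pair."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """One-stage DPO: the same logistic loss applied to the implicit reward beta * log(pi / pi_ref)."""
    implicit_chosen = beta * (logp_chosen - ref_logp_chosen)
    implicit_rejected = beta * (logp_rejected - ref_logp_rejected)
    return bradley_terry_loss(implicit_chosen, implicit_rejected)


# Toy numbers: the policy raised the chosen response's log-prob relative to the reference.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # implicit reward margin = 0.1 * (1 - (-1)) = 0.2
```

The difference lies in what gets parameterized: RLHF learns r and then a policy, while DPO learns only π_θ, so which of the two a model class can represent well is what separates them.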