Excited and grateful to have contributed to reasoning, RL, and STEM climbing behind the MAI-Thinking-1 model.
This has been an intense but fun hill-climbing journey from scratch. Please check out the technical report:
https://t.co/qfvkPYKTgN
The best part of this job is seeing students graduate and launch their careers! Congrats to Feng Chen, Atsushi Yamamura, Tamra Nebabu, Linnie Wharton and Daniel Kunin. They are all going on to top positions across artificial intelligence, medicine, and physics. Proud of you!
Proud to be part of the team behind this new open-source SOTA formal math prover! 🚀
Achieving 72.95% on MiniF2F with a simple BFS strategy. Our models are trained using expert iteration and DPO, pushing the boundaries of formal theorem proving.
📄 Paper: https://t.co/JqLYiKScZ1
🚀 Excited to announce BFS-Prover, our state-of-the-art theorem proving system in Lean4!
We've achieved 72.95% on the MiniF2F test, surpassing all previous systems including DeepSeek-Prover-v1.5, InternLM2.5-StepProver, and HunyuanProver 📈
🔥 Key innovations:
- Simple Best-First Search, rather than complex MCTS
- No critic model (value function) needed
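To make the "simple Best-First Search, no critic" idea concrete, here is a minimal sketch in Python. It is not the BFS-Prover implementation: the `expand` callback, the toy state space, and the choice to rank nodes by cumulative log-probability are all assumptions made for illustration. In a real prover, `expand` would call the tactic generator and Lean; here it is a stand-in over integers.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical type: expand(state) yields (action, next_state, log_prob).
Expand = Callable[[int], Iterable[Tuple[str, int, float]]]


def best_first_search(
    start: int,
    expand: Expand,
    is_goal: Callable[[int], bool],
    max_expansions: int = 1000,
) -> Optional[List[str]]:
    """Always expand the highest-scoring frontier node.

    The score is the cumulative log-probability of the actions taken so far
    (an assumption for this sketch). No value function is consulted.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    # heapq is a min-heap, so store the negated score.
    frontier = [(0.0, next(tie), start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        neg_score, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for action, nxt, log_prob in expand(state):
            if nxt in seen:
                continue
            seen.add(nxt)
            heapq.heappush(
                frontier, (neg_score - log_prob, next(tie), nxt, path + [action])
            )
    return None


def toy_expand(n: int) -> List[Tuple[str, int, float]]:
    """Toy stand-in for a tactic generator: two actions with fixed log-probs."""
    candidates = [("add1", n + 1, -1.0), ("double", n * 2, -0.5)]
    return [c for c in candidates if c[1] <= 100]  # keep the toy space finite


if __name__ == "__main__":
    print(best_first_search(1, toy_expand, lambda n: n == 10))
```

The only data structure is a priority queue, which is the point of the contrast with MCTS: there are no visit counts, rollouts, or learned value estimates to maintain.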
Our 7B tactic generation model is now available on Hugging Face: https://t.co/94Oyumzocs
Paper: https://t.co/Gr51JMFxee
This challenges the perception that complex search methods are necessary for formal theorem proving. Sometimes simpler is better!
@FCHEN_AI
4/ We extend our algorithm to automated theorem proving and math QA with CoT. In theorem proving, our approach improves performance by controlling the exploitation and exploration tradeoff in proof trees. In CoT, where overconfidence is less severe, we also see performance gains.
1/ Our new paper: “Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning” on how to change training to better exploit test-time compute!
co-led by @AllanRaventos, w/ Nan Cheng, @SuryaGanguli & @ShaulDr
https://t.co/xM49OB6sk7
3/ We propose directly optimizing for coverage in the fine-tuning loss with Direct Coverage Optimization (DCO). DCO attenuates gradients on high-confidence samples, regularizing away from overconfidence. We demonstrate superior accuracy frontiers over CE loss in MATH and MiniF2F.
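A small numeric sketch of how a coverage-based loss can attenuate gradients on confident samples. This assumes coverage means pass@N with N independent samples, so that the per-example loss is -log(1 - (1 - p)^N), where p is the model's probability of the correct answer. That form, and the helper names `dco_loss` and `attenuation`, are assumptions for illustration; see the paper for the exact definition.

```python
import math


def dco_loss(p: float, n: int) -> float:
    """Negative log of assumed pass@n coverage, for p in (0, 1]."""
    return -math.log(1.0 - (1.0 - p) ** n)


def attenuation(p: float, n: int) -> float:
    """Gradient of dco_loss relative to the cross-entropy gradient.

    Both gradients are taken with respect to log p. Cross-entropy gives -1,
    the loss above gives -n * p * (1 - p)**(n - 1) / (1 - (1 - p)**n),
    so the ratio below is the factor by which the CE gradient is scaled.
    """
    return n * p * (1.0 - p) ** (n - 1) / (1.0 - (1.0 - p) ** n)


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5, 0.9):
        print(p, [round(attenuation(p, n), 4) for n in (1, 8, 64)])
```

Under this assumption, N = 1 recovers plain cross-entropy (the factor is 1 everywhere), while for larger N the factor shrinks toward 0 as p grows, so examples the model is already confident about contribute little gradient.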
Join us at the ML for Multiscale Processes workshop at #ICLR2025 to hear from our first three amazing keynote speakers:
Qianxiao Li https://t.co/ZQG6acjlwK
Sergei Gukov https://t.co/2E5KjTpHxq
Charlotte Bunne https://t.co/ip4QAvcXNz
Want to learn about SGD's implicit bias towards simpler subnetworks generated by permutation symmetry?!
Come to our NeurIPS poster session tomorrow morning 10:45 - 12:45 Hall B1+B2 (level 1) #906
1/ Our new paper led by @AllanRaventos, @mansiege, @FCHEN_AI asks when in-context learning of regression can solve fundamentally *new* problems *not* seen during pre-training, and reveals it as an emergent capability arising from a phase transition... https://t.co/gqaioAUL7Q
Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/
https://t.co/g118pWgAA9
Our new preprint reveals how SGD biases neural nets towards vastly simpler subnets w/ superior generalization via stochastic collapse to invariant sets & explains why prolonged large learning rates help
co-led w/ @FCHEN_AI, @atsushi_y1230 & @SuryaGanguli
https://t.co/tJKWp1Neng