Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!
📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable: we eliminated 400k+ problems during curation
Details below! 🧵👇
What if models could learn which problems _deserve_ deep thinking?
No labels. Just let the model discover difficulty through its own performance during training.
Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more to the problems that actually need it ↓
10/10 Bottom line: ALP teaches models to think harder on hard problems and think less on easy ones - exactly what we want for efficient reasoning!
📄 Paper: https://t.co/XAqTaWsGBw
🔧 Models: https://t.co/BLBSbQHu2L
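Our new method (ALP) monitors each problem's solve rate across RL rollouts and applies length penalties that scale inversely with difficulty during RL training: the more often a problem is solved, the more each extra token costs.
Result? Models learn an implicit difficulty estimator, allocating 5x more tokens to hard problems than to easy ones and cutting overall token usage by 50%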
🧵👇1/10
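A minimal sketch of the idea above, assuming a per-rollout reward of the form correctness − β × tokens × solve rate. The `Rollout` type, `alp_rewards`, the β value, and the 1/K floor on the solve rate are all illustrative assumptions, not the paper's released code. The sketch only shows how an online solve rate can make the per-token cost depend on difficulty.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rollout:
    correct: bool    # did the sampled answer verify as correct?
    num_tokens: int  # length of the sampled response, in tokens


def solve_rate(rollouts: list[Rollout]) -> float:
    """Fraction of one prompt's K rollouts that were correct."""
    return sum(r.correct for r in rollouts) / len(rollouts)


def alp_rewards(rollouts: list[Rollout], beta: float = 1e-3) -> list[float]:
    """Hypothetical ALP-style reward for all K rollouts of a single prompt.

    Assumed form: reward = correctness - beta * num_tokens * p, where p is the
    prompt's online solve rate, floored at 1/K (an assumption) so unsolved
    prompts still pay a small length cost. Easy prompts pay more per token.
    """
    k = len(rollouts)
    p = max(solve_rate(rollouts), 1.0 / k)
    return [float(r.correct) - beta * r.num_tokens * p for r in rollouts]


if __name__ == "__main__":
    # Easy prompt: all 8 rollouts correct, so every token is penalized at full weight.
    easy = [Rollout(correct=True, num_tokens=400)] * 8
    # Hard prompt: 1 of 8 correct, so each token costs only 1/8 as much.
    hard = [Rollout(correct=True, num_tokens=2000)] + [
        Rollout(correct=False, num_tokens=2000)
    ] * 7
    print([round(x, 3) for x in alp_rewards(easy)])
    print([round(x, 3) for x in alp_rewards(hard)])
```

With these made-up numbers, a correct 2,000-token answer on the hard prompt still earns more reward (0.75) than a correct 400-token answer on the easy one (0.6). That is the kind of pressure that would push a policy to spend tokens where problems are hard.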
The impact of Generative Reward Models compounds daily.
way stronger interest now than when we published last fall 👇
many excellent recent extensions. cool seeing where researchers take GenRM
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration.
want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: https://t.co/WvwvcYshkL
🔹 Goal:
Develop an ML system that enables reasoning models to go beyond pattern matching and carry out sophisticated search and exploration strategies.
🔹 Solution:
Build scalable training infrastructure. Reasoning models require large datasets and distributed computing, which makes multi-node training on high-performance GPUs essential.
🔹 Results:
Using TractoAI, a serverless platform on Nebius AI Cloud, SynthLabs trains AI reasoning models, laying the foundation for next-gen reasoning systems and enterprise use.
The final stop in our meetup series will be in San Francisco! 🌁 https://t.co/0wn9tQCT77
Join us at Convene 100 Stockton near Union Square on Thursday, March 13, for a deep dive into our AI cloud. Our developers, AI R&D engineers and architects will share insights with the tech community on how we build our AI cloud, accelerated by NVIDIA, and how we contribute to the AI field and open source.
🛍️ What’s in store?
- An insider’s look at Nebius architecture and development principles
- A deep dive into how we developed Soperator, our K8s operator for Slurm
- How test-time computation unlocks the potential of agentic systems
- We’ll also be joined by the co-founders of @synth_labs, who will share how they use our serverless platform TractoAI
Plus, attendees will receive promo codes to test our cloud and inference-as-a-service.
⚡ Spaces are limited! Learn more and register now on the Nebius website to secure your spot: https://t.co/0wn9tQCT77