Professor at Texas A&M University; ML/AI researcher; optimization for ML/AI; large reasoning models, developing LibAUC library for training deep neural nets.
1/4 🧠⚡ Can reasoning models think faster without thinking worse?
Recent systems like Meta Muse Spark use length penalties in training to reduce unnecessary output tokens.
A simple approach is to combine penalty-based reward with correctness-based reward in GRPO.
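To make the simple approach concrete, here is a minimal sketch of a length-penalized correctness reward passed through GRPO-style group normalization. The reward form, the `alpha` weight, the `max_len` cap, and the toy rollouts are all illustrative assumptions, not taken from the paper or from any specific system.

```python
import statistics

def combined_reward(correct: bool, length: int, max_len: int, alpha: float = 0.5) -> float:
    """Correctness reward minus a length penalty (hypothetical form, for illustration)."""
    return (1.0 if correct else 0.0) - alpha * (length / max_len)

def grpo_advantages(rewards):
    """Group-relative advantage: standardize rewards across one prompt's rollouts.

    Uses the population standard deviation; real implementations may differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Toy group of rollouts for a single prompt: (is_correct, num_output_tokens).
rollouts = [(True, 200), (True, 400), (True, 1000), (False, 300)]

rewards = [combined_reward(c, n, max_len=1000) for c, n in rollouts]
advantages = grpo_advantages(rewards)

for (correct, length), adv in zip(rollouts, advantages):
    print(f"correct={correct!s:5} length={length:4d} advantage={adv:+.3f}")
```

In this toy group, the correct 1000-token rollout ends up with a slightly negative advantage, because its penalized reward falls below the group mean. Under these assumptions, a coupled reward can push the policy away from a valid but long answer.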
4/4 📊 The results are remarkable: DRPO surpasses six strong GRPO-based baselines, achieving much shorter reasoning traces while maintaining or even improving performance.
📄 Paper: https://t.co/X3l4byJtYC
💻 Code: https://t.co/lshIFCPiMu
3/4 🚀 In our recent ICLR paper, we propose DRPO: Decoupled Reward Policy Optimization.
💡 Key idea: decouple length optimization for correct and incorrect rollouts, so the model learns to be concise without punishing valid reasoning.
@AleksandraFaust @roydanroy @iclr_conf I learned that this year ICLR did have grad students serving as ACs. I found this information on a student’s website. I am not saying graduate students are necessarily unqualified to serve as ACs. But if the selection process is not done well, there is no guarantee of fair decisions when they are made solely by the AC.
@ericxing Thanks for this great effort. It would be highly appreciated by the community. My group will definitely explore the released datasets and code.
Thank you so much for the shoutout!! @GalantiTomer
If you’re excited to dig into why self-supervised contrastive learning works so well, come swing by our poster session!
📍 Exhibit Hall C/D/E — Poster #2607
🗓️ Fri, Dec 5 • 4:30–7:30 PM PST
2/2 Shouldn’t this be strictly prohibited from the outset? If such conflicts are not prevented, it becomes difficult to maintain trust in the review system, including designations like “oral paper” or “best paper.”
1/2 Given the recent leakage of reviewer identities, some authors have reported that their papers were reviewed by people from the same institution or even the same research group. This raises serious concerns: how is such a conflict of interest even possible?