🚀 Interested in scaling reasoning for LLMs? Excited to share that our paper
“SLR: Automated Synthesis for Scalable Logical Reasoning” has been accepted to the ACL 2026 Main Track!
📄 arXiv: https://t.co/0DN75fo0f1
💻 Code: https://t.co/EMntzixNt7
📊 Data: https://t.co/QjqxvpEmKl
🔥 Follow-up: we also uncovered an unexpected failure mode of RLVR-trained reasoning models in “LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking”, just accepted at the ICLR 2026 LLM Reasoning Workshop:
📄 arXiv: https://t.co/g5BNaPK59V
• RLVR-trained reasoning models (GPT-5, Olmo3) abandon rule induction for shortcuts that enumerate labels. These shortcuts are absent in non-RLVR models (GPT-4, Ministral).
• Shortcut prevalence increases with task complexity and inference compute, i.e., more thinking, more hacking
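To make the "enumerating labels" shortcut concrete, here is a toy Python sketch. It is only an illustration under assumptions, not the paper's setup or detection method: the features, the `make_enumeration` helper, and the held-out split are all hypothetical. It contrasts a rule that generalizes with a lookup table that simply memorizes the labels of the examples it was shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy representation: an item is (color, shape).
Item = Tuple[str, str]
Labeled = Tuple[Item, bool]


def accuracy(rule: Callable[[Item], bool], data: List[Labeled]) -> float:
    """Fraction of labeled items the rule classifies correctly."""
    return sum(rule(x) == y for x, y in data) / len(data)


def induced_rule(item: Item) -> bool:
    """A genuine rule: an item is positive if it is red."""
    return item[0] == "red"


def make_enumeration(shown: List[Labeled]) -> Callable[[Item], bool]:
    """The shortcut: memorize the label of every shown item, guess False otherwise."""
    table: Dict[Item, bool] = dict(shown)
    return lambda item: table.get(item, False)


# Examples shown in the (hypothetical) task prompt.
shown = [
    (("red", "circle"), True),
    (("blue", "circle"), False),
    (("red", "square"), True),
]
# Unseen examples that follow the same underlying rule.
held_out = [
    (("red", "triangle"), True),
    (("blue", "square"), False),
]

enumerated = make_enumeration(shown)

print(accuracy(induced_rule, shown), accuracy(induced_rule, held_out))
print(accuracy(enumerated, shown), accuracy(enumerated, held_out))
```

Both "solutions" look perfect on the shown examples, so a check restricted to those examples cannot tell them apart. Only the induced rule carries over to unseen items.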
📊 We also introduce SLR-Bench
→ 19k tasks across a 20-level curriculum, spanning basic → hard reasoning
→ Each level is a controlled step up in logical difficulty, letting us pinpoint exactly where a model’s reasoning breaks down
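One simple way to use such a curriculum for diagnosis is to scan per-level accuracy and report the first level that falls below a chosen threshold. The sketch below is an assumption about how one might do this in Python, not part of SLR-Bench itself. The function name, the threshold, and the accuracy values are placeholders, not results from the paper.

```python
from typing import Dict, Optional


def first_breakdown_level(
    accuracy_by_level: Dict[int, float], threshold: float = 0.5
) -> Optional[int]:
    """Return the lowest curriculum level whose accuracy is below the threshold.

    Returns None if the model stays at or above the threshold on every level.
    """
    for level in sorted(accuracy_by_level):
        if accuracy_by_level[level] < threshold:
            return level
    return None


# Placeholder numbers purely for illustration (NOT measured results).
toy_scores = {1: 0.98, 2: 0.91, 3: 0.74, 4: 0.42, 5: 0.10}
print(first_breakdown_level(toy_scores, threshold=0.5))
```

Because each level is a controlled step up in difficulty, the returned level gives a rough pointer to where reasoning starts to fail.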
1. Users define a task language + configuration → controlling domain & complexity
2. SLR synthesizes inductive reasoning tasks automatically
3. Each task comes with a ground-truth rule + executable validation program
4. Model outputs are verified via logic program execution
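The four steps above can be sketched in a few lines of Python. This is a toy stand-in under stated assumptions, not the SLR implementation: SLR verifies outputs by executing logic programs, whereas here a candidate rule is just a Python predicate, and `Example`, `verify_rule`, and the train-like features are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Example:
    """One labeled instance of a (hypothetical) synthesized task."""
    features: Dict[str, str]
    label: bool


# A candidate rule maps an instance's features to a predicted label.
Rule = Callable[[Dict[str, str]], bool]


def verify_rule(rule: Rule, examples: List[Example]) -> float:
    """Run the candidate rule on every example and return the fraction it gets right."""
    if not examples:
        raise ValueError("need at least one example to verify against")
    correct = sum(rule(ex.features) == ex.label for ex in examples)
    return correct / len(examples)


# Hypothetical task whose ground-truth rule is "positive if the color is red".
examples = [
    Example({"color": "red", "length": "short"}, True),
    Example({"color": "blue", "length": "short"}, False),
    Example({"color": "red", "length": "long"}, True),
    Example({"color": "green", "length": "long"}, False),
]


def candidate(features: Dict[str, str]) -> bool:
    """A model output that matches the ground-truth rule."""
    return features["color"] == "red"


def wrong(features: Dict[str, str]) -> bool:
    """A model output that latches onto the wrong attribute."""
    return features["length"] == "short"


score_good = verify_rule(candidate, examples)  # 1.0
score_bad = verify_rule(wrong, examples)       # 0.5
print(score_good, score_bad)
```

A score like this could serve as a verifiable reward signal, since it is computed by execution rather than by string-matching the model's answer against a reference.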
Excited to share that our paper "Synthesizing Visual Concepts as Vision-Language Programs" has been accepted to #CVPR2026! 🎉
We propose a novel method that combines VLMs with symbolic program synthesis to learn reliable programs of visual concepts.
🌐 https://t.co/pc9z5dCRqs
Want to enhance the reasoning skills of today’s LLMs?
🚀 Check out SLR, our latest framework on Scalable Logical Reasoning.
🧠 Systematically train & evaluate LLMs on challenging, customizable reasoning tasks with RL & SFT.
🔗 Paper & dataset below
Introducing SLR-Bench:
✅ 19k+ prompts spanning 20 curriculum levels
✅ Systematic progression, from simple attribute checks to complex recursion
✅ Ideal for both evaluation and curriculum-based training
✅ Models excel at early levels, but performance drops as complexity increases
What does SLR offer?
✅ Fully automated synthesis of diverse reasoning tasks
✅ Customizable tasks with scalable logical complexity
✅ Symbolic validation programs for automatic evaluation of model outputs
✅ Well suited for RL (verifiable rewards) and SFT (ground-truth solutions)
Excited to share that our paper got accepted at #ICML2025!! 🎉
We challenge Vision-Language Models like OpenAI’s o1 with Bongard problems, classic visual reasoning challenges, and uncover surprising shortcomings.
Check out the paper: https://t.co/DEzmIEGMWj
& read more below 👇
So happy to share that our paper “V-LoL: A Diagnostic Dataset for Visual Logical Learning” has been accepted at @DMLRJournal 🎉 If you're looking for novel visual datasets designed to evaluate the logical learning capabilities of modern AI systems, check it out! https://t.co/VuqrbfOrhY