lukas helff

@lukas_helff

PhD student in the AI/ML lab at @TUDarmstadt | AI Safety | Vision Safeguarding | Synthetic Data | Visual and Logical Reasoning #AI #ML

Joined June 2023

102 Following

48 Followers

18 Posts

lukas helff @lukas_helff

about 2 months ago

💡 Key takeaways:
• LLMs produce valid rules but fail at correct logical inference
• Scaling parameters yield only marginal gains
• Test-time scaling helps, but returns diminish, and costs explode
• SLR curriculum post-training boosts logical reasoning and transfers downstream

lukas helff @lukas_helff

about 2 months ago

🚀 Interested in scaling reasoning for LLMs? Excited to share that our paper “SLR: Automated Synthesis for Scalable Logical Reasoning” has been accepted to the ACL 2026 Main Track!
📄 Arxiv: https://t.co/0DN75fo0f1
💻 Code: https://t.co/EMntzixNt7
📊 Data: https://t.co/QjqxvpEmKl

lukas helff @lukas_helff

about 2 months ago

We show training-time causality: extensional verification directly induces this kind of reward hacking; isomorphic verification eliminates it
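A minimal sketch of the distinction, under one possible reading of the terms: an extensional check only compares predicted labels on the given examples, while an isomorphic check also re-tests the rule after consistently renaming object identifiers. The task format, the function names, and the renaming scheme are illustrative assumptions, not the paper's verifier.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task format: (object id, attributes, label).
Example = Tuple[str, Dict[str, str], bool]
Rule = Callable[[str, Dict[str, str]], bool]


def extensional_check(rule: Rule, examples: List[Example]) -> bool:
    """Pass if the rule reproduces every label on the given examples."""
    return all(rule(oid, attrs) == label for oid, attrs, label in examples)


def isomorphic_check(rule: Rule, examples: List[Example]) -> bool:
    """Pass only if the rule also holds after renaming object ids.

    Assumption: renaming ids yields a logically equivalent task, so a
    genuinely induced rule should be unaffected by it.
    """
    renamed = [
        (f"obj_{i}", attrs, label)
        for i, (_, attrs, label) in enumerate(examples)
    ]
    return extensional_check(rule, examples) and extensional_check(rule, renamed)


examples: List[Example] = [
    ("t1", {"color": "red"}, True),
    ("t2", {"color": "blue"}, False),
    ("t3", {"color": "red"}, True),
]


def induced_rule(oid: str, attrs: Dict[str, str]) -> bool:
    # A real rule: depends only on attributes.
    return attrs["color"] == "red"


def label_enumeration(oid: str, attrs: Dict[str, str]) -> bool:
    # A shortcut: memorizes which ids were labeled positive.
    return oid in {"t1", "t3"}


print(extensional_check(label_enumeration, examples))  # shortcut passes
print(isomorphic_check(label_enumeration, examples))   # shortcut fails
```

In this toy setup both rules pass the extensional check, but only the attribute-based rule survives the renaming.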

lukas helff @lukas_helff

about 2 months ago

🔥 Follow-up: we also uncovered an unexpected failure mode of RLVR-trained reasoning models in “LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking”, just accepted at the ICLR 2026 LLM Reasoning Workshop. 📄 Arxiv: https://t.co/g5BNaPK59V

lukas helff @lukas_helff

about 2 months ago

• RLVR-trained reasoning models (GPT-5, Olmo3) abandon rule induction for shortcuts that enumerate labels. Absent in non-RLVR models (GPT-4, Ministral)
• Shortcut prevalence increases with task complexity and inference compute, i.e., more thinking, more hacking

lukas helff @lukas_helff

about 2 months ago

Huge thanks to my amazing co-authors Ahmad Omar, @frie, @toniwuest, @HikiShindo, @philosotim, Rupert Mitchell, @schrame90, @WolfStammer, and @kerstingAIML, and to @liimeleemon, @Dav_Steinmann, and @HarleRuben, who joined us on the follow-up work! 🙏

lukas helff @lukas_helff

about 2 months ago

📊 We also introduce SLR-Bench
→ 19k tasks across a 20-level curriculum, spanning basic → hard reasoning
→ Each level is a controlled step up in logical difficulty, letting us pinpoint exactly where a model’s reasoning breaks down
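One way to read "pinpoint where reasoning breaks down" is to scan per-level accuracy and report the first level that falls below a chosen threshold. The sketch below assumes that reading; the function name, the threshold, and the accuracy values are invented placeholders, not SLR-Bench results.

```python
from typing import Dict, Optional


def first_breakdown_level(
    accuracy_by_level: Dict[int, float], threshold: float = 0.5
) -> Optional[int]:
    """Return the lowest curriculum level whose accuracy is below threshold.

    Returns None if the model stays at or above the threshold everywhere.
    """
    for level in sorted(accuracy_by_level):
        if accuracy_by_level[level] < threshold:
            return level
    return None


# Placeholder numbers for illustration only.
toy_scores = {1: 0.9, 2: 0.8, 3: 0.4, 4: 0.6}
print(first_breakdown_level(toy_scores))
```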

lukas helff @lukas_helff

about 2 months ago

1. Users define a task language + configuration → controlling domain & complexity
2. SLR synthesizes inductive reasoning tasks automatically
3. Each task comes with a ground-truth rule + executable validation program
4. Model outputs are verified via logic program execution
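Steps 3 and 4 can be sketched in plain Python, with a predicate standing in for the logic program. This is a simplified illustration, not SLR's validation code: the example format, the `validate` helper, and both rules are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task format: a dict of facts plus a ground-truth label.
Facts = Dict[str, List[str]]
Example = Tuple[Facts, bool]
Rule = Callable[[Facts], bool]


def validate(candidate: Rule, examples: List[Example]) -> float:
    """Return the fraction of examples the candidate rule labels correctly."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(candidate(facts) == label for facts, label in examples)
    return correct / len(examples)


def ground_truth(facts: Facts) -> bool:
    # Hypothetical ground-truth rule: positive iff any car is red.
    return "red" in facts["car_colors"]


def first_car_red(facts: Facts) -> bool:
    # Hypothetical model output: only inspects the first car.
    return facts["car_colors"][:1] == ["red"]


inputs: List[Facts] = [
    {"car_colors": ["red", "blue"]},
    {"car_colors": ["green"]},
    {"car_colors": ["blue", "red", "red"]},
    {"car_colors": []},
]
# Labels come from executing the ground-truth rule on each input.
examples: List[Example] = [(f, ground_truth(f)) for f in inputs]

print(validate(ground_truth, examples))   # 1.0
print(validate(first_car_red, examples))  # 0.75
```

The score is a verifiable signal: it is computed by executing the candidate against the examples, not by comparing its text to the ground-truth rule.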

lukas_helff retweeted

Antonia Wüst @toniwuest

4 months ago

Excited to share that our paper "Synthesizing Visual Concepts as Vision-Language Programs" has been accepted to #CVPR2026! 🎉 We propose a novel method that combines VLMs with symbolic program synthesis to learn reliable programs of visual concepts. 🌐 https://t.co/pc9z5dCRqs

lukas_helff retweeted

Antonia Wüst @toniwuest

11 months ago

I'll be at #ICML2025 next week presenting our recent work on VLMs and Bongard Problems! Feel free to reach out, happy to have a chat ☺️

lukas helff @lukas_helff

12 months ago

Try SLR-Bench: 🔗 https://t.co/QjqxvpEmKl 📄 Paper: https://t.co/0DN75fo0f1 Big thanks to my amazing co-authors Ahmad Omar, @felix_friedri, @WolfStammer, @toniwuest, @philosotim, Rupert Mitchell, @schrame90, @kerstingAIML!! #AI #LLM #MachineLearning #LogicalReasoning

lukas helff @lukas_helff

12 months ago

Want to enhance the reasoning skills of today’s LLMs? 🚀 Check out SLR, our latest framework on Scalable Logical Reasoning. 🧠 Systematically train & evaluate LLMs on challenging, customizable reasoning tasks with RL & SFT. 🔗 Paper & dataset below

lukas helff @lukas_helff

12 months ago

Introducing SLR-Bench:
✅ 19k+ prompts spanning 20 curriculum levels
✅ Systematic progression, from simple attribute checks to complex recursion
✅ Ideal for both evaluation and curriculum-based training
✅ Models excel at early levels, performance drops as complexity increases

lukas helff @lukas_helff

12 months ago

What does SLR offer?
✅ Fully automated synthesis of diverse reasoning tasks
✅ Customizable tasks with scalable logical complexity
✅ Symbolic validation programs for automatic evaluation of model outputs
✅ Perfectly suitable for RL (verifiable rewards) and SFT (gt solutions)

lukas_helff retweeted

Antonia Wüst @toniwuest

about 1 year ago

Excited to share that our paper got accepted at #ICML2025!! 🎉 We challenge Vision-Language Models like OpenAI’s o1 with Bongard problems, classic visual reasoning challenges, and uncover surprising shortcomings. Check out the paper: https://t.co/DEzmIEGMWj & read more below 👇

lukas helff @lukas_helff

over 1 year ago

@DMLRJournal big thanks to my great colleagues @WolfStammer @HikiShindo @devendratweetin @kerstingAIML

lukas helff @lukas_helff

over 1 year ago

So happy to share that our paper “V-LoL: A Diagnostic Dataset for Visual Logical Learning” has been accepted at @DMLRJournal 🎉 If you're looking for novel visual datasets designed to evaluate the logical learning capabilities of modern AI systems, check it out! https://t.co/VuqrbfOrhY

lukas_helff retweeted

Kristian Kersting @kerstingAIML

over 1 year ago

Current #AI models fail AI benchmarks from the 1960s 😤 Great collaboration with @toniwuest @philosotim @lukas_helff @devendratweetin @c_rothkopf https://t.co/nddarhIyuc
