🚀 Two new papers from our team are now available on arXiv, both tackling core bottlenecks in RL post-training:
1. Annotating human preference datasets without spending a fortune
2. Quantifying uncertainty for reward models
🔗 https://t.co/sEYx618oIc
@agarwl_ Great work! I often think of the weights the other way around. Model weights govern the immediate prompt-response connection (System 1), while prompt weights (or the harness) define the slow-thinking process through reasoning, tool calls, self-reflection, ... (System 2).
If you're at ICLR 2026, come by 👇
🗓️ Saturday, April 25, 10.30 to 13.00 📍 Poster Session 5, Pavilion 4, #4808
📄 https://t.co/CpOxTtdICV
💻 https://t.co/rvg6KeR6w5
Joint work w/ @thomasklbg and @arkrause.
What do you do when reward models fail in RLHF?
Scalar rewards flatten messy, context-dependent human preferences into a single number. The reward model learns a distortion, and the policy optimizes it faithfully. 🧵
A Leader commits to an action, and a Follower refines it.
This asymmetry captures richer preferences than scalar rewards and provides stable training.
As a bonus, it offers inference-time refinement: two-turn rollouts deliver ~60% gains over single-turn ones.
📄 RewardUQ (https://t.co/nZl8WnkTtN)
We rigorously compare UQ methods for reward models and draw practical insights for active learning and robust RL post-training. The results were immediately applied in ActiveUltraFeedback!
📄 ActiveUltraFeedback (https://t.co/QETqRMA2dn)
How much preference data do you really need?
We show that active learning can match or beat static baselines using as little as 1/6 of the annotations across datasets and algorithms!
Deployed LLMs and users generate millions of conversations every day.
These are full of useful learning signals, yet we don't use them for training.
We introduce self-distillation for learning directly from user conversations – no rewards, no labels, no extra models.
ZurichNLP#19 is next Monday at @ETH_AI_Center!
Sina Ahmadi (@sina_ahm, @UZH_en) on language for low-resource varieties, and Barna Pasztor (@pasztorb, @ETH_AI_Center) on sample-efficient dataset collection for RLHF.
RSVP below! Spots limited as always.
I am attending @NeurIPSConf 2025 next week in San Diego, CA! Reach out to chat about RLHF and preference optimisation! I am happy to discuss future collaborations and open positions in 2026.
#NeurIPS2025
Great to have @eldsjal visit with @shak & @piammichel yesterday! Many nice demo day interactions with our cutting-edge AI research projects & ventures. Their concluding message: now’s the time to build with massive impact, and ETH AI Center is one of the best places to start 🚀
Amazing experience to be part of this project and work on post-training at scale with an exceptional team!
More great things to come as we push the open-source LLM community forward!
@EPFL, @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: https://t.co/7bJlINiIdn #Apertus #AI
At #AAMAS25 in Detroit this week and presenting my work with @pasztorb & @gio_ramponi Thursday afternoon - if you're here, let's connect and chat about learned algorithmic collusion, or go for a morning run!
I am presenting two papers this week at #NeurIPS2024 focusing on preference-based RL!
1. Contextual Bilevel Reinforcement Learning for Incentive Alignment: #6505 West, 11AM, Thursday
2. Bandits with Preference Feedback: A Stackelberg Game Perspective: #5807 West, 11AM, Friday
I am not attending #NeurIPS this year, but Vinzenz Thoma and @pasztorb are :)
Come chat about our recent work on "Contextual Bilevel Reinforcement Learning for Incentive Alignment" 🗓️ Thu 12 Dec, 11 a.m.
🔬 Advance the frontiers of AI: @ETH_AI_Center Fellowship Programs – #PhD & #Postdoc Opportunities 🔬
💫 Push the boundaries of Reinforcement Learning and Data-driven Control 💫
✍️ Apply by November 19, 2024: https://ai.ethz.ch/apply
PLS SHARE:
I'm hiring a PhD student to work on ML theory, to begin in Fall 2025.
Topics include: generalization bounds & statistical inference via online prediction, representation learning via optimal transport, sequential decision making...
More info:
https://t.co/QFLEqWZORj