Very excited to start my Student Researcher position at @GoogleDeepMind next week! Flying out tomorrow, ready to wield my #GoogleInterns propeller head proudly
🎉We are proud of @timokauf, who is off to @GoogleDeepMind, London as a Student Researcher! On Monday he'll begin 6 months on the Amplified Oversight team, which tackles a central challenge in AI safety: how humans can reliably oversee models that are becoming ever more capable🚀
@ZacKenton1@DavidDAfrica@jacob_pfau Thanks for making me aware! Inverse constitutional learning has a lot of overlap with the ICAI + FF combination and understanding motivations would be great for alignment. Right now the method is more focused on expressed traits, but might be extensible (@arduinfindeis)
Somebody recorded me at our poster. Cool to have a video! Don't expect a full explanation though, it's just a random excerpt and I had no idea I was being filmed 🙃
#NeurIPS2025 has passed, but we hope the celebration will last forever.
Here is a poster presentation we recorded at the event and we hope it can last forever in cyberspace.
The paper: ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
Thanks for the authors: @timokauf, @metz_yannick, Daniel Keim, Eyke Hüllermeier
@Starc_Institute Thanks for posting this, nice to have a video of the session!
I had no idea I was being filmed. The video starts sort of in the middle of the explanation, so take a look at the paper if you're interested.
Paper: https://t.co/suPIUZpewO
Poster etc.: https://t.co/5gYGIxNaem
Had a good time presenting this! (Though who decided poster sessions should go until 7:30pm?) Thanks to Max Muschalik for helping me present even without being an author. Poster presentations really need two people!
Benefits: improved reward model generalization, better data efficiency, and improved policies. Looking forward to seeing you at the poster!
Paper and more: https://t.co/5gYGIxNaem
The key insight is that these signals only need to be locally valid and relative (e.g., within one annotator's comparisons). No need to model the exact relationship to strength. Just rank which comparisons are stronger.
The core idea: Not all preferences are equal. ResponseRank learns preference strength from implicit signals in your data, like inter-annotator agreement, stated confidence, or response times.
I think Gemini 3 Pro’s personality and style notably improved over 2.5.
It uses fewer common shortcuts to bias human annotators, e.g. long responses or overly polite tone. That makes the strong LMArena performance quite a bit more impressive!
Some examples of the differences 🧵
@janleike Thanks for pointing this out! Do you think Anthropic will be able to offer visa sponsorship in future iterations? I'm a German PhD student and would love to apply, but cannot self-sponsor as far as I can tell.
@arduinfindeis and our lab member @timokauf presented Inverse Constitutional AI. Had a blast, the pictures speak for themselves! Joint work with @eyke_hu, @SamuelAlbanie, @RobDMullins.
Paper📰https://t.co/49eGzYchQs
GitHub: https://t.co/dMlHYlCl5r
🧵2/4
Excited to be in Singapore for ICLR! Keen to chat about interpreting feedback data and detecting model characteristics ⚖️
Reach out or come by our poster on Inverse Constitutional AI tomorrow, Friday 25 April from 10am-12.30pm (#520 in Hall 2B) - @timokauf and I will be there!
🤖 3. Discovering model strengths
How is GPT-4o different to other models? → Uses more numbered lists, but Gemini is more friendly and polite
https://t.co/bDCQXGtRFW
🕵🏻💬 Introducing Feedback Forensics: a new tool to investigate pairwise preference data.
Feedback data is notoriously difficult to interpret and has many known issues – our app aims to help!
Try it at https://t.co/4HubCg52Pi
Three example use-cases 👇🧵
Our paper on query-efficient reward learning will be at AAAI! Unfortunately I won’t be attending, but Xuening will be at the poster - stop by or reach out online!
🚀 Excited to share our latest work at #AAAI2025!
How can we make Reinforcement Learning from Human Feedback (RLHF) more query-efficient? Introducing DUO: Diverse, Uncertain, On-policy query generation and selection.
🧵👇 1/6