Super exciting work from my friend @RyanBoldi! From an HCI POV, I’m especially excited about how RL on multiple objectives might make models more socially intelligent while avoiding pitfalls of optimizing for one narrow objective (e.g., sycophancy from RLHF).
Your RL post-training may be sabotaging your LLM’s test-time scaling!
Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*.
We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test-time search performance, even on the original scalar reward.
Sycophancy, disempowerment, homogenization of thought: lots to be grim about for what AI is doing to us, the collapse of our subjectivity into a machine "objectivity". But a lot of AI's value seems to come precisely from scaling this objectivity. How do we make sense of this?
In the picture I lay out, we need both work *within* norms and work *on* norms. We've already thought a lot about how AI can help us work *within* norms, since that objective is more easily definable. There is more to be done on AI that helps us work *on* norms.
If you're interested in additional perspectives on this work, check out @JennyHuang99's blogpost on "slow AI" https://t.co/lUJ2vsoDdZ and my blogpost on AI for "work *on* norms" https://t.co/INfmgRG3PL
“Should I fear death?” Ask an LLM and you get one answer or a big bag of them, but little visibility into the decisions and assumptions that produced them. We built the "conceptual multiverse": a system that makes those decisions transparent and intervenable. https://t.co/oQlWs0KFHu
recently, i’ve been thinking about ways to design ai systems to be more compatible with slow thinking 🐌.
you can check out the full blogpost here 🤗:
https://t.co/3hdYCIpuoN
There's been a lot of excitement about pluralistic value alignment 🌈 — AI that reflects the full range of human perspectives
But no formal way to benchmark whether we're actually making progress. 🤔
Introducing 𝐎𝐕𝐄𝐑𝐓𝐎𝐍𝐁𝐄𝐍𝐂𝐇. 🎉 Accepted to #ICLR2026
1/n 🧵
“Technical computer science savvy and deep philosophical commitments”: @UW #UWAllen alum @andreiskiii was named the @UWArtSci Dean’s Medalist in Social Sciences for his campus leadership and research contributions spanning #AI and philosophy. #UWdiscovers https://t.co/FJ577PExJx