🚨📄 New preprint! We find the “boiling the frog” equivalent of AI use. In a series of RCTs, we show that after just 10 min of AI assistance people perform worse and give up more often than those who never used AI.
w/ Grace Liu, @brianchristian, Mira Dumbalska, and Rachit Dubey 🧵
1/ Reinforcement learning is usually framed as maximizing rewards. But can we cast it as reaching the right goals?
New blog on bridging RL, goal-conditioned RL, and stochastic shortest path:
https://t.co/WBpEtx9Wiu
Also #ICLR2026 Poster: Thu 10:30 AM–1:00 PM, P4 #4611.
🧵⬇️
I’m excited to present our poster “Demystifying the Mechanisms Behind Emergent Exploration in Goal-Conditioned RL” at the Coginterp workshop at #NeurIPS2025!
https://t.co/U3OprctltS
📅 Dec 7
1:15 PM
Upper Level Room 5AB
@princeton_rl @MBastankhah
🙏 This work was done with my incredible collaborator @QuYuxiao and our amazing advisors Jeff Schneider, Aarti Singh, and @aviral_kumar2!
Website: https://t.co/HyU5i5mRsS
Paper link: https://t.co/EdeABnkElL
[9/9]
NEW PAPER: "CaRT: Teaching LLM Agents to Know When They Know Enough"!
LLMs often overthink, ask too many questions, or waste compute.
We introduce Counterfactuals and Reasoning for Termination (CaRT), which teaches LLMs when to stop gathering info and make a decision.
🧵[1/9]
Explore our paper for more insights:
⛏️ How counterfactual pairs teach models to recognize information sufficiency
⛏️ Why verbal reasoning acts as an implicit value function for termination
⛏️ How reasoning stabilizes decision boundaries and improves OOD robustness
[8/9]
NEW PAPER: "Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL"
How do RL algorithms develop sophisticated exploration strategies without explicit rewards? We provide insight into this question by studying Single-Goal Contrastive RL (SGCRL).
[1/9]