A recent study found an LLM scored 95% on a healthcare benchmark. Deployed with real patients, it dropped to 34%.
In our new work, we argue the problem isn't the benchmark, but the implicit assumptions buried in evaluation.
Paper: https://t.co/mi445QtJvM
🧵 1/n
Excited to share this AAAI blog post on our new paper assignment algorithm used for AAAI 2026 (with Michael Cui, Chenxin Dai, and @YixuanEvenXu) and the resulting statistics. Thanks to AAAI 2026 Program Chairs Matt Taylor and Chad Jenkins, and Conference Chair @k_leyton_brown
Curious about the paper assignment algorithm used for AAAI 2026? The new algorithm substantially improved the robustness of large-scale paper–reviewer assignments, eliminating clear forms of strategic behavior and increasing diversity, while retaining nearly all of the assignment quality achieved by standard methods.
Read more: https://t.co/8DqCnvlIlT
Check out our recent work on Humanization by Iterative Paraphrasing (HIP)!
We find that commercial AI-text detectors often classify text from base LLMs as human-written. HIP leverages this observation to improve detector evasion.
🤖 AI text detectors are widely deployed in education and integrity workflows, but what are they actually tracking?
We report a surprising finding: text from base models is overwhelmingly judged as human by GPTZero and Pangram. 👇 (1/6)
Training concept-based models relies on concept selection, which is labor-intensive and slow.
We introduce Decision-Relevant Selection (DRS), a principled algorithm for automatic concept selection in RL.
Paper: https://t.co/TYtOJFaE4D
Website: https://t.co/NOTmQLFI8q
🧵 1/n
SCS researchers have developed an AI-powered chatbot, PeerCoPilot, designed both with and specifically for people working in behavioral health.
👉 https://t.co/MxwTquMHfe
🧬 Distillation enables efficient emulation of LLMs, but verifying provenance remains a critical challenge.
Introducing Antidistillation Fingerprinting (ADFP): A principled approach that aligns signals with student learning dynamics. 👇 (1/6)
At 4pm, we will have our panel discussion on AI education. Panelists include our invited speakers Serene Bioth and @eunicemjun, as well as Milind Tambe (@MilindTambe_AI) and Leo Porter.
As a co-chair for the NeurIPS 2025 Education Program, we are excited about the One-Day Event on AI Education which will take place tomorrow Tue 2 Dec 10am - 5pm PST at Upper Level Room 9. More details here: https://t.co/PgMYxkWJQy @adityagrover_ @NaveenJRaman
The first session at 10am tomorrow will be an interactive session led by Julien Besset on "Cut Through the Noise: How to Write an Effective Elevator Pitch". It will equip you with practical tools to translate your research into a short, effective, and accessible overview.
How do we close the gap between specialist RL and generalist LLM agents?
We're benchmarking it in Pokémon. Join us at the PokeAgent Challenge competition workshop @ NeurIPS 2025.
📍 Dec 7, 8AM in San Diego
🎮 Track 1: Competitive Pokémon (game-theoretic reasoning)
🗺️ Track 2: Speedrunning (long-horizon planning)
Speakers from Google DeepMind, NYU, CMU, UT Austin, Princeton.
📣 Honored to be selected as Honorable Mention for the @SCSatCMU Distinguished Dissertation Award!!
Thanks to my advisor @fangf07 & committee Geoff Gordon, @hongshenus, @katjahofmann, & @OriolVinyalsML (+ other mentors and collaborators) for their support 🖤
& congrats to Juncheng, Tim, and Brian 🎉
✨ Did you know that NOT using all generated rollouts in GRPO can boost your reasoning LLM? Meet PODS! We down-sample rollouts and train on just a fraction, delivering notable gains over vanilla GRPO. (1/7)
Another life update!! 🎉
I’m joining @JHUCompSci as an Assistant Professor starting Fall 2026! Apply to work with me on reinforcement learning, foundation models, & human-centered AI. Let’s build better AI agents 🤖🙆‍♀️🦀
Before that, I’ll join @NYU_Courant as an Assistant Professor/Faculty Fellow. Excited to spend a year in NYC!
Excited to be at #aamas2025 !
- My keynote talk at C-MAS workshop today: 2-2:45pm, Maquette A
- Will attend panel at ALA workshop today: 4:30-5:30pm, Salon 2
- Siyu Liu (PhD advised by @___tiffanyb___ ) will present our joint paper on Friday 10:45am, Salon 3