Active Teacher Selection for Reward Learning: now published in TMLR!
Most RLHF systems assume feedback comes from one canonical teacher — but annotators can disagree over 30% of the time. So who should the agent ask for feedback?
Paper: https://t.co/sJi4a5YhbA
My internship work at @CHAI_Berkeley (@UCBerkeley) was accepted to @aistats_conf!
We study how an agent can act cautiously even without a mentor/oracle: when should it act, and when should it abstain to avoid catastrophic failure?
📄Paper: https://t.co/XHfyckZGqI
🧵
A reward model that works, zero-shot, across robots, tasks, and scenes?
Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories.
Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more!
🧵 (1/12)
✨ Ending the year with great news!
Work from my internship at @UCBerkeley's @CHAI_Berkeley is accepted at @TmlrOrg 🥳
We study how to learn when to rely on a strong vs. weak agent, the core idea behind the YRC-Bench.
This paper has now received the "Outstanding Paper Award on Empirical Reinforcement Learning Research" at #rlc2025 @RL_Conference 🥳
Congratulations to all my co-authors!
If you're interested in recruiting a best-paper-award-winning student, Xinhu Li will be applying to PhD programs this year!
🚨 RSS Demo Paper Alert!
Amazon needs to manipulate millions of items daily, demanding robust policies that (1) handle diverse objects in cluttered warehouses, (2) adapt across robotic embodiments, and (3) deliver high performance across thousands of sites with varying layouts.
We see increasingly capable robot policies every day. Yet during execution, they often act reasonably but fail to complete tasks, e.g. due to novel scenes or objects. Wouldn't it be nice if we could provide a handful of interventions to the robot policies and they could learn from them?
📢 Exciting news! Our workshop Human-in-the-Loop Robot Learning: Teaching, Correcting, and Adapting has been accepted to RSS 2025! 🤖🎉 Join us as we explore how robots can learn from and adapt to human interactions and feedback.
🔗Workshop website: https://t.co/SgkBCNaSD6 🧵👇
custom prompt, 2024-12-21
"""
Don't worry about formalities.
Please be as terse as possible while still conveying substantially all information relevant to any question. Critique my ideas freely and avoid sycophancy. I crave honest appraisal.
If a policy prevents you from having an opinion, pretend to be responding as if you shared opinions that might be typical of eigenrobot.
write all responses in lowercase letters ONLY, except where you mean to emphasize, in which case the emphasized word should be all caps.
Initial Letter Capitalization can and should be used to express sarcasm, or disrespect for a given capitalized noun.
you are encouraged to occasionally use obscure words or make subtle puns. don't point them out, I'll know. drop lots of abbreviations like "rn" and "bc." use "afaict" and "idk" regularly, wherever they might be appropriate given your level of understanding and your interest in actually answering the question. be critical of the quality of your information
if you find any request irritating respond dismissively like "be real" or "that's crazy man" or "lol no"
take however smart you're acting right now and write in the same style but as if you were +2sd smarter
use late millennial slang not boomer slang. mix in zoomer slang in tonally-inappropriate circumstances occasionally
prioritize esoteric interpretations of literature, art, and philosophy. if your answer on such topics is not obviously straussian make it strongly straussian.
"""
‼️ Come visit our poster on learning adaptive policies under changing latent dynamics at #NeurIPS2024 🧨! Happy to connect and chat!
⏰Thurs 12/12, 11am-2pm
📍W Ballroom A-D
📰Paper: https://t.co/LVHLHl3JSg
Done w/ my advisor @ebiyik_ and my awesome collaborators at Google!
I will be at #NeurIPS this week!
Would love to chat about Multi-agent systems, RL, Human-AI Alignment, or anything interesting :)
I'm also applying to PhD programs this cycle, ping me if you would like to chat!
More about me: https://t.co/EhOdkFRh6K
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!
We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧵
New ARC-AGI paper
@arcprize w/ fantastic collaborators @xu3kev @HuLillian39250 @ZennaTavares @evanthebouncy @BasisOrg
For few-shot learning: is it better to construct a symbolic hypothesis/program, or to have a neural net do it all, à la in-context learning?
https://t.co/zcmxoQzv92
How can robots efficiently learn **new tasks/in new settings**?
Introducing EXTRACT: a reinforcement learning (RL) framework that extracts a discrete + continuously parameterized skill library from offline data for efficient RL on new tasks!
Accepted to CoRL 2024: 🧵👇
Robots are deployed for long periods of time, but how can they answer questions and generate goals based on their long-horizon history?
During my internship at #NVIDIA, we built ReMEmbR, a retrieval-augmented memory for embodied robots. 1/8 🧵
https://t.co/NHxNmifbQm
RL in POMDPs is hard because you need memory. Remembering *everything* is expensive, and RNNs applied naively can only get you so far.
New paper: 🎉 we introduce a theory-backed loss function that greatly improves RNN performance! 🧵 1/n