We got into the weeds of hindsight experience replay and came up with an efficient way of learning from all goals at once in off-policy goal-conditioned RL!
It works well for reasonable numbers (100s--1000s) of non-exclusive sparse-reward goals.
Hindsight Experience Replay has become the ubiquitous method for goal-conditioned reinforcement learning, but leaves open the question of which goal to relabel with.
In this work, accepted at ICML, we instead propose simply Learning Everything All at Once (LEO).
1/
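To make "learning from all goals at once" concrete, here is a minimal tabular sketch. It is not the LEO algorithm from the paper: the function name `all_goals_q_update`, the table layout, and the choice to treat a reached goal as terminal are all assumptions made for illustration. The idea shown is only that one off-policy transition can update the value estimate for every goal in a single vectorized step, so no single goal has to be picked for relabeling.

```python
import numpy as np

def all_goals_q_update(Q, s, a, s_next, achieved, alpha=0.1, gamma=0.99):
    """Apply one Q-learning step to every goal simultaneously.

    Q        : array of shape (n_states, n_actions, n_goals)
    achieved : boolean array of shape (n_goals,), True where the goal's
               sparse reward fires in s_next (goals may overlap)
    Assumption: reaching a goal ends the episode for that goal only.
    """
    reward = achieved.astype(Q.dtype)            # sparse 0/1 reward per goal
    next_value = Q[s_next].max(axis=0)           # greedy next value, per goal
    target = reward + gamma * (1.0 - reward) * next_value
    Q[s, a] += alpha * (target - Q[s, a])        # updates all goals at once
    return Q

# Hypothetical usage: 3 states, 2 actions, 4 goals
Q = np.zeros((3, 2, 4))
achieved = np.array([True, False, True, False])
Q = all_goals_q_update(Q, s=0, a=1, s_next=2, achieved=achieved, alpha=0.5)
```

With a function approximator the same pattern would presumably become a network with one output head per goal, but that is a guess about scaling, not a claim about the paper.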
@mmuthukrishna and I are hiring a postdoc to join our labs at NYU! We're looking for someone excited to work on one of society's newly emerging and potentially generation-shaping challenges: the multi-agent alignment problem.
can Claude self-report an injected emotion with neutral context? can it detect the mismatch between the emotion and the context (why am i feeling like this?)
seems related to the introspection study: https://t.co/TQqqt2QUjb
what other mental states could we do this with?
New Anthropic research: Emotion concepts and their function in a large language model.
All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
w/ Tracey Mills, Ben Prytawski, @mhtessler, @noahdgoodman, @jacobandreas, Josh Tenenbaum
Paper: https://t.co/tDZsgL2wDN
Play the games here: https://t.co/anWBZVvYNW
New paper at ICLR 2026! 🎉
"Language and Experience: A Computational Model of Social Learning in Complex Tasks"
We model how humans combine advice from others with direct experience to learn new tasks, and show this enables bidirectional human-AI knowledge transfer.
🧵⤵️
Can knowledge accumulate across generations?
We run iterated learning chains: each agent gets only 2 lives, then passes advice to the next
Performance increases across generations: partial knowledge compounds through language, mirroring cultural evolution in human populations
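A toy sketch of the chain structure described above, not the paper's model. Everything here is a simplifying assumption: knowledge is a set of integer "rules", each life reveals one random rule, and advice transmits everything the agent knows without loss. The names `run_chain`, `n_rules`, and `lives` are hypothetical.

```python
import random

def run_chain(n_generations, n_rules=10, lives=2, seed=0):
    """Simulate an iterated learning chain with lossless advice passing.

    Returns the number of rules known by each generation's agent
    after it has used up its lives.
    """
    rng = random.Random(seed)
    advice = set()                      # what the previous agent passed on
    history = []
    for _ in range(n_generations):
        known = set(advice)             # start from inherited advice
        for _ in range(lives):
            # Assumption: each life uncovers one rule, possibly a repeat
            known.add(rng.randrange(n_rules))
        history.append(len(known))
        advice = known                  # pass everything to the next agent
    return history

history = run_chain(n_generations=8)
```

Even with only two lives per agent, the count in `history` can never go down in this toy setup, because partial knowledge is inherited rather than relearned. Real advice is lossy, which this sketch deliberately ignores.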
@RhysSullivan not all the way there but i just set up a three-way interaction between a telegram bot, obsidian and claude code
so you can chat with claude code about your notes from your phone via telegram, send it links and pdfs to process and it can co-manage your obsidian