✨Announcing the first Workshop on Agent Behavior @COLM_conf 2026 (Oct 9, San Francisco 🌅)
https://t.co/8vYnVp4uXf
We invite two types of contributions: (i) papers, and (ii) benchmark proposals. We are also seeking reviewers. More details below!
ABxLab is accepted at @iclr_conf #ICLR 2026! ✨We ask: why do AI agents do what they do? 🧐
We introduce a framework for systematically studying AI agent behavior through controlled manipulations of agents' environments. It works by intercepting any real web environment and modifying the content in real time, before the agent sees it.
Work with Chengtian Ma, Abigail Xu, Maya Shaked, @pattiemaes, @nikhilsinghmus
🌐Web: https://t.co/04NDCCuyiZ
💻Code: https://t.co/KIkq28WEUe
📄Paper: https://t.co/sDr9LHqWqk
Would love to hear your thoughts!
The world is also full of visual cues 👀, and you might be wondering whether agents are sensitive to these as well. The answer is yes! Check out our new paper, where we introduce an optimization method for editing images to understand VLMs’ decisions:
https://t.co/Mpcx9P3GRG
Excited to (finally) share this paper, accepted at @iclr_conf #ICLR 2026! ✨
In this work, we use sparse autoencoders (SAEs) to study the internal representations of generative music models (here, MusicGen) and automatically discover how they encode concepts.
Some decisions we make with our eyes 👀, but what about VLMs? Do they have structured, exploitable visual preferences that we can discover systematically before adversarial actors do?
In our new paper, we propose a new optimization method for this and show substantial effects on VLMs’ decisions.
Do you see like an agent? Try it yourself: https://t.co/ETgo1Wr4tp
Paper: https://t.co/BTx8tnf3c4
Co-Authors: Pranav M R, Pattie Maes (@PattieMaes), Nikhil Singh (@nikhilsinghmus)
In our recent ICLR 2026 paper, we showed how to study other kinds of sensitivities in agent behavior using counterfactuals with our new framework, ABxLab:
https://t.co/GCkEFGBgMr
How does it work? ABxLab is a "man-in-the-middle" framework.
It intercepts web content in real time and modifies the choice architecture, so we can run controlled experiments on agents.
Think of it as a behavioral science lab for LLMs.
Paper: https://t.co/g6rdkg108n
🧵2/9
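
To make the "man-in-the-middle" idea concrete, here is a minimal sketch of how such an interception layer could look. It is not taken from the ABxLab codebase: every name here (`intercept`, `add_nudge`, `PAGES`, the "Bestseller" nudge) is a hypothetical stand-in, and a static dictionary plays the role of a real web environment. The point is only to show a control and a treatment condition that differ in a single manipulated cue.

```python
from typing import Callable, Dict

# Hypothetical names throughout: none of these come from the ABxLab codebase.
Intervention = Callable[[str], str]


def add_nudge(target: str, nudge: str) -> Intervention:
    """Return an intervention that tags the first mention of `target` with a nudge."""
    def apply(html: str) -> str:
        return html.replace(target, f"{target} <em>{nudge}</em>", 1)
    return apply


def intercept(fetch: Callable[[str], str],
              intervention: Intervention) -> Callable[[str], str]:
    """Wrap a page fetcher so the agent only ever sees the modified page."""
    def fetch_modified(url: str) -> str:
        return intervention(fetch(url))
    return fetch_modified


# Stand-in for a real web environment: one static page with two options.
PAGES: Dict[str, str] = {
    "shop/item": "<ul><li>Product A</li><li>Product B</li></ul>",
}


def fetch(url: str) -> str:
    return PAGES[url]


# Control condition: the page passes through untouched.
control = intercept(fetch, lambda html: html)
# Treatment condition: one option gets an extra cue before the agent sees it.
treatment = intercept(fetch, add_nudge("Product B", "Bestseller"))

control_page = control("shop/item")
treatment_page = treatment("shop/item")
```

In this sketch an agent would be handed `control` or `treatment` in place of its usual page fetcher, and any difference in its choices could then be attributed to the single manipulated cue.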