Excellent paper on why anthropomorphic misalignment research needs more precise concepts and stronger evidence.
One of their arguments, that a single label can obscure different operationalizations, is something we found empirically for sycophancy.
https://t.co/65lLtiQZGE
This is a wonderful articulation of the limitations of anthropomorphism for understanding + addressing model behavior! Our position paper discusses these assumptions across the LLM development pipeline, to be presented at ACL next month!
Really excited about this work! I think how models generalize alignment out of distribution will be increasingly important; positive alignment has the the potential to create huge benefits; and the results here are both great and a bit surprising. Check it out!
Cool to see that the Claude Code sessions we collected and released are now helping Anthropic study Claude Code 😎
Check out SWE-chat if you haven't! (more data coming soon...)
New w/ @AISecurityInst & @UniofOxford:
Frontier AI can now out-persuade expert humans in conversation - incl. world-champ debaters and professional canvassers.
This held even when humans chose their topics, prepared in advance, and competed for £1,000 prizes 🧵
Are AI Chatbots Harmful or Beneficial? It Depends What Would’ve Happened Otherwise
When thinking about whether using AI chatbots is harmful or beneficial, we should always be asking: “Compared to what?”
Link below 👇
We propose a new way to quantify AI overreliance: the Offloading Score 🧐 @vishakh_pk
It measures the fraction of cognitive work you hand off to AI 🤖 via simulating how you'd have done each step without AI, then counting the steps the AI saved. It works directly from interaction traces (keystrokes, screenshots), so it's reusable across many tools!!
Excited to share this paper, led by @vishakh_pk!
A very creative way to measure reliance on AI tools by showing, via simulating counterfactual workflows, how much cognitive effort has moved from the person to the tool.
People are increasingly worried that AI tools make us overreliant.
But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task.
In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not.
(1/9)
High reliance is not always undesirable.
We examine the interaction between reliance and a desirable task outcome, code understanding. While in-general high reliance leads to low code understanding, we also find a cluster of high-reliance + high-understanding users that are often _learning_ with AI and augmenting their own skills. This suggests that reliance should be interpreted alongside task outcomes, not in isolation.
(8/9)
People are increasingly worried that AI tools make us overreliant.
But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task.
In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not.
(1/9)
Training the models to be less sycophantic seems like a good idea, but I think the problem is that they’re not really smart enough for it yet, so they end up “pushing back” in ways that are awkward and annoying because they don’t make sense.
It’s been great to have spent a few months with @woj
setting up OAIF’s work on AI resilience / economic futures and getting ambitious about where this can go. Glad to have helped on this one.
I feel a huge amount of uncertainty on AI’s impacts and what to do about it. But I think giving people a pathway to shape their own futures, and not be subject to arbitrary power, is really important in all worlds. Econ is a huge part of that, democracy is a huge part of that, and they’re fundamentally entwined (e.g. it’s good that the state is funded via broad-based taxes, it’s good that competition enables plurality and keeps people’s choices relevant, etc.). There’s a ton that can be done now - better, real-time labor statistics, broad and deep qualitative data collection, pathways for sharing in AI’s value for countries around the world, lots of experiments that give people an actual stake in AI growth, investment in state capacity. Tons to do, more soon.
“AI, what color do I get from mixing black and white?”
Why do people turn to AI for simple tasks that they could easily do themselves?
In our new preprint (also to appear at CogSci 2026!), we investigate the mechanisms and dangers of people over-using AI on easy tasks.
We use "sycophancy" to describe not one thing but a family of model behaviors. That makes it hard to measure how "sycophantic" models really are, or compare results across studies.
This paper, led by @merylyemerylye, is a great effort at addressing this!
🚨 New preprint 🚨
We developed a sycophancy taxonomy based on prior literature and surveyed 106 experts.
94% agreed it's a serious problem. But they substantially disagreed about which behaviors actually count as sycophancy.
Thread 🧵(1/n)
🚨 New preprint 🚨
We developed a sycophancy taxonomy based on prior literature and surveyed 106 experts.
94% agreed it's a serious problem. But they substantially disagreed about which behaviors actually count as sycophancy.
Thread 🧵(1/n)
Construct fragmentation extends outside of academic research. How legislation describes sycophancy differs from how researchers and companies do.
If different stakeholders are targeting different behaviors, what does "less sycophantic" actually mean?