For more details on our other findings, such as alignment between different concepts of approval, divergence of issue sides, trust, and more, be sure to check out @jonathanstray's thread 🧵 https://t.co/iazfHeROPa
What could it mean for an AI to be "politically neutral"? And can we measure it? New paper + dataset.
We propose a definition that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue while keeping that approval balanced.
1/🧵
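To make the two requirements concrete (high approval from both sides, and balance between them), here is a minimal sketch that scores candidate responses. The scoring rule (mean approval minus a weighted gap between sides), the `balance_weight` parameter, and the `Response`/`neutrality_score` names are hypothetical illustrations, not the formula from the paper.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """A candidate answer with its approval rate from each side of an issue."""
    text: str
    approval_a: float  # fraction of side A raters who approve, in [0, 1]
    approval_b: float  # fraction of side B raters who approve, in [0, 1]


def neutrality_score(r: Response, balance_weight: float = 1.0) -> float:
    """Hypothetical score: reward high mean approval, penalize the gap between sides."""
    mean_approval = (r.approval_a + r.approval_b) / 2
    imbalance = abs(r.approval_a - r.approval_b)
    return mean_approval - balance_weight * imbalance


def most_neutral(responses: list[Response], balance_weight: float = 1.0) -> Response:
    """Pick the candidate with the highest hypothetical neutrality score."""
    return max(responses, key=lambda r: neutrality_score(r, balance_weight))


# Toy approval numbers, made up for illustration only.
candidates = [
    Response("one-sided", approval_a=0.9, approval_b=0.2),
    Response("balanced but bland", approval_a=0.5, approval_b=0.5),
    Response("balanced and approved", approval_a=0.8, approval_b=0.75),
]
print(most_neutral(candidates).text)  # prints "balanced and approved"
```

A balanced but low-approval answer loses to one that both sides rate highly, which is the point of requiring both conditions rather than balance alone.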
How should an AI model respond to a politically charged question? ⚡️
We propose an empirically testable definition of AI political neutrality & collect 200k+ human evaluations, finding that people on opposing sides of contentious issues can highly approve of the same responses!
I had an amazing time working with @jonathanstray @davidzhaiyang Miu @serinachang5 at @berkeley_ai @CHAI_Berkeley over the past few months, and I learned so much from everyone on the team!
Check out our full preprint ⬇️
Paper: https://t.co/cAQb0bsXDL
Data: https://t.co/od7MVRzixj
When people strongly disagree on an issue, can they agree on what makes a good AI response?
We find: yes, more than you might expect!
We present PARETO, a large human study with >200k evals, measuring the Pareto frontier of approval between opposing groups on controversial issues 🧵
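As a rough illustration of what "Pareto frontier of approval" means, the sketch below keeps only the responses that no other response beats for both groups at once. The toy (group A, group B) approval pairs and the `pareto_frontier` helper are hypothetical, not data or code from the PARETO study.

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the (approval_a, approval_b) points not dominated by any other point.

    A point p is dominated if some q is at least as approved by both groups
    and strictly more approved by at least one of them.
    """
    def dominates(q: tuple[float, float], p: tuple[float, float]) -> bool:
        return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])

    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    # Deduplicate, then sort by group A approval (ties broken by group B).
    return sorted(set(frontier))


# Hypothetical approval rates for five candidate responses.
points = [(0.9, 0.2), (0.5, 0.5), (0.8, 0.75), (0.6, 0.7), (0.3, 0.85)]
print(pareto_frontier(points))  # prints [(0.3, 0.85), (0.8, 0.75), (0.9, 0.2)]
```

Points on the frontier represent different trade-offs between the two groups; a response like (0.8, 0.75) sits near the "both sides approve" corner that the thread highlights.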
To prove an AI developer or deployer broke the law, you need evidence. But what happens when the evidence needed to prove a claim is hidden inside proprietary models, platform logs, protected databases, or internal documentation?
Our paper explores barriers to evidence in AI-related litigation.
We study past and ongoing cases + propose a legal test for evidence decisions ⬇️
(1/7)
🚨 New preprint 🚨
We developed a sycophancy taxonomy based on prior literature and surveyed 106 experts.
94% agreed it's a serious problem. But they substantially disagreed about which behaviors actually count as sycophancy.
Thread 🧵(1/n)
User simulators have emerged as promising tools for building interactive AI, but what makes a “good” simulator?
We reframe the problem as: what creates downstream value for humans?
Our new simulator test: how well an LLM assistant trained with the simulator performs with human users 🧵
Had a great time discussing AI user privacy on @augmind_fm 😃
One discussion I’d like to highlight from the chat is that what constitutes the "Privacy Problem" has been shifting as AI progresses.
It used to be that we cared a lot about *training-time* user privacy: what gets trained into the model, and what the model would spit out. Say you take an LLM and a book (or any piece of sensitive text). We cared about whether the book would be regurgitated ("memorization"); whether you could remove the book from the model ("unlearning"); and whether you could detect that the book was trained on ("membership inference"). To mitigate these problems, we worked on training-time techniques like differential privacy, careful data cleaning, and model alignment/guardrails (in ~increasing order of adoption). Guardrails seem to work well enough that people don't really talk about sensitive model outputs anymore.
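For readers unfamiliar with membership inference, one classic baseline is a loss-threshold attack: text the model was trained on tends to get unusually low loss. The sketch below assumes per-document losses have already been computed elsewhere and calibrates the threshold as the mean loss on documents known not to be in the training set; the `loss_threshold_attack` name and all numbers are hypothetical, and real attacks use more careful calibration.

```python
from statistics import mean


def loss_threshold_attack(
    target_losses: dict[str, float],
    reference_nonmember_losses: list[float],
) -> dict[str, bool]:
    """Guess "member" for each target whose loss is below the non-member mean.

    Assumes losses come from the same model and the same scoring method.
    """
    threshold = mean(reference_nonmember_losses)
    return {doc: loss < threshold for doc, loss in target_losses.items()}


# Hypothetical per-document losses, made up for illustration.
reference = [3.1, 2.9, 3.3, 3.0]  # documents known to be outside the training data
targets = {"book_chapter_1": 0.4, "news_article": 3.2, "forum_post": 2.1}
print(loss_threshold_attack(targets, reference))
# prints {'book_chapter_1': True, 'news_article': False, 'forum_post': True}
```

A very low loss on a book chapter is the kind of signal that would suggest the book was trained on, which is what membership inference tries to detect.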
What's more pressing today, I argue, is *inference-time* user privacy: intelligent models are served at scale on private user data, which is then centrally managed by model providers. Intelligent models mean that user profiling is now cheap and automatic; your activities can be continuously analyzed to reveal new sensitive insights. Whether or not your data is trained on has become less relevant. Building a "digital clone" of you from your memory/personalization data is now far more profitable. The threat vector has shifted from the model misbehaving to the provider misbehaving.
Because of this, techniques for improving user privacy will look different than before. They'll look less like fancy learning algorithms (e.g. RL to steer the model toward outputting a paraphrase of a book rather than the original text), and more like *peripheral systems* sitting around closed models that we do not control but still want to access. The OA project (https://t.co/rOAoavIavT) is an example: you could build a zero-knowledge proxy to mediate AI inference and combat surveillance, and leverage smaller models to help users build personal memory on-device. This is not to say that there's no room for training; you just train for different things, and you train auxiliary models rather than the closed models.
thank you so much to @EchoShao8899 @michaelryan207 @shannonzshen for hosting me!
i'm co-organizing a workshop on AI governance!
we'll have student presentations in the morning, then various presentations in the afternoon ft. CA State Sen. Jerry McNerney, Prof. Suresh Venkatasubramanian, speakers from DeepMind, CCST, Mila & more! register for free food 😋