steven @stevenlu0 - Twitter Profile

1 day ago

For more details on our other findings, such as alignment between different concepts of approval, divergence of issue sides, trust, and more, be sure to check out @jonathanstray's thread 🧵 https://t.co/iazfHeROPa

Jonathan Stray

@jonathanstray

4 days ago

What could it mean for an AI to be "politically neutral"? And can we measure it? New paper + dataset. We propose a defn that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced. 1/🧵

https://t.co/OeS1dgEcma

steven @stevenlu0

1 day ago

How should an AI model respond to a politically charged question? ⚡️ We propose an empirically testable definition of AI political neutrality & collect 200k+ human evaluations, finding that people on opposing sides of contentious issues can highly approve of the same responses!

https://t.co/9hvpytOdru

steven @stevenlu0

1 day ago

I had an amazing time working with @jonathanstray @davidzhaiyang Miu @serinachang5 at @berkeley_ai @CHAI_Berkeley over the past few months and I learned so much from everyone on the team! Check out our full preprint ⬇️ Paper: https://t.co/cAQb0bsXDL Data: https://t.co/od7MVRzixj

stevenlu0 retweeted

Serina Chang @serinachang5

1 day ago

When people strongly disagree on an issue, can they agree on what makes a good AI response? We find: yes, more than you might expect! We present PARETO, a large human study w >200k evals, measuring the Pareto frontier of approval btwn opposing groups on controversial issues 🧵
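To make the idea of a Pareto frontier of approval concrete, here is a minimal Python sketch. It is an illustration only, not the paper's method: the `pareto_frontier` helper, the response names, and the approval numbers are all hypothetical, and the study's actual metrics may differ. A response sits on the frontier if no other response is approved at least as much by both groups and strictly more by at least one.

```python
from typing import Dict, List, Tuple

# Hypothetical mean approval scores in [0, 1] for each response,
# stored as (approval from side A, approval from side B).
Approvals = Dict[str, Tuple[float, float]]


def pareto_frontier(approvals: Approvals) -> List[str]:
    """Return the names of responses that no other response dominates."""
    frontier = []
    for name, (a, b) in approvals.items():
        # Another response dominates this one if it is at least as good
        # for both sides and strictly better for at least one side.
        dominated = any(
            (oa >= a and ob >= b) and (oa > a or ob > b)
            for other, (oa, ob) in approvals.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Made-up numbers, for illustration only.
example: Approvals = {
    "one_sided_a": (0.9, 0.2),
    "one_sided_b": (0.2, 0.9),
    "balanced": (0.7, 0.7),
    "weak": (0.4, 0.4),
}

frontier = pareto_frontier(example)
print(frontier)
```

In this toy data, `weak` drops out because `balanced` scores higher with both groups. The two one-sided responses stay on the frontier even though their approval is lopsided, so being on the frontier does not by itself imply balance.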

https://t.co/KkRjpimme0

stevenlu0 retweeted

Jonathan Stray

@jonathanstray

4 days ago

What could it mean for an AI to be "politically neutral"? And can we measure it? New paper + dataset. We propose a defn that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced. 1/🧵

stevenlu0 retweeted

Sarah Cen

@cen_sarah

13 days ago

To prove an AI developer or deployer broke the law, you need evidence. But what happens when the evidence needed to prove a claim is hidden inside proprietary models, platform logs, protected databases, or internal documentation? Our paper explores barriers to evidence in AI-related litigation. We study past and ongoing cases + propose a legal test for evidence decisions ⬇️ (1/7)

steven @stevenlu0

8 days ago

life update: officially a berkeley graduate! #gobears 🐻

steven @stevenlu0

12 days ago

learned what a jira ticket is this week and my life hasn’t been the same 🫩

steven @stevenlu0

17 days ago

@_tenZdhon_ very unfortunately i spent probably north of $300 on regalia rental + commencement guest tickets…

steven @stevenlu0

17 days ago

@merylyemerylye reading this asap 🏃

stevenlu0 retweeted

Meryl Ye @merylyemerylye

19 days ago

🚨 New preprint 🚨 We developed a sycophancy taxonomy based on prior literature and surveyed 106 experts. 94% agreed it's a serious problem. But they substantially disagreed about which behaviors actually count as sycophancy. Thread 🧵(1/n)

https://t.co/AeLPjOJ748

steven @stevenlu0

20 days ago

@jocelynjshen you literally cooked holy

stevenlu0 retweeted

Serina Chang @serinachang5

28 days ago

User simulators have emerged as promising tools for building interactive AI, but what makes a “good” simulator? We reframe the problem as what creates downstream value for humans. Our new simulator test: how an LLM assistant trained with the simulator performs with human users 🧵

https://t.co/Nhf4Bz7U74

steven @stevenlu0

29 days ago

does anyone with too much time on their hands tomorrow want to go to oakland and try to watch musk v altman get argued in court…

stevenlu0 retweeted

Ken Liu

@kenziyuliu

about 1 month ago

Had a great time discussing AI user privacy on @augmind_fm 😃

One discussion I’d like to highlight from the chat is that what constitutes the "Privacy Problem" has been shifting as AI progresses.

It used to be that we cared a lot about *training-time* user privacy: what gets trained into the model, and what the model would spit out. Say you take an LLM and a book (or any piece of sensitive text). We cared about whether the book would be regurgitated ("memorization"); whether you can remove such a book from the model ("unlearning"); and whether you can detect the book being trained on ("membership inference"). And as part of mitigating these problems, we work on training-time techniques like differential privacy, careful data cleaning, and model alignment/guardrails (in ~increasing order of adoption). Guardrails seem to work well enough that people don’t really talk about sensitive model outputs anymore.

What’s more pressing today, I argue, is *inference-time* user privacy: the fact that intelligent models are served at scale on private user data, which are then centrally managed at model providers. Intelligent models mean that user profiling is now cheap and automatic; your activities can be continuously analyzed to reveal new sensitive insights. Whether your data is trained on or not became less relevant. Having a "digital clone" of you by building on your memory/personalization is now way more profitable. The threat vector changed from the model misbehaving to the provider misbehaving.

Because of this, the techniques to improve user privacy would look different than before. They’ll look less like fancy learning algorithms (e.g. RL to steer the model to output a paraphrase of a book rather than the original book), and more like *peripheral systems* sitting around closed models that we do not control but still want to access.

The OA project (https://t.co/rOAoavIavT) is an example: you could build a zero-knowledge proxy to mediate AI inference and combat surveillance, and leverage smaller models to help users build personal memory on-device. This is not to say that there’s no room for training; you just train for different things, and on auxiliary models rather than the closed models.

thank you so much to @EchoShao8899 @michaelryan207 @shannonzshen for hosting me!

steven @stevenlu0

about 1 month ago

who’s up submitting to neurips 🔥🔥🔥

steven @stevenlu0

about 1 month ago

@abby_k_oneill @berkeley_ai the user study was so cool!!

steven @stevenlu0

about 1 month ago

full schedule & registration: https://t.co/7y7nOAjIGa

steven @stevenlu0

about 1 month ago

i'm co-organizing a workshop on AI governance! we'll have student presentations in the morning, then various presentations in the afternoon ft. CA State Sen. Jerry McNerney, Prof. Suresh Venkatasubramanian, speakers from DeepMind, CCST, Mila & more! register for free food 😋

https://t.co/pce6tXMalw
