Benjamin Weinstein-Raun @benwr - Twitter Profile

Benjamin Weinstein-Raun @benwr

4 days ago

@freed_dfilan TIL this is the name of a place in upstate NY!

0

1

0

14

benwr retweeted

Eli Tyre

@EpistemicHope

4 days ago

I work in AI policy full-time. I'm taking time off work, volunteering 12+ hours a day to get Alex Bores elected. If you're a Democrat in Manhattan, I'd love to talk with you for 15 min about where AI is heading, AI risks, and why we urgently need voices like Alex in Congress. Sign up here: https://t.co/ZQiWlTvEQe (All opinions are my own and do not represent the views of my employer, a 501c3 that does not endorse political candidates.)

2

168

29

18

37K

Benjamin Weinstein-Raun @benwr

7 days ago

@NathanpmYoung Basically "no", imo. It matters a lot that jailbreaks require a human to try to interfere with the developer's intentions. The alignment problem is about what the system itself is trying to do; even if you perfectly prevent jailbreaks a misaligned model will seek undesirable ends

0

61

Benjamin Weinstein-Raun @benwr

7 days ago

Tentative: "you're not that guy" is the purest insult devised so far.

0

18

Benjamin Weinstein-Raun @benwr

9 days ago

@ascherlis Nice

0

2

0

20

Benjamin Weinstein-Raun @benwr

9 days ago

Do you know the laws of the heavens?

1

0

36

Benjamin Weinstein-Raun @benwr

9 days ago

@kave_rennedy On the other hand, it does seem clear to me that most people are not in fact optimizing most things they have control over very much, and are prone to assume that a thing is best when (e.g.) it merely *was* best

0

20

Benjamin Weinstein-Raun @benwr

14 days ago

it's so weird that the pope is american now. the pope should be like going to the opera: incomprehensible / a modern incarnation of ancient europe

0

12

Benjamin Weinstein-Raun @benwr

17 days ago

@freed_dfilan specific exchange that triggered this: https://t.co/jwnSBbZ70x It's possible I'm misunderstanding deepfates, but I have a sense of this pattern from at least 4-5 prominent conversations here

🎭

@deepfates

17 days ago

@RyanPGreenblatt I see that you're trying to calibrate probabilities to use rationality. I appreciate that. However, we are not a car

0

4

0

193

1

0

14

Benjamin Weinstein-Raun @benwr

17 days ago

Why is it that a certain brand of "post-rationality" (e.g. "you can't use ordinary reasoning to think about humanity and what we might collectively do") seems to almost exclusively get deployed around AI stuff? Feels suspicious to me. Like, where was this PoV during early covid?

1

0

31

Benjamin Weinstein-Raun @benwr

30 days ago

This is surprising to me! Usually I appreciate people poking fun at a trope or meme I dislike!

0

16

Benjamin Weinstein-Raun @benwr

30 days ago

So uh, literally every time I've seen someone intercalating their statement with "👏" emojis, I have both disagreed with the thing they were saying and felt annoyed at them. Even when they were doing it in a way that was meant to be tongue in cheek.

1

0

23

Benjamin Weinstein-Raun @benwr

about 1 month ago

@HumanHarlan There are also way more mundane reasons for people to be lying to each other within the admin, e.g. "oh, yeah, I totally got input from them like I said I would"

0

3

0

456

Benjamin Weinstein-Raun @benwr

about 1 month ago

@QiaochuYuan I have been pretty shocked by what grok appears to be fine with writing

0

37

Benjamin Weinstein-Raun @benwr

about 1 month ago

Elephant inside snake crossing

0

1

0

15

Benjamin Weinstein-Raun @benwr

about 1 month ago

@freed_dfilan Ah I see, yeah I misinterpreted your goal in pointing this out

1

0

16

Benjamin Weinstein-Raun @benwr

about 1 month ago

@freed_dfilan (also:
- potentially easier to prevent
- maybe costlier per instance of knowledge
- doesn't work for "unknown unknowns", i.e. knowing about something makes the existence of that knowledge and its object way more salient)

1

2

0

39

Benjamin Weinstein-Raun @benwr

about 1 month ago

@freed_dfilan Totally, though obviously if it's looking something up it's usually going to be way more legible from the outside, vs when it knows those things from training

1

2

0

63

Benjamin Weinstein-Raun @benwr

about 1 month ago

@RyanPGreenblatt Wait, really? I feel like your list of bullet points is ~ exhaustive re my sense of the question, and I'm curious what other bullet points would shift it to "yes"; my guess is "clearly being takeover" alone might be sufficiently evil-coded? But LLMs *really* love finishing tasks!

0

18
