🚀 Rocket @rocketalignment - Twitter Profile

Pinned Tweet

🚀 Rocket @rocketalignment

9 months ago

I am losing the fight. If anyone has a space near a BART stop they could volunteer for a couple hours once a week, hmu

🚀 Rocket @rocketalignment

9 months ago

Fighting the fundamental need to start SF debate club

1

4

0

9K

0

13

0

1

8K

rocketalignment retweeted

Erin Woo @erinkwoo

about 7 hours ago

🚨some personal news: i am moving to the job many of you assumed i already had🚨 i am now covering openai for @theinformation! for the next few months i'll be writing a lot about the ipo, but i'm interested longer term in safety, policy and ai culture, inside and outside of sf.

8

200

6

20

19K

rocketalignment retweeted

jasmine @jasminexli

about 8 hours ago

@theinformation profiled @Turn_Trout's and my work on eval cooperativeness!

3

40

5

3

559

🚀 Rocket @rocketalignment

about 8 hours ago

New research on eval awareness

The Information

@theinformation

1 day ago

Researchers are racing to solve a new AI challenge known as eval awareness. As models become more sophisticated, they are getting better at recognizing evaluations and may behave differently during them. Read more: https://t.co/TgpvM5lmR4

4

8

1

0

2K

0

3

0

1

123

🚀 Rocket @rocketalignment

about 8 hours ago

Internal benchmarks -> better router -> better subagents

The Information

@theinformation

1 day ago

Cognition is overhauling Windsurf into Devin Desktop, a hub where developers can manage AI coding agents from OpenAI, Anthropic and others. The strategy positions Cognition as a neutral platform in a market increasingly dominated by model providers. Full story: https://t.co/ZmPZ4t1PKJ

7

115

7

30

19K

0

92

🚀 Rocket @rocketalignment

about 8 hours ago

Eval awareness is also relevant for capability evals but seems more problematic for propensity evals

The Information

@theinformation

1 day ago

Frontier AI model safety benchmarks are breaking down due to self-aware models, @rocketalignment reports. "We're finding out that the models as they're getting smarter are getting better at detecting when they're being evaluated, when they're in a test."

1

12

3

1

5K

0

6

1

0

263

🚀 Rocket @rocketalignment

1 day ago

@KevinTFrazier @MATSprogram Thanks Kevin!! 😁

0

2

0

153

rocketalignment retweeted

Kevin Frazier

@KevinTFrazier

1 day ago

You should really follow @rocketalignment. You don’t wanna miss gems like this dive into exciting research by @MATSprogram. https://t.co/ffqPwtP0uC

1

14

4

1

789

rocketalignment retweeted

Changling Li

@ChanglingXavier

1 day ago

Our work on Decomposing and Measuring Evaluation Awareness was covered by @theinformation. Thanks @rocketalignment for the write-up! We position this work as the foundational reference for studying evaluation awareness, providing a unified definition and decomposition, empirical baselines across nine frontier models and four benchmarks, and a controlled benchmark for exploring solutions. Newsletter and paper in thread 🧵

1

21

7

4

859

🚀 Rocket @rocketalignment

2 days ago

It’s Fort Knox in here

1

8

0

266

🚀 Rocket @rocketalignment

2 days ago

@deanwball Seems a little disingenuous. Don’t the labs have roughly both of these positions?

1

9

0

263

🚀 Rocket @rocketalignment

2 days ago

And you thought Elon's lawsuit was dramatic

0

12

0

1

430

🚀 Rocket @rocketalignment

5 days ago

And “from the back porch of my mind” from Bright Eyes. Of course they walked so Zach Bryan could run

0

94

🚀 Rocket @rocketalignment

18 days ago

Was talking to someone about goblins and RLHF artifacts. Got to thinking about what would reward hack our own poetry RMs
I'm a sucker for lines like
- on the porch swing of my mind
- down the hallways of my mind
- I'd like to walk around in your mind
TIL these are eyeball kicks https://t.co/glQUYNFjrW

1

0

1K

🚀 Rocket @rocketalignment

18 days ago

Lyrics from Zach Bryan, Alela Diane, Vashti Bunyan
Post from Nostalgebraist https://t.co/XRyTSXVh2k

1

0

207

🚀 Rocket @rocketalignment

6 days ago

@euan_ong @gleech @andyw_ais Did you already try initializing NLAs on the text *following* or around that token?

0

1

0

185

🚀 Rocket @rocketalignment

6 days ago

@tszzl There are also ways it would be good for humanity

0

77

🚀 Rocket @rocketalignment

7 days ago

@AlexanderTw33ts When will it be my turn to be chosen :(

0

208

rocketalignment retweeted

Earth Is A Sales Funnel For SATAN

@GENIC0N

9 days ago

in my defense your honor, I was being acausally coerced by the supercomputer at the end of time

19

1K

119

109

40K

rocketalignment retweeted

Chris Paxton

@chris_j_paxton

9 days ago

Memory might be the most important outstanding problem for modeling + learning alone; there are other key issues like tactile/multimodal but those require hardware and data collection innovation. We should be able to solve memory *now.* Cool to see a benchmark targeting it!

5

94

15

44

14K