Asa Cooper Stickland

Verified account

@AsaCoopStick

Poking language models @ the UK AI Security Institute

London, England

Joined December 2017

994 Following

1.8K Followers

1.5K Posts

Pinned Tweet

Asa Cooper Stickland

7 months ago

NEW PAPER: Could an LLM agent subtly sabotage your code? We conducted a red-blue team game where the red team designed agents to sabotage, and the blue team designed monitors to catch the agent. Three surprising results ahead 🧵🛳️ https://t.co/f50xgMBPxq

AI Security Institute

@AISecurityInst

7 months ago

AI coding agents are increasingly writing production code - with tools, file access, and execution permissions. That power accelerates development, but also introduces new security risks if agents act against user intent 🧵 https://t.co/K8Qs871ipw

Asa Cooper Stickland

about 3 hours ago

.@soylent please come back to the UK we need you 🙏

about 3 hours ago

@AsaCoopStick @jjspicer May we never reach utopia

Asa Cooper Stickland

about 6 hours ago

If I ever give you a bad answer to a cyber-adjacent question that's because a dumber guy swaps in for those

Asa Cooper Stickland

2 days ago

@opheliamoding @funplings lol yeah I did a bit of googling which said as much and realised maybe my intuition was coming from the people I meet which will probably select from roughly the same demographics as vibecamp

Asa Cooper Stickland

2 days ago

@jjspicer I don't think EAs are imagining people still drink huel in the glorious utopia

Asa Cooper Stickland

2 days ago

@underthenettle Extremely good plan

Asa Cooper Stickland

3 days ago

GO TO STRAWBERRY HILL HOUSE https://t.co/L3ZoWZnJRn

Asa Cooper Stickland

2 days ago

So no lab has hired Raj Chetty yet? Are the recruiters all on holiday or what?

Asa Cooper Stickland

2 days ago

@politicalmath I don't get the reasoning here. Like it genuinely bounces off my brain, can you expand?

Asa Cooper Stickland

3 days ago

@draaglom I actually would have hit you up for a tour etc but was a family trip lol

Asa Cooper Stickland

3 days ago

@sienna_rothery I should make the strawberry hill house partiful...

Asa Cooper Stickland

3 days ago

@WillowChem Gothic revival architecture, slightly insane vibes, weird cool shit in every room...

Asa Cooper Stickland

7 days ago

Nice work on v hard problem

Mikhail Terekhov @MiTerekhov

7 days ago

Our Anthropic Fellows project is now public! The labs are planning to hand off AI safety research to AIs, but can we trust these AIs? We explore a way to control them for "fuzzy" tasks like writing research proposals. This is a whole new direction in diffuse AI control! https://t.co/0531egepar

Asa Cooper Stickland

8 days ago

@caryatis > i don’t have a speech describing myself ready to go

why not? surely you get similar-ish questions often enough?

Asa Cooper Stickland

9 days ago

@yong_zhengxin 👀

Asa Cooper Stickland

9 days ago

I feel somewhat worried about AI safety as a whole optimising for empirical work/solutions that work for current models. I don't really care about e.g. decision theory, but I think general macrostrategy/what do we do with aligned or "almost-aligned" AGI is v underinvested in

Asa Cooper Stickland

9 days ago

Hmm seems good https://t.co/utwTTQQOqg

Iliad @Iliad_research

9 days ago

We are Iliad, and we do research and fieldbuilding to develop AI safety as a theoretical and experimental science. Learn more about us below! 🧵

Asa Cooper Stickland

9 days ago

Back in the day I was long empirics, but I think we've managed to successfully scale empirical/near-term AI safety really well, and not so much future-facing AI safety. Another factor is the early advocates of theory were just crazy lesswrongers (much love to u guys) which made working with them less attractive to like "average cracked engineer/PhD student" and imo just meant they had less useful ideas. Feels like a new crop of more mainstream futurists has not emerged though

Asa Cooper Stickland

9 days ago

@yong_zhengxin (and I guess important to create maximally-subtly misaligned models to test out any e.g. scalable oversight or "automate ai safety research" schemes)

Asa Cooper Stickland

9 days ago

@yong_zhengxin Not really future-facing, but this is my favourite neglected non-future-facing area!
