@agstrait - Twitter Profile

3 days ago

@S_OhEigeartaigh Congrats Seán!!!

0

1

0

103

agstrait retweeted

22 days ago

Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and this rate has become faster over time, with recent models exceeding our previous trends. 🧵

AISecurityInst's tweet photo: https://t.co/iudBoXys1e

31

575

126

185

137K

agstrait retweeted

about 1 month ago

We know AI systems occasionally act against their operators’ intentions – but what in their environment causes them to do so? In a new paper, we make progress on this question 🧵

AISecurityInst's tweet photo: https://t.co/s6S5l2SxFd

13

103

25

58

14K

agstrait retweeted

Nate

@NateBurnikell

about 1 month ago

We (@AISecurityInst) tested GPT-5.5 for its cyber capabilities and safeguards. It's the strongest performing model we've tested on our narrow cyber tasks and solved one of our cyber ranges in 1/10 attempts. We found a universal jailbreak with 6 hours of expert red teaming.

NateBurnikell's tweet photo: https://t.co/xXt67MBTbb

17

372

55

140

51K

agstrait retweeted

Jared Moore @jaredlcm

3 months ago

Disturbing anecdotal reports of "AI psychosis" and negative psychological effects have been emerging in the news. But what actually happens during these lengthy delusional "spirals"? In our preprint, we analyze chat logs from 19 users who experienced severe psychological harm🧵👇

24

393

81

335

53K

agstrait retweeted

Cas (Stephen Casper)

@StephenLCasper

6 months ago

Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI? This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.

StephenLCasper's tweet photo: https://t.co/HXRAMEiKEL

1

39

11

18

8K

6 months ago

🤝 You'll work with 2 other researchers and in collaboration with other gov departments. The first project is to create a problem book of methods to reduce these risks (building on https://t.co/rLBqhCZSZy). Crucially, you are not expected to view sensitive material directly.

0

25

summerfieldlab @summerfieldlab.bsky.social @summerfieldlab

6 months ago

🛠️ This is a technical role for an applied ML or security engineer. The work we anticipate could include building scalable ways to detect malicious LoRAs, exploring data filtering and other methods for reducing malicious fine-tuning, and other technical methods.

1

0

28

agstrait retweeted

11 months ago

In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.

summerfieldlab's tweet photo: https://t.co/n7W8qyY27n

4

84

25

32

17K

agstrait retweeted

Saffron Huang

@saffronhuang

12 months ago

Newest @reboot_hq 🎙️post: @jessicadai_ and I discuss forecasting, and how people present unhelpful narratives about the future (mostly by picking on AI 2027, sorry guys). Why we should view the future as constructed, not predicted.

saffronhuang's tweet photo: https://t.co/kEvF0IbFgf

4

56

11

24

4K

agstrait retweeted

Josh Wolfe

@wolfejosh

12 months ago

Apple just GaryMarcus'd LLM reasoning ability

217

5K

561

5K

3M

agstrait retweeted

about 1 year ago

Advanced AI systems require complex evaluations to measure abilities, but conventional analysis techniques often fall short. Introducing HiBayES: a flexible, robust statistical modelling framework that accounts for the nuances & hierarchical structure of advanced evaluations.

AISecurityInst's tweet photo: https://t.co/DO27LNwn1c

2

52

11

25

7K

agstrait retweeted

Sayash Kapoor @sayashk

about 1 year ago

How will AI impact the economy? Can we defend against misuse? What policies would mitigate the risks of AI? Thrilled to share that @random_walker and I are writing another book to tackle these questions! Today, we release a paper laying out our argument: AI as Normal Technology.

sayashk's tweet photo: https://t.co/a2dxNVu0J0

12

282

69

183

58K

Billy Perrigo @billyperrigo

about 1 year ago

I too find this really weird, mainly in that it shows the frontier of AI research is at risk of moving further away from producing useful, safe, reliable products. These seem like features, not bugs.

about 1 year ago

nice analogy from @jackclarkSF newsletter this week

10

257

38

57

28K

0

2

0

1

258

agstrait retweeted

about 1 year ago

We've funded 20 new research projects to enhance AI security in critical infrastructure ⚡ Our Systemic AI Safety Grants Programme, announced at the Seoul AI Summit, has awarded up to £200,000 seed grants to projects tackling AI risks 🧵👇

1

58

10

16

10K

about 1 year ago

A great thread re: problematic extrapolations on claims about AI being superhuman at tasks.
1. Coding =/= all computer-related tasks, let alone all tasks
2. Generating code to complete a task =/= the most efficient, secure way to complete a task.

Natália 🔍

@natalia__coelho

about 1 year ago

This tweet is misleading. State-of-the-art AI models struggle at some tasks that take humans <10 minutes, while *simultaneously* excelling at some tasks that would take humans several hours or days to solve.

6

280

27

76

39K

0

1

0

299

about 1 year ago

@peterwildeford Presumably they'll be similar kinds of answers as the last industrial revolutions, i.e. social innovations like labour protections, the 5 day work week, etc. that balanced societal interests with the interests of employers?

1

2

0

57

about 1 year ago

These include undesirable automation, over-reliance on AI systems, mental health impacts, mass generation of unreliable content, power concentration, and social destabilisation...and so much more.

0

41