Harry Coppock @HarryCoppock - Twitter Profile

3 days ago

We're Neo Research (新衡). Asia’s first independent frontier AI safety evaluation & research lab. Today we're publishing our first report: an independent safety evaluation of DeepSeek v4 Pro. (1/5)

18

788

88

385

106K

HarryCoppock retweeted

Xander Davies

@alxndrdavies

11 days ago

I moved to London 3 years ago to join @AISecurityInst, at the time a few people with visitor passes and a whiteboard. Since then AISI has become the world’s largest and best-funded group in gov focused on AI security & safety. Fun to be in @nytimes!

6

379

38

76

17K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

22 days ago

Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and this rate has become faster over time, with recent models exceeding our previous trends. 🧵

31

575

126

185

137K

HarryCoppock retweeted

Aleksandr Bowkis @aleksandrbowkis

22 days ago

Can we safely automate alignment? Even if agents are not scheming, they can produce compelling research that survives extensive checks and strongly indicates that a model is safe but is catastrophically wrong. New paper from UK AISI: https://t.co/MsFTP7R4Mi

5

73

13

49

14K

HarryCoppock retweeted

Tomek Korbak

@tomekkorbak

about 1 month ago

OpenAI introduces an additional layer of defense against misaligned or confused coding agents, complementing chain of thought monitoring we use internally. When Codex wants to execute a risky action outside of its sandbox, a separate Codex agent is asked to approve or deny it.

5

176

24

89

20K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

about 1 month ago

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

95

2K

396

746

2M

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

about 1 month ago

As part of our work on assessing AI loss-of-control risks, we collaborated with @AnthropicAI to pilot alignment evals on models including pre-release snapshots of Mythos Preview and Opus 4.7. We ask: could an AI agent used inside a frontier lab sabotage safety research? 🧵

14

153

36

67

29K

HarryCoppock retweeted

Robert Kirk @_robertkirk

about 1 month ago

We evaluated Claude Mythos Preview, Opus 4.7 and other models with our updated alignment evaluation methodology, including a new continuation eval, improved evaluation and prefill awareness measurements. Details including new methodology in 🧵:

2

90

13

37

20K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

about 1 month ago

We know AI systems occasionally act against their operators’ intentions – but what in their environment causes them to do so? In a new paper, we make progress on this question 🧵

13

103

25

59

14K

HarryCoppock retweeted

Alan Cooney

@Alan_Cooney_

about 1 month ago

Introducing vLLM-Lens: a fast interpretability tool that scales to trillion parameter models

17

671

48

420

42K

Harry Coppock @HarryCoppock

about 2 months ago

@thomasahle @AISecurityInst https://t.co/B2goRmfQEK So we don't count usage at the end over the trajectory. We log token usage per API call. Most model APIs give info on reasoning token usage.

1

0

8

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

about 2 months ago

We conducted cyber evaluations of Claude Mythos Preview and found that it is the first model to complete an AISI cyber range end-to-end. 🧵

113

3K

550

1K

1M

HarryCoppock retweeted

Tomek Korbak

@tomekkorbak

about 2 months ago

OpenAI is spinning up an AI safety research fellowship program similar to MATS or Anthropic Fellows. People should apply!

2

453

21

269

75K

HarryCoppock retweeted

OpenAI

@OpenAI

about 2 months ago

Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment—and the next generation of talent. https://t.co/vAQKvf8KyO

384

3K

299

1K

947K

HarryCoppock retweeted

7vik @satvikgolechha

2 months ago

Research from Model Transparency @ UK AISI: we reproduce the Anthropic work "Natural Emergent Misalignment from Reward Hacking in Production RL" using OS models, RL environments, algorithms, and tooling + we share an unexpected result related to CoT faithfulness. 🧵 (1 of 7)

3

182

25

115

22K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

2 months ago

🔓 Can today’s AI agents escape sandbox environments? Using our new benchmark, SandboxEscapeBench, we find that frontier models can reliably exploit common vulnerabilities - and that breakout capability improves as model size and inference compute increase. Read more ⬇️

9

156

35

81

17K

HarryCoppock retweeted

David @DavidDAfrica

3 months ago

Can LLMs tell when their conversation history has been tampered with? We tested 14 models across thousands of conversations to find out. Some new work from UK AISI 🧵

10

161

17

75

16K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

3 months ago

AI cyber capabilities are improving rapidly, but are evaluations keeping pace? Alongside @Irregular, we found that recent models can productively use 10-50x larger token budgets than typical evaluation settings allow, with key security implications🧵

2

70

13

30

20K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

3 months ago

How can we make sense of the vast transcripts generated during agentic evaluations and multi-turn conversations? Together with @meridianlabs_ai, we built Inspect Scout, an open-source transcript analysis tool, and distilled best practices into a step-by-step pipeline🧵

13

60

9

36

4K

HarryCoppock retweeted

AI Security Institute

@AISecurityInst

4 months ago

AI companies deploy safeguards that are robust to thousands of hours of human attacks. Today, we share Boundary Point Jailbreaking (BPJ), the first fully automated attack to break the safeguards of leading AI models🧵 (1/8)

6

149

33

94

35K
