James Oldfield @jamesaoldfield - Twitter Profile

Pinned Tweet

4 months ago

Excited to share that our work on Dynamic Safety Monitoring for Language Models is accepted at ICLR 2026!! Looking forward to chatting with people there :) Thanks a lot to @philiptorr @ioannispatras @Adel_Bibi @FazlBarez !!

James Oldfield @jamesaoldfield

9 months ago

How can we efficiently monitor LLMs for safety? Strong monitors waste compute on easy inputs, but lightweight probes risk missing harms ⚠️ 𝙏𝙧𝙪𝙣𝙘𝙖𝙩𝙚𝙙 𝙥𝙤𝙡𝙮𝙣𝙤𝙢𝙞𝙖𝙡 𝙘𝙡𝙖𝙨𝙨𝙞𝙛𝙞𝙚𝙧𝙨 (TPCs) address this by generalizing linear probes for dynamic monitoring! 💫

James Oldfield @jamesaoldfield

5 days ago

@Turn_Trout The OG strategy

jamesaoldfield retweeted

William MacAskill @willmacaskill

about 1 month ago

I had a very fun and wide-ranging conversation with @campbellclaret and @RoryStewartUK for The Rest is Politics: Leading, including around what European countries should do to avoid disempowerment in the face of AI progress. One of the most lively conversations I've had in many years! Link below.

jamesaoldfield retweeted

Shawn Im @shawnim00

about 2 months ago

I’ll be in Brazil for ICLR! 🇧🇷 I’ll be talking about how we can use theory to interpret models during the Thursday morning poster session and afternoon oral session! (Oral in 201 A/B, P4-#4006) Happy to talk about interp, theory, or other things! Send a DM!

jamesaoldfield retweeted

Ben Hayum @BenHayum

3 months ago

1/ AI agents are increasingly powerful. Security has not yet caught up. New from CNAS: our response to CAISI’s RFI on AI Agent Security, with @janet_e_egan and @CalebWithersDC. 🧵

James Oldfield @jamesaoldfield

3 months ago

@johnhewtt Indeed!! Thanks for the writeup :)

jamesaoldfield retweeted

Shawn Im @shawnim00

4 months ago

Excited to share our recent work selected as an ICLR Oral! We work towards answering how models learn to associate tokens and build semantic concepts. We find that early-stage features in attention-based models can be written as compositions of three basis features. https://t.co/Dr27yjuzf7

James Oldfield @jamesaoldfield

4 months ago

Paper: https://t.co/HJvyxlecRE

James Oldfield @jamesaoldfield

4 months ago

@crabshellman @philiptorr @ioannispatras @Adel_Bibi @FazlBarez Thank you Jiaxing!

jamesaoldfield retweeted

Fazl Barez @FazlBarez

5 months ago

2 papers accepted at #ICLR 2026! Congrats to @elkmf & @SimonSchrodi and @jamesaoldfield for the hard work!

jamesaoldfield retweeted

Arthur Conmy @ArthurConmy

5 months ago

Our new @GoogleDeepMind paper studies novel activation probe architectures for classifying real-world misuse risks. Our research has informed live deployments of probes in Gemini. 🧵 https://t.co/VekT2JKZYG

jamesaoldfield retweeted

Fazl Barez @FazlBarez

8 months ago

🚨New AI Safety Course @aims_oxford! I'm thrilled to launch a new course called AI Safety & Alignment (AISAA) on the foundations & frontier research of making advanced AI systems safe and aligned at @UniofOxford. What to expect 👇 https://t.co/r9YHS3XJhR

James Oldfield @jamesaoldfield

8 months ago

@yisongyue @Tsinghua_Uni @YueLabCaltech Amazing!! Well deserved @YueSong48287250 !!

jamesaoldfield retweeted

Tony Wu @TonyWu1105

8 months ago

🚨New paper! Excited to share my paper with @fazlbarez on Query Circuit Discovery! 𝙒𝙚 𝙪𝙣𝙘𝙤𝙫𝙚𝙧 𝙖 𝙨𝙥𝙖𝙧𝙨𝙚 𝙘𝙞𝙧𝙘𝙪𝙞𝙩 𝙞𝙣𝙨𝙞𝙙𝙚 𝙖𝙣 𝙇𝙇𝙈 𝙩𝙝𝙖𝙩 𝙖𝙘𝙩𝙪𝙖𝙡𝙡𝙮 𝙙𝙧𝙞𝙫𝙚𝙨 𝙞𝙩𝙨 𝙧𝙚𝙨𝙥𝙤𝙣𝙨𝙚 𝙩𝙤 𝙖 𝙪𝙨𝙚𝙧 𝙦𝙪𝙚𝙧𝙮. https://t.co/LPGTNqLU2S

James Oldfield @jamesaoldfield

9 months ago

@dmkrash Hi Dima!! Thanks a lot for sharing both of these--looks like great work. We'll take a detailed read and update the pre-print to discuss :)

James Oldfield @jamesaoldfield

9 months ago

How can we efficiently monitor LLMs for safety? Strong monitors waste compute on easy inputs, but lightweight probes risk missing harms ⚠️ 𝙏𝙧𝙪𝙣𝙘𝙖𝙩𝙚𝙙 𝙥𝙤𝙡𝙮𝙣𝙤𝙢𝙞𝙖𝙡 𝙘𝙡𝙖𝙨𝙨𝙞𝙛𝙞𝙚𝙧𝙨 (TPCs) address this by generalizing linear probes for dynamic monitoring! 💫

James Oldfield @jamesaoldfield

9 months ago

Please find many more results on 4 LLMs (across base models, instruction-tuned models, and reasoning models), and ablations in the paper! 📰 Project: https://t.co/urC7CPTZAO 💻 Code: https://t.co/1nBL6XXP3w 📄 Paper: https://t.co/HJvyxldF26

James Oldfield @jamesaoldfield

9 months ago

A big thank you to the fantastic coauthors! @philiptorr, @ioannispatras, @Adel_Bibi, @FazlBarez!
