Jacob Hilton @JacobHHilton - Twitter Profile

Jacob Hilton @JacobHHilton

about 5 hours ago

For more discussion, see our announcement post: https://t.co/syaoMJGS6p

0

75

Jacob Hilton @JacobHHilton

about 5 hours ago

ARC and @aicrowdHQ are launching a ≥$100k contest for white-box estimation algorithms: given the weights of an MLP, the goal is to estimate the expected output of the network on Gaussian inputs. (Thread)

https://t.co/9l5rK870JB

1

13

5

2K
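
The estimation task described in the contest announcement above can be illustrated with the naive approach of simply sampling inputs and averaging outputs. The sketch below is not ARC's method and is not taken from the contest materials. It assumes a bias-free MLP with ReLU activations between layers and standard normal inputs, none of which the tweet specifies, and the names `mlp_forward` and `monte_carlo_mean` are hypothetical.

```python
import random


def mlp_forward(weights, x):
    """Run a bias-free MLP: ReLU after every layer except the last (assumed architecture)."""
    h = x
    for i, W in enumerate(weights):
        # Each W is a list of rows; row j holds the input weights of output unit j.
        h = [sum(w * a for w, a in zip(row, h)) for row in W]
        if i < len(weights) - 1:
            h = [max(0.0, v) for v in h]
    return h


def monte_carlo_mean(weights, n_samples=10000, seed=0):
    """Estimate E[f(x)] for x ~ N(0, I) by averaging the outputs on sampled inputs."""
    rng = random.Random(seed)
    in_dim = len(weights[0][0])
    out_dim = len(weights[-1])
    total = [0.0] * out_dim
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
        y = mlp_forward(weights, x)
        total = [t + v for t, v in zip(total, y)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    # Toy network: one input, one ReLU hidden unit, one linear output.
    toy = [[[1.0]], [[1.0]]]
    print(monte_carlo_mean(toy, n_samples=20000))
```

For the toy network the exact answer is E[max(0, x)] = 1/sqrt(2π) ≈ 0.399, so the printed estimate should land close to that value. A white-box algorithm would instead work from the weights directly rather than from sampled forward passes.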

Jacob Hilton @JacobHHilton

about 5 hours ago

It's rare to find good, easily-measurable metrics for progress in alignment. But we are cautiously optimistic that top submissions will produce ideas that meaningfully advance our research.

1

0

83

JacobHHilton retweeted

METR @METR_Evals

16 days ago

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

https://t.co/sUpiHgCrTM

31

896

194

544

338K

JacobHHilton retweeted

Steven Adler @sjgadler

16 days ago

Some personal news: I've started a new AI safety standards org, and our first two standards are out today. We're called Guidelight, co-founded with fellow ex-OpenAI safety researcher, Page Hedley. (1/n)

https://t.co/evWWjkIkJT

24

530

62

163

58K

Jacob Hilton @JacobHHilton

27 days ago

@_Suresh2 The method works for estimating variance too! But yes, perhaps we should add a plot for that too.

0

32

Jacob Hilton @JacobHHilton

28 days ago

Can you estimate the average behavior of a neural network without running it? In ARC's latest paper, we address this question for wide randomly-initialized MLPs with Gaussian inputs. (Thread)

4

70

6

43

7K

Jacob Hilton @JacobHHilton

28 days ago

@ABAtanasov Thank you! We are working on low width too, but it's harder :)

0

2

0

124

Jacob Hilton @JacobHHilton

28 days ago

Congrats to first author Wilson Wu, as well as my other coauthors @vclecomte, Mike Winer, George Robinson and @paulfchristiano. Paper: https://t.co/qs6q3MWjPD Blog post: https://t.co/kowNfZbr4Y

0

14

1

5

1K

Jacob Hilton @JacobHHilton

28 days ago

Our current approach works at the start of training, but we have a lot more work to do to produce methods that work throughout training (even for small models like the AlgZoo models we shared a few months ago).

1

7

1

0

576

JacobHHilton retweeted

Dwarkesh Patel @dwarkesh_sp

4 months ago

Seems like a great opportunity for technical talent to come into government and help the USG make sound, technically informed decisions on AI

9

144

15

40

50K

Jacob Hilton @JacobHHilton

4 months ago

@bzogrammer @davidad Although note that we only ever unroll the RNN for 10 steps (and there is a fresh input at each step), so no fixed point is ever actually reached!

0

2

0

967

Jacob Hilton @JacobHHilton

4 months ago

A challenge to the mechanistic interpretability community: fully interpret our 432-parameter RNN. (Thread)

15

550

36

466

64K

Jacob Hilton @JacobHHilton

4 months ago

@mrsirrisrm Good work so far! This roughly matches our understanding of neurons 1, 2, 4, 6 and 7 as explained in the post.

1

0

76
