Megan Kinniment @mkinniment - Twitter Profile

Pinned Tweet

about 1 year ago

Happy for this to be released! It’s the result of a lot of hard work from many of us at METR :) A big question is whether these results apply to ‘real’ tasks. Here’s some thoughts on that:

METR @METR_Evals

about 1 year ago

When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.

[Image: METR_Evals's tweet photo] https://t.co/KuZrClmjcc

Megan Kinniment

@MKinniment

12 days ago

@AskYatharth I did! And thanks :)

Megan Kinniment

@MKinniment

20 days ago

I worked on this. Can confirm, the models were often quite misleading! Some examples and thoughts:

METR @METR_Evals

20 days ago

Fact 3: When the agents were faced with hard tasks, they routinely violated constraints and acted deceptively. We’ve seen this pattern across our own coding and research evaluations, and developers reported they’ve also seen agents behave this way.

[Image: METR_Evals's tweet photo] https://t.co/IA69ZHeCDV

Megan Kinniment

@MKinniment

20 days ago

Also don't forget to read the risk report! :)

Megan Kinniment

@MKinniment

20 days ago

I'd like for us to be able to measure capabilities and this kind of sloppiness better. If you're interested in that, consider applying to METR!

Megan Kinniment

@MKinniment

about 2 months ago

You can just turn memory off

Megan Kinniment

@MKinniment

2 months ago

2023 was also the scariest time for me (so far). 2023 felt like we were flying blind. Then 2024-2025 we got better evals + trends, and we could finally see in front of us. Now I think capabilities are starting to outpace our sight again. I hope we don’t end up back in 2023!

roon

@tszzl

2 months ago

tbh I only feel more accelerationist as the capabilities ramp … the scaredest I was was in early 2023

Megan Kinniment

@MKinniment

2 months ago

The human brain has such a rough task, so much prediction that involves itself! Low dim representations of the self seem helpful. Maybe emotions might serve as one of them.

Megan Kinniment

@MKinniment

2 months ago

I wonder what would happen if we let the models apply steering vectors to themselves?

Anthropic

@AnthropicAI

2 months ago

For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment.

[Image: AnthropicAI's tweet photo] https://t.co/sKPiB6TrcY

Megan Kinniment

@MKinniment

2 months ago

In some ways, ‘self-applying steering vectors’ feels similar to how humans exercise control over their emotional state.

Megan Kinniment

@MKinniment

2 months ago

I think open sourcing the full set of human scores for the public set would help with the ‘ambiguous tasks’ worry I have. (Since then people could do things like IRT to check for weird looking tasks that might benefit from an update).

Megan Kinniment

@MKinniment

2 months ago

FWIW this seems reasonable to me, given that:
- the solutions aren’t luck based or ambiguous (e.g. 2 valid solutions but only one counts as correct)
- humans and models have access to the same information and affordances (in so far as that’s possible)

François Chollet

@fchollet

2 months ago

To be clear, all ARC-AGI-3 environments are feasible by humans with no prior ARC-AGI-3-specific training. Our bar for feasibility is the following...
Each environment was seen by 10 human testers. If 2 testers could independently clear it (successfully solving *all* levels in the environment), the environment was deemed feasible. Most environments were cleared by 5+ testers.
Who are these testers? We hired ~500 people to show up at our testing center, with no required qualifications and no ability-based screening, with a ~$115-140 incentive. About 25% were unemployed and another 20% were part-time workers (which is about what you'd expect in this setting).

Megan Kinniment

@MKinniment

2 months ago

(Though atm I have various worries about implementation e.g. ambiguous tasks, unfairness from overly loading on human prior knowledge of conventions in 2d grid based games).
