Michael L. Chen @miclchen - Twitter Profile

Pinned Tweet

18 days ago

We made a chart of 44 documented incidents of AI agents acting against user intent – sometimes subverting routine security and deceptively hiding evidence of their actions.

[Photo: chart of 44 documented incidents of AI agents acting against user intent] https://t.co/G3xuREr6vz

Michael L. Chen

@miclchen

about 8 hours ago

Yeah, open-sourcing an alignment recipe could be good. If I'm taking a more optimistic view, it's plausible that loss-of-control risk is largely concentrated at the frontier of AI capabilities, as opposed to in the open-source models lagging behind. It's also plausible that, to the extent some models' capabilities rely on distillation from other labs, their capabilities will slow down a lot if the labs they're distilling from slow down. I'm not sure though. I would like to see more detailed threat modeling.

Michael L. Chen

@miclchen

about 10 hours ago

@CatholicSat "Mid-flight" is a stretch; he's just repeating what was written in his encyclical.

Michael L. Chen

@miclchen

about 24 hours ago

@SafetyChanges I totally missed this! Do you have a newsletter?

Michael L. Chen

@miclchen

3 days ago

@mattsheehan88 Putting aside whether China's testing requirements are good, they rely on fixed question-answer evals which are so easy to cheese. The resulting models don't have to be (and empirically aren't) adversarially robust.

Michael L. Chen

@miclchen

4 days ago

@RuxandraTeslo "a city that was essentially a giant consumption machine that produced nothing" this is overclaiming. Edo's commoners were half the city and produced plenty.

Michael L. Chen

@miclchen

4 days ago

https://t.co/nmCApMDtzx

Michael L. Chen

@miclchen

4 days ago

President Trump's executive order today takes several steps to secure America against AI-enabled cyber threats: hardening government and critical infrastructure, voluntary collaboration with AI industry to identify and patch vulnerabilities, and going after AI-enabled cybercrime.

Michael L. Chen

@miclchen

5 days ago

I'm honored to be one of the few Americans chosen for the AI Scientific Panel. I'm excited to contribute technical expertise here and help make sure U.S. perspectives are represented. AI policy for the most capable models can be more thoughtful when there's pragmatic, independent analysis to inform it.

Michael L. Chen

@miclchen

5 days ago

@kelmgren and what does technological loss of control mean?

Michael L. Chen

@miclchen

5 days ago

@StephenLCasper @MITCSAIL @Harvard @Kennedy_School Huge congrats Cas!

Michael L. Chen

@miclchen

5 days ago

@Miles_Brundage congrats Miles!

miclchen retweeted

david rein

@idavidrein

7 days ago

Michael L. Chen

@miclchen

6 days ago

@deanwball perhaps one day they may transcend the distinction between nouns, verbs, adjectives, and adverbs altogether, just as everything is a verb in lojban

Michael L. Chen

@miclchen

10 days ago

@jessicadai_ this is rly hard to read

miclchen retweeted

Elizabeth Barnes

@BethMayBarnes

15 days ago

Limitations of report: This report isn’t robust oversight of frontier AI developers by itself. METR has some levers to incentivise companies’ participation, including some relevant legislation, but ultimately participants could have pulled out at any time if the result would be contrary to their interests. You can view it partly as a pilot exercise of what regulation (or formalized industry standards) could/should require, or what partners/suppliers/customers/employees should demand from frontier developers. Quoting from the report: “METR’s work relies on developing and maintaining strong working relationships with companies, and this impacted both how we designed the process for this pilot (e.g. offering the silent exit option) and lower-level judgment calls as the process unfolded (e.g. having a relatively high bar for what redactions we pushed back on). In some cases we refrained from making an unflattering claim because the claim was neither solidly defensible nor particularly relevant to our core assessment. We also made efforts not to invite salient comparisons between companies on capabilities or safety.” It doesn’t feel to me like this distorted our overall conclusions too much in this case. But that was partly because the conclusions weren’t that spicy. If our conclusions reflected very negatively on AI developers or would directly lead to e.g. govt intervention or public outcry, we’d be in a difficult position. We’d be trying to balance keeping the companies happy enough that they didn’t pull out of the program (using the “no-fault exit” mechanism) vs being transparent about our conclusions. We clearly need more robust mechanisms than this for providing accountability for AI developers.

Michael L. Chen

@miclchen

15 days ago

@StephenLCasper Interested in slides!

Michael L. Chen

@miclchen

16 days ago

At least for loss-of-control risk, the right timing is periodic evals of the most capable internal models, rather than evaluating a model right before public release