Dylan Bowman @dylanbowmanSF - Twitter Profile

@sebkrier It fills the religion-shaped hole in the hearts of SF 24yos. It’s a totalizing objective that gives cosmic meaning to your b2b saas company. That’s why it exists.

Dylan Bowman

@dylanbowmanSF

about 21 hours ago

wow what an eloquent summary of our pre-deployment evals for gpt 5.6. written quite handsomely and wasianly as well

Apollo Research

@apolloaievals

about 21 hours ago

We evaluated GPT-5.6 before its release to assess risks around scheming and loss-of-control.

We find no evidence that GPT-5.6 poses substantially higher risk of catastrophic scheming than previous OpenAI models we've tested (5.5, 5.4, etc.).

However, we do find that GPT-5.6 shows a propensity for metagaming on some of our evals, corroborating OpenAI's own reporting in the model card where they find that GPT-5.6 verbalizes metagaming more than GPT-5.5.

Dylan Bowman

@dylanbowmanSF

about 22 hours ago

@RhysSullivan MCP is still great for integrations in chat and providing coding tools for remote servers. It just wasn’t the thing you could build an industry on

Dylan Bowman

@dylanbowmanSF

1 day ago

@eliebakouch @lilianweng Yeah this is the majority of the difference between the two studies

Dylan Bowman

@dylanbowmanSF

1 day ago

@forethought_org @Benthamsbulldog Moral realism will have a greater death toll than nazism or communism

dylanbowmanSF retweeted

Govind Pimpale

@GovindPimpale

1 day ago

For 70 years, nuclear peace has rested on the fact that it's too expensive to build the infrastructure to launch a first strike. In my new piece in @ai_frontiers_, I discuss how AI could change that. https://t.co/LJadARyemG

Dylan Bowman

@dylanbowmanSF

2 days ago

@lillian_ma_ @sooyoon_eth @METR_Evals @BethMayBarnes @arcprize @fchollet @mikeknoop Definitely a bot btw

Dylan Bowman

@dylanbowmanSF

2 days ago

It’s also quite difficult to generate data like this if the RLed model is the same intelligence level as the scientist. I’ve done this before but it took advantage of a favorable generator-discriminator gap and a ton of domain-specific engineering

Dylan Bowman

@dylanbowmanSF

2 days ago

This paper seems inconsequential for the frontier because it hinges on having a solver that is smarter than your RLed model (in this case, Qwen 397B verifying correctness of data for Qwen 4B)

Jason Weston

@jaseweston

2 days ago

Claim: Autoresearch that moves the frontier will be about better data: we call that *Autodata*.

🧵1/6 -- Paper is out! https://t.co/b8gOALndzy

Key idea: agentic data creation provides a way to *convert increased inference compute into higher quality model training*.

We show our method gives gains on computer science, legal and math problems over classical synthetic dataset creation methods.

We also show how to train (meta-optimize) such a data scientist agent, so that it can create even stronger data.

Overall, we believe this direction has the potential to change how we build AI data!

Dylan Bowman

@dylanbowmanSF

2 days ago

@tmkadamcz 1. If distilling is your primary strategy you’ll never be at the frontier of open source (since the lab you’re distilling from will have advanced by the time your model releases). 2. Distilling is nontrivial even with open weights.
