@sebkrier It fills the religion-shaped hole in the hearts of SF 24yos. It’s a totalizing objective that gives cosmic meaning to your b2b saas company. That’s why it exists.
We evaluated GPT-5.6 before its release to assess risks around scheming and loss-of-control.
We find no evidence that GPT-5.6 poses substantially higher risk of catastrophic scheming than previous OpenAI models we've tested (5.5, 5.4, etc.).
However, we do find that GPT-5.6 shows a propensity for metagaming on some of our evals, corroborating OpenAI's own reporting in the model card where they find that GPT-5.6 verbalizes metagaming more than GPT-5.5.
@RhysSullivan MCP is still great for integrations in chat and for providing coding tools for remote servers. It just wasn’t the thing you could build an industry on.
For 70 years, nuclear peace has rested on the fact that it's too expensive to build the infrastructure to launch a first strike. In my new piece in @ai_frontiers_, I discuss how AI could change that.
It’s also quite difficult to generate data like this if the RLed model is at the same intelligence level as the scientist. I’ve done this before, but it took advantage of a favorable generator-discriminator gap and a ton of domain-specific engineering.
This paper seems inconsequential for the frontier because it hinges on having a solver that is smarter than your RLed model (in this case, Qwen 397B verifying correctness of data for Qwen 4B).
Claim: Autoresearch that moves the frontier will be about better data: we call that *Autodata*.
🧵1/6 -- Paper is out! https://t.co/b8gOALndzy
Key idea: agentic data creation provides a way to *convert increased inference compute into higher quality model training*.
We show our method gives gains on computer science, legal and math problems over classical synthetic dataset creation methods.
We also show how to train (meta-optimize) such a data scientist agent, so that it can create even stronger data.
Overall, we believe this direction has the potential to change how we build AI data!
@tmkadamcz 1. If distilling is your primary strategy, you’ll never be at the frontier of open source (since the lab you’re distilling from will have advanced by the time your model releases).
2. Distilling is nontrivial even with open weights.