Before the limited release of Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found that it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions. (1/14)
Our new @GoogleDeepMind paper studies novel activation probe architectures for classifying real-world misuse risks.
Our research has informed live deployments of probes in Gemini. 🧵
> introducing eval!
> eval has llm as a judge
> ok but how accurate is the judge
> introducing judge eval!
> judge eval has llm as a judge
…
With the Claude Code shutdown, I am proud that we build Codex in the open with our OSS repo, and we are 100% invested in supporting a flourishing ecosystem of agentic coding tools out there.
You can already build on top of https://t.co/hDjdmjH8pg directly, which includes ChatGPT login and the same usage you get in Codex. Reach out if you are a builder!
So let me get this straight:
-Microsoft upset people yesterday with the OAI for-profit conversion
-Amazon upset people yesterday with the layoffs
-Today, AWS and Azure are down
Am I reading too much into this?
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?
Our new paper shows that AI that models others' minds as Python code 💻 can quickly and accurately predict human behavior!
https://t.co/1t2fsW7jyL 🧵