GPT-5.6 vs Mythos
Exactly what I said earlier this month: GPT-5.6 Sol beats the Mythos-class models a little less than half of the time (on currently available benchmarks).
OpenAI’s own rerun actually gave Mythos Preview a higher ExploitBench score than Anthropic’s old Preview chart, which is cool of OpenAI to show. Mythos Preview scored 74.2% vs Sol at 73.5%, but Sol got there with 120k output tokens compared to 335k for Mythos Preview.
ExploitBench -
Mythos Preview 74.2%
GPT-5.6 Sol 73.5%
Sol used 120k output tokens vs Mythos Preview at 335k
Terminal-Bench 2.1 -
GPT-5.6 Sol 91.0%
Mythos/Fable 5 88.0%
HealthBench Professional -
Mythos/Fable 5 66.0
GPT-5.6 Sol 60.5
CyberGym -
GPT-5.6 Sol 83.6%
Mythos Preview 83.1%
CyScenarioBench -
Mythos Preview 29.2%
GPT-5.6 Sol 28.0%
One thing to keep in mind is that Mythos Preview was the model Anthropic had back in February, while Mythos/Fable 5 is the stronger version they released publicly a few weeks ago. It might be a little confusing because the OpenAI ExploitBench comparison is against Mythos Preview, while some of the other public rows are against Mythos/Fable 5.
So yeah, this is exactly what I expected: GPT-5.6 Sol trading blows with Mythos-class models, winning Terminal-Bench and CyberGym, while Mythos/Fable 5 still leads HealthBench and Mythos Preview slightly leads ExploitBench and CyScenarioBench.
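If you want to sanity-check the "a little less than half" tally and the token gap yourself, here is a quick sketch in Python. The numbers are copied straight from the list above; the dictionary layout and the variable names are my own, and lumping Mythos Preview and Mythos/Fable 5 under one "Mythos" key is a simplification, not how the labs report it.

```python
# Scores copied from the list above; higher is better on every row.
# "Mythos" lumps Mythos Preview and Mythos/Fable 5 together (a simplification).
scores = {
    "ExploitBench": {"Mythos": 74.2, "Sol": 73.5},
    "Terminal-Bench 2.1": {"Mythos": 88.0, "Sol": 91.0},
    "HealthBench Professional": {"Mythos": 66.0, "Sol": 60.5},
    "CyberGym": {"Mythos": 83.1, "Sol": 83.6},
    "CyScenarioBench": {"Mythos": 29.2, "Sol": 28.0},
}

# Benchmarks where GPT-5.6 Sol has the higher score.
sol_wins = [name for name, s in scores.items() if s["Sol"] > s["Mythos"]]
win_rate = len(sol_wins) / len(scores)

# ExploitBench output tokens, in thousands: Mythos Preview vs Sol.
token_ratio = 335 / 120

print(sol_wins)               # ['Terminal-Bench 2.1', 'CyberGym']
print(win_rate)               # 0.4
print(round(token_ratio, 2))  # 2.79
```

So Sol takes 2 of the 5 rows, and on ExploitBench Mythos Preview spent roughly 2.8x the output tokens for a 0.7-point lead.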
I detailed which Mythos-class model wins or loses each benchmark in the graph below!
> be Gemini 3.5 Pro
> many are expressing NEGATIVE comments towards you
> although your benchmarks are ON THE LEVEL with GPT-5.5, Opus 4.8
> and you WILL have 2M context
> your frontend is INSANE
> you easily generate THOUSANDS of lines of high-quality code
> and are just a LITTLE lazy on very complex tasks
> why did you deserve all this hate?
> I don't know, honestly