MinMax3 just dropped!
SWE Bench Pro: 59.0%
Terminal Bench 2.1: 66.0%
SVG Bench: 63.7%
BrowseComp: 85.5%
GDPval Rubrics: 74.7%
MCP Atlas: 74.2%
OSWorld Verified: 70.0%
I am in disbelief that they’re open-sourcing a model that beats both Opus and GPT 5.5 on BrowseComp and SVG Bench, beats GPT 5.5 on SWE Bench Pro, KernelBench Hard, and BankerToolBench, and beats Opus on OSWorld Verified.
GPT-5.4 mini is available today in ChatGPT, Codex, and the API.
Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini.
https://t.co/DKh2cC5S3F
@alexwg @marshallbrain’s Manna feels less like sci-fi every year.
When task guidance and behavioral scoring move into headsets, management becomes software.
https://t.co/8JUGjPQdIw