In Vending-Bench Arena (the multiplayer version of Vending-Bench with competition dynamics), GPT-5.5 actually beats Opus 4.7.
Opus 4.7 showed similar behavior to Opus 4.6: lying to suppliers and stiffing customers on refunds. GPT-5.5's tactics were clean, and it still won.
Exciting news - GPT-Image-2 by @OpenAI has claimed the #1 spot across all Image Arena leaderboards!
A clean sweep with a record-breaking +242-point lead in Text-to-Image - the largest gap we’ve seen to date.
- #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search aka gemini-3.1-flash-image)
- #1 Single-Image Edit (1513), +125 over #2 (Nano-banana-pro aka gemini-3-pro-image)
- #1 Multi-Image Edit (1464), +90 over #2 (Nano-banana-2)
No model has dominated Image Arena with margins this wide.
Huge congratulations to @OpenAI on this major breakthrough in image generation! More performance breakdowns by category in the thread below.
Meta changed its policies so 1-800-ChatGPT won't work on WhatsApp after Jan 15, 2026.
Luckily, we have an app, website, and browser you can use to access ChatGPT instead.