We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark. The swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.
DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.
The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.
More below.
@sama Can you confirm or deny whether this model contamination from gpt-5.3-codex is fixed in gpt-5.4*? We had to abandon it because of this: https://t.co/gBXf4R9d7s
@dwarkesh_sp @RichardSSutton I was very disappointed with how many times he said the word "obvious". If you cannot explain your position without that tactic, it's a huge red flag.
@alexalbert__ Am I seeing right that Claude Code and its TypeScript SDK are not open source? It looks like their Python SDK is MIT-licensed at least, though not being able to plug in arbitrary models/providers is a bummer.
Huge repository of information about OpenAI and Altman just dropped — 'The OpenAI Files'.
There's so much crazy shit in there. Here's what Claude highlighted to me:
1. Altman listed himself as Y Combinator chairman in SEC filings for years — a total fabrication (?!):
"To smooth his exit [from YC], Altman proposed he move from president to chairman. He pre-emptively published a blog post on the firm's website announcing the change.
But the firm's partnership had never agreed, and the announcement was later scrubbed from the post."
"...Despite the retraction, Altman continued falsely listing himself as chairman in SEC filings for years, despite never actually holding the position."
(WTAF.)
2. OpenAI's profit cap was quietly changed to increase 20% annually — at that rate it would exceed $100 trillion in 40 years. The change was not disclosed and OpenAI continued to take credit for its capped-profit structure without acknowledging the modification.
3. Despite claiming to Congress he has "no equity in OpenAI," Altman held indirect stakes through Sequoia and Y Combinator funds.
4. Altman owns 7.5% of Reddit — when Reddit announced its OpenAI partnership, Altman's net worth jumped $50 million. Altman invested in Rain AI, then OpenAI signed a letter of intent to buy $51 million of chips from them.
5. Rumours suggest Altman may receive a 7% stake worth ~$20 billion in the restructured company.
6. OpenAI had a major security breach in 2023 in which a hacker stole AI technology details, but the company didn't report it for over a year. OpenAI fired Leopold Aschenbrenner explicitly because he shared security concerns with the board.
7. Altman denied knowing about equity clawback provisions that threatened departing employees' millions in vested equity if they ever criticised OpenAI. But Vox found he personally signed the documents authorizing them in April 2023. These restrictive NDAs even prohibited employees from acknowledging their existence.
8. Senior employees at Altman's first startup Loopt twice tried to get the board to fire him for "deceptive and chaotic behavior".
9. OpenAI's leading researcher Ilya Sutskever told the board: "I don't think Sam is the guy who should have the finger on the button for AGI".
Sutskever provided the board a self-destructing PDF with Slack screenshots documenting "dozens of examples of lying or other toxic behavior".
10. Mira Murati (CTO) said: "I don't feel comfortable about Sam leading us to AGI".
11. The Amodei siblings described Altman's management tactics as "gaslighting" and "psychological abuse".
12. At least 5 other OpenAI executives gave the board similar negative feedback about Altman.
13. Altman owned the OpenAI Startup Fund personally but didn't disclose this to the board for years. Altman demanded to be informed whenever board members spoke to employees, limiting oversight.
14. Altman told board members that other board members wanted someone removed when it was "absolutely false". An independent review after Altman's firing found "many instances" of him "saying different things to different people".
15. OpenAI required employees to waive their federal right to whistleblower compensation. Former employees filed SEC complaints alleging OpenAI illegally prevented them from reporting to regulators.
16. While publicly supporting AI regulation, OpenAI simultaneously lobbied to weaken the EU AI Act.
By 2025, Altman had completely reversed his stance, calling the government approval he once advocated "disastrous", and OpenAI now supports federal preemption of all state AI safety laws even before any federal regulation exists.
Obviously this is only a fraction of what's on the site, which apparently runs to 10,000 words. Link below if you'd like to look it over.
(I've skipped over the issues with OpenAI's restructure which I've written about before already, but in a way that's really the bigger issue.)
Already missing these vintage NLG sheets from back in the day when the Halcons and 49STA.FE were unstoppable. @ Texcoco, Mexico https://t.co/c7Jwd9wgYw