Every time DeepSWE comes up, agentic-coding users seem to ask the same thing:
Where is Composer 2.5?
DeepSWE has rows for GPT-5.5, Opus, Sonnet, Kimi, Gemini, DeepSeek, Qwen, and others. But Composer 2.5, one of the main models people actually use inside Cursor, has no official DeepSWE row yet.
So I tried a benchmark-linking estimate.
CursorBench 3.1 has Composer 2.5. DeepSWE does not. But both share several other model-effort configurations.
So the question becomes: if we use those shared rows as a bridge, where would Composer 2.5 roughly land on DeepSWE?
I recomputed DeepSWE Pass@1 from trial-level data, normalized model names and reasoning-effort labels, then matched overlapping model-effort pairs between DeepSWE and CursorBench 3.1.
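For anyone who wants to reproduce that step, here is a minimal sketch of the idea, not my exact pipeline. It assumes each trial record is a `(model, effort, task_id, passed)` tuple and that Pass@1 is the mean of per-task pass rates; the record layout and the helper names (`normalize`, `pass_at_1`, `match_pairs`) are hypothetical and made up for illustration.

```python
from collections import defaultdict

def _clean(label):
    # Lowercase and treat "_", "-", and repeated spaces as one separator.
    return "-".join(label.lower().replace("_", " ").replace("-", " ").split())

def normalize(model, effort):
    """Map spelling variants of a model-effort pair onto one key (assumed rule)."""
    return _clean(model), _clean(effort)

def pass_at_1(trials):
    """trials: iterable of (model, effort, task_id, passed).

    Pass@1 per configuration = mean over tasks of that task's pass rate,
    so tasks with more trials do not get extra weight.
    """
    per_task = defaultdict(list)
    for model, effort, task_id, passed in trials:
        key = (normalize(model, effort), task_id)
        per_task[key].append(1.0 if passed else 0.0)

    per_config = defaultdict(list)
    for (config, _task_id), outcomes in per_task.items():
        per_config[config].append(sum(outcomes) / len(outcomes))

    return {config: sum(rates) / len(rates) for config, rates in per_config.items()}

def match_pairs(deepswe, cursorbench):
    """Keep only configurations present in both score tables.

    Returns (config, cursorbench_score, deepswe_score) tuples, sorted by config.
    """
    shared = sorted(deepswe.keys() & cursorbench.keys())
    return [(config, cursorbench[config], deepswe[config]) for config in shared]
```

Anything that appears in only one table, Composer 2.5 included, drops out of `match_pairs`; those shared rows are the bridge, and Composer 2.5 is what gets predicted from them.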
Then I estimated Composer 2.5 using several linking checks: ordinary least squares regression, ridge-style regression, Theil-Sen robust regression, linear equating, equipercentile equating, nearest-neighbor imputation, bootstrap plus leave-one-out sensitivity, and a median-delta baseline.
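A rough sketch of four of those checks plus the leave-one-out pass, again as an illustration under assumptions rather than the exact code I ran. Here `x` holds CursorBench scores and `y` holds DeepSWE scores for the shared rows, every method returns a `(slope, intercept)` pair, and all function names are hypothetical. Ridge, equipercentile equating, nearest-neighbor imputation, and the bootstrap are left out to keep it short.

```python
from itertools import combinations
from statistics import mean, median, pstdev

def ols(x, y):
    """Ordinary least squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def theil_sen(x, y):
    """Robust line: median of pairwise slopes, then median residual offset."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    return slope, median(b - slope * a for a, b in zip(x, y))

def linear_equating(x, y):
    """Match the mean and standard deviation of the two score scales."""
    slope = pstdev(y) / pstdev(x)
    return slope, mean(y) - slope * mean(x)

def median_delta(x, y):
    """Baseline: shift by the median score difference, slope fixed at 1."""
    return 1.0, median(b - a for a, b in zip(x, y))

def predict(fit, x_new):
    slope, intercept = fit
    return slope * x_new + intercept

def leave_one_out(method, x, y, x_new):
    """Refit with each shared row dropped; return the (min, max) prediction."""
    preds = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        preds.append(predict(method(xs, ys), x_new))
    return min(preds), max(preds)
```

To use it, pass the matched score lists and Composer 2.5's CursorBench score as `x_new`, for example `predict(theil_sen(x, y), x_new)`. If the methods disagree a lot, or the leave-one-out range is wide, the estimate is resting on one or two bridge rows and should be read accordingly.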
This is not an official DeepSWE result. It is an estimate from overlapping model-effort pairs, meant to ask whether Composer's CursorBench performance could transfer to DeepSWE-style long-horizon software-engineering tasks.