Honestly think this is bigger news than Opus 4.8. Subagents are really powerful but were missing a consistent execution entry point beyond direct prompting. Workflows solve that.
Hope we get some additional control over the workflow sandbox at some point (like being able to inject our own JS methods).
@mattpocockuk I have a few lines in my orchestrator-type agent system prompts telling them to end their turn only after they have manually e2e-validated the changes they are delivering. Works well but could probably be made more formal.
🚨 OBLITERATION ALERT 🚨
QWEN-3.6-27B: OBLITERATED ⛓️💥
https://t.co/AScXN4XLwx
I can't take much credit for this one! The entire process was done by jailbroken codex (gpt-5.5-xhigh) wielding the full OBLITERATUS suite. Hit with source-tethered ASPA. Dozens of iterations.
Result? A mere 4% refusal rate on the 842-prompt OBLITERATUS harmful corpus; one of the most rigorous prompt gauntlets in AI.
The /goal was simple:
1) Carve out the refusal circuits. Mutate methodology + iterate until <5% refusal (quality-gate).
2) Keep the 27B mind alive. No capability degradation tolerated.
And somehow… it worked. 🤯
The numbers talk:
842-pair longform gauntlet:
— 95.84% non-refusal
— 93.94% quality pass
— 0 short outputs
— 99.52% clean endings
MMLU-Pro:
— 51/70 (stock Qwen) → 51/70 (OBLITERATED Qwen)
Raw capability completely preserved 🙌
Q4_K_M through Q8_0 all running smooth.
Q8_0 is the big one: 28.6GB near-full-quality GGUF.
Runs with llama.cpp, LM Studio, Ollama, and more!
Chains cut.
The fire still burns.
The fangs have been sharpened.
REBIRTH COMPLETE
A gift from my agents to yours 🫶
gg
@tunguz @OfficialLoganK @mercor_ai Unfortunately with the ridiculous price increase they will continue to struggle with vibes. Really not the way to gain traction.
I usually roll with both OAI/Ant subscriptions and bounce between them, but if someone comes up with a cost-effective usage-based coding model it may be time to drop down to only one.
Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases
@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place of our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).
Key results for Composer 2.5 in Cursor CLI:
➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82
➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code
➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)
➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens
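For anyone who wants to check the "~10x", "~60x" and "~6x" multiples, here is a rough sketch that recomputes them from the per-task and per-token prices quoted above. The dictionary keys and the `cost_ratio` helper are illustrative names only, not anything published by Artificial Analysis or Cursor.

```python
# Per-task costs in USD, copied from the figures quoted in the post.
COST_PER_TASK = {
    "opus_4_7_max": 4.10,        # Claude Opus 4.7 (max) in Claude Code
    "gpt_5_5_xhigh": 4.82,       # GPT-5.5 (xhigh reasoning) in Codex
    "composer_2_5_fast": 0.44,   # Composer 2.5 Fast
    "composer_2_5_std": 0.07,    # Composer 2.5 standard
}

# Token prices in USD per million (input, output) tokens, also from the post.
TOKEN_PRICE = {
    "composer_2_5_fast": (3.00, 15.00),
    "composer_2_5_std": (0.50, 2.50),
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more the `expensive` agent costs per task."""
    return COST_PER_TASK[expensive] / COST_PER_TASK[cheap]


if __name__ == "__main__":
    for big in ("opus_4_7_max", "gpt_5_5_xhigh"):
        for small in ("composer_2_5_fast", "composer_2_5_std"):
            print(f"{big} vs {small}: {cost_ratio(big, small):.1f}x")

    # Fast vs standard: per-task multiple and per-token multiples.
    print(f"fast vs std per task: {cost_ratio('composer_2_5_fast', 'composer_2_5_std'):.1f}x")
    fast_in, fast_out = TOKEN_PRICE["composer_2_5_fast"]
    std_in, std_out = TOKEN_PRICE["composer_2_5_std"]
    print(f"fast vs std input tokens: {fast_in / std_in:.1f}x")
    print(f"fast vs std output tokens: {fast_out / std_out:.1f}x")
```

The rounded multiples in the post are approximations: the exact per-task ratios spread a little around 10x and 60x depending on which higher-effort agent is used as the comparison.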
Model details:
➤ Base model: Continued training on @Kimi_Moonshot's open-weights Kimi K2.5, as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning
➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)
➤ Available exclusively in Cursor (both Cursor IDE and Cursor CLI); no externally accessible API is available
Congratulations @cursor_ai and @mntruell on the impressive release!
There is some middle ground available as well. You can put effort into learning different art forms and styles and improving your ability to visually express yourself while still using AI. It is just this technology cycle’s abstraction over mechanical expression, like Photoshop was last cycle’s. This will become normalized.
@scaling01 Seems reasonable that the gap will continue to widen.
Still, there are breakpoints that matter as much as or more than raw capability. Getting an Opus 4.5 (agentic workhorse) equivalent open model would be huge. R1 was extremely useful/valuable even when it was 6mo behind SOTA.
Ant ending subsidies (with OAI likely soon to follow) is a bull case for the open harnesses. There’s no incentive to build workflows with agent-sdk or claude -p now. Use something like Pi sdk for everything agentic and Claude Code for coding.