See you at Cursor Compile next Tuesday in SF!
I'll be talking with @levelsio about retro computing, building your ideas, lifting, the perfect steak, and more.
Recently met @srush_nlp and he started giving me an impromptu lecture on how targeted on-policy self-distillation works.
I asked him if I could record it on my iPhone.
The basic idea is this: if the model made a mistake at some point in the rollout (for example, calling a tool that doesn't exist), we want to discourage this specific error, but we don't want to just learn from the final reward, because it's a very noisy signal spread out over the whole trajectory.
So we have another model read this trajectory and figure out where the error was made. It simply inserts some hint tokens into the trajectory right above where the mistake was made.
Now with these injected hint tokens, have the model run a forward pass. You don't have to generate a new rollout - aka no new decode required.
The hint causes the model to assign lower probabilities to the error tokens. You then train the original model to match these new probabilities, teaching it to downweight that specific mistake.
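A toy sketch of that last step, just to make it concrete. Everything here is an assumption for illustration: the three-token vocabulary, the logit values, and the helper names are made up, and a real setup would do this over full sequences with an actual model. It only shows the shape of the idea: the hinted forward pass gives a fixed target distribution at the error position, and the un-hinted model is nudged toward it.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(student_logits, teacher_probs, lr):
    """One gradient-descent step on KL(teacher || student) w.r.t. the logits.

    For q = softmax(z), the gradient of KL(p || q) with respect to z is q - p.
    """
    student_probs = softmax(student_logits)
    return [z - lr * (q - p)
            for z, q, p in zip(student_logits, student_probs, teacher_probs)]

# Hypothetical vocabulary at the position where the mistake happened.
VOCAB = ["call_real_tool", "call_missing_tool", "ask_user"]

# Made-up logits from the same model at that position:
no_hint_logits = [1.0, 2.0, 0.0]     # original context: prefers the bad call
with_hint_logits = [2.5, -1.0, 0.0]  # same context plus injected hint tokens

# The hinted forward pass is the target; it is held fixed (no gradient).
teacher_probs = softmax(with_hint_logits)

# Train the un-hinted distribution to match it.
student_logits = list(no_hint_logits)
for _ in range(50):
    student_logits = distill_step(student_logits, teacher_probs, lr=0.5)

before = softmax(no_hint_logits)
after = softmax(student_logits)
for token, b, a in zip(VOCAB, before, after):
    print(f"{token:18s} before={b:.3f} after={a:.3f}")
```

Note there's no sampling anywhere in this: the target comes from one forward pass over the existing trajectory with the hint spliced in, which is the "no new decode" part.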
Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases.
@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place among the agents we tested, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).
Key results for Composer 2.5 in Cursor CLI:
➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82
➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code
➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)
➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens
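For anyone who wants to check the multiples, here is a quick sketch of the arithmetic using only the per-task costs and token prices quoted above. The dictionary keys and the `ratio` helper are just labels chosen for this example.

```python
# Per-task costs in USD, as quoted in the post above.
cost_per_task = {
    "Opus 4.7 (max), Claude Code": 4.10,
    "GPT-5.5 (xhigh), Codex": 4.82,
    "Composer 2.5 Fast": 0.44,
    "Composer 2.5 standard": 0.07,
}

def ratio(numerator, denominator):
    """How many times more expensive `numerator` is than `denominator`."""
    return cost_per_task[numerator] / cost_per_task[denominator]

rivals = ("Opus 4.7 (max), Claude Code", "GPT-5.5 (xhigh), Codex")
variants = ("Composer 2.5 Fast", "Composer 2.5 standard")

for rival in rivals:
    for variant in variants:
        print(f"{rival} vs {variant}: {ratio(rival, variant):.1f}x")

# Fast vs standard: per-task cost ratio and per-token price ratios.
fast_vs_standard = ratio("Composer 2.5 Fast", "Composer 2.5 standard")
token_price_ratio = (3.00 / 0.50, 15.00 / 2.50)  # (input, output)
print(f"Fast vs standard per task: {fast_vs_standard:.1f}x")
print(f"Fast vs standard per token (input, output): {token_price_ratio}")
```

The per-task ratio between Fast and standard comes out slightly above the 6x token-price ratio, which is consistent with the "~6x" figure.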
Model details:
➤ Base model: Continued training on @Kimi_Moonshot's open-weights Kimi K2.5, as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning
➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)
➤ Available exclusively in Cursor (both Cursor IDE and Cursor CLI); an externally accessible API is not available
Congratulations @cursor_ai and @mntruell on the impressive release!
Very impressed with Composer 2.5 after about a day of usage. I've almost moved over to it exclusively from GPT 5.5, even using it for planning now.
It's like Opus 4.7 on steroids, crazy fast. Fast models really get me into the flow of building, which is exhilarating.
Also, a sneak peek of a weekend mini-project I'm building: a racing game where you race your mouse cursor.
First poster for ‘EAST OF EDEN’, starring Florence Pugh.
The series follows the intertwined destinies of the Trask and Hamilton families in California's Salinas Valley.
Releasing this Fall on Netflix.
cursor sdk launched yesterday!
people are already putting cursor agents in places they already work: gmail, chrome, ci, terminal, docs, github issues
here are 11 projects built in the first day ↓
Great first week on Cursor CLI with @luis18
We shipped a lot
Rough edges sanded down
And here's a thread with a few new features we added
Lots to still do, feedback always welcome
Glass / Agents Window is the big story of Cursor 3, but there are lots of improvements we made to the standard IDE too; it's not going anywhere.
My favorite is using standard VS Code tabs for our chats, so you can easily pull them out side by side and use all the standard VS Code shortcuts to manage them. It makes managing parallel agents a lot easier.
The official trailer for Olivia Wilde’s ‘THE INVITE’ has been released.
Starring Seth Rogen, Olivia Wilde, Penélope Cruz and Edward Norton.
In theaters this July.