Agents are getting better at figuring things out for themselves.
The /loop skill uses the Cursor harness to encourage the agent to continue iterating until the task is complete.
Try "/loop until this PR merges"
Did you know Cursor can watch output from terminals and take action? It's very extensible.
I used it to make a /loop skill, which wakes the agent up on a schedule. Try "/loop until this PR merges" or "/loop 1h check #infra-logs for anything critical".
Should I do /goal next?
Here's a step by step:
1. Open the in-app browser to select, scribble, and describe changes to any component in design mode.
2. Type /multitask or queue changes to run them in parallel — great for touching several unrelated components at once.
3. Get your steps in, grab a coffee, text mom to just say hi.
Composer 2.5 is now available inside Grok Build.
Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.
composer 2.5 is, as of today, the best way to visually iterate on frontends.
i was frankly surprised by this, but the speed/quality/value ratio of the model and harness is insanely good. cursor is back!
Enterprise Security teams can be... risk averse.
Very excited for Auto-review to help unblock end users and give Security teams peace of mind as agents take on more ambitious tasks.
introducing thermos in cursor
a deep security/correctness audit and a harsh code quality audit, run in parallel on your branch, synthesized into one prioritized list
How are coding agents changing software engineering?
Yapped for 15 minutes about new Cursor data we published, including:
1. Why lines of code is an imperfect measure of AI progress
2. Balancing intelligence/cost/speed for models
3. Code reviews with "Mega PRs" (1000+ lines)
Today's Training Data episode takes us BTS on the infrastructure challenges of doing large RL runs at scale, featuring @ellev3n11 (Composer Lead at @cursor_ai) and @dzhulgakov (Co-Founder at @FireworksAI_HQ).
The Cursor team trained Composer 2 on Fireworks by starting with a strong base model (Kimi 2.5) and performing large-scale mid-training on code tokens and web data to learn common patterns and libraries, followed by a large-scale Reinforcement Learning run to learn how to navigate the Cursor harness, call tools, and write correct code.
Today's episode dives into the systems and infrastructure challenges of making that large RL run happen, and there were many (!!), from numerical mismatch to global distribution to synchronizing rollouts across asynchronous pipelines to keeping track of expert activation across runs and more.
Extremely nerdy in-the-weeds challenges that Federico and Dima were delighted to nerd out on together :)
Beyond RL infra, we also discussed Online vs Simulated rollouts, self-summarization for long-horizon agents, environment design ("the most powerful RL environment is the product itself"), and other technical nuggets.
PS: We filmed this episode before the SpaceX news, while the Cursor team was still compute-constrained. While Cursor now has *all* the flops, the takeaways and hurdles crossed ring true for any serious application-level company that is racing to post-train their own models.
I believe that more serious application companies will go the way of Cursor and post-train their own models.
00:00 Introduction
00:53 Why Cursor Trained Composer 2
04:55 Specialization vs Bitter Lesson
06:16 Composer 2 Training Recipe
16:32 Scaling RL Infrastructure Globally
23:32 Floating Point Drift
25:11 MoE Sensitivity Explained
26:25 Router Replay Fix
27:19 Real Time RL Loop
31:49 Long Horizon Agents
34:29 Why RL Everywhere
37:34 LLM as Judge Rewards
39:14 RL in Hard Domains
40:13 Build Your Own Environments
44:34 Closing Thoughts
Every enterprise I speak with is trying to go poteto-mode (they often mistakenly call it something else).
Lauren has a great playbook that every team will benefit from. It’s been very cool to see pstack gain traction internally at Cursor.
When we first launched the TypeScript SDK, the first question from a few customers was "when is the Python SDK going to be available?"
The answer was "in about 2 weeks."
Excited for these teams to build on top of the SDK!
With the Cursor SDK, you can build your own agents with Composer 2.5. It's now available in Python and TypeScript.
This long weekend, Composer usage is 90% off in the SDK. We're excited to see what you build!