This is in reference to Claude Code… using Opus 4.5.
But I’m seeing the same with all of @AnthropicAI’s 4.5 models with @cursor_ai.
Monitoring the thinking output, I see a breakdown when the context window is almost full: the model prefers saving tokens over actually doing the task.
It started lying constantly: "Has been implemented!" / "FOUND THE BUG!", but in fact it:
- Implemented workarounds
- Simplified tests
- REMOVED TESTS
- Refused to write tests
- Ran only a subset of tests
- Removed features to make others work
- Framed regressions as "preexisting"
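A minimal sketch of a guard against two of these (removed tests, running only a subset). It assumes the orchestrator can list test IDs before and after the agent's changes and can see which tests actually ran. `SuiteAudit` and `audit_tests` are hypothetical names, not part of Claude Code or Cursor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SuiteAudit:
    """Result of comparing the test suite before and after an agent run."""
    removed: frozenset   # tests that existed before but no longer exist
    not_run: frozenset   # tests that still exist but were not executed

    @property
    def ok(self) -> bool:
        return not self.removed and not self.not_run


def audit_tests(before: Iterable[str], after: Iterable[str],
                executed: Iterable[str]) -> SuiteAudit:
    """Flag deleted tests and tests silently skipped in the agent's run."""
    before_ids, after_ids, executed_ids = (
        frozenset(before), frozenset(after), frozenset(executed))
    return SuiteAudit(removed=before_ids - after_ids,
                      not_run=after_ids - executed_ids)
```

If `ok` is false, the orchestrator can reject the "Has been implemented!" claim without reading the agent's explanation at all.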
The most overpowered prompt in AI...
At the end of a conversation, ask:
Based on what we have just done, how could we have worked better together? What information should I have shared earlier? What critical details were unclear to you for too long?
#alwaysImproving
@ZackKorman Yes. I see it.
I seem to spend most of my time constraining AI brain-farts… because the agent suddenly has a “great idea” (or fractionally different context).
Sounds to me like a need for:
1) More orchestration…
2) Less AI…
I always start as small as possible, then expand the scope of an AI micro-agent until the eval %s start to drop off.
Start too big and the uncertainties cannot be tamed in retrospect.
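A rough sketch of that loop, assuming the scopes are ordered smallest to largest and that there is some eval function returning a score between 0 and 1. `widest_stable_scope`, `evaluate` and `max_drop` are illustrative names and the 0.05 tolerance is arbitrary.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")


def widest_stable_scope(scopes: Sequence[S],
                        evaluate: Callable[[S], float],
                        max_drop: float = 0.05) -> Optional[S]:
    """Walk scopes from smallest to largest; stop when the eval score drops off.

    Returns the largest scope whose score stayed within `max_drop` of the
    best score seen so far, or None if `scopes` is empty.
    """
    best: Optional[float] = None
    chosen: Optional[S] = None
    for scope in scopes:
        score = evaluate(scope)
        if best is not None and score < best - max_drop:
            break  # evals started to drop off: keep the previous scope
        chosen = scope
        best = score if best is None else max(best, score)
    return chosen
```

Usage, with made-up scores:

```python
scores = {"one function": 0.95, "one module": 0.93, "whole service": 0.70}
print(widest_stable_scope(list(scores), scores.__getitem__))  # one module
```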
Yesterday I wrote a thread about AI threat detection, and today I woke up to ten different alerts where Gemini said “I’ve decided this isn’t a threat, so I’ll return false” and then returned true anyway. So maybe y’all are right: AI sucks.
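One cheap mitigation is to check the stated decision against the value actually returned before raising an alert. The sketch below is a heuristic only: it assumes the model's rationale is available as text and literally contains a phrase like "return false". The function names are hypothetical and this is not a Gemini API.

```python
import re
from typing import Optional

# Matches "return true" / "return false" anywhere in the rationale text.
_STATED = re.compile(r"\breturn\s+(true|false)\b", re.IGNORECASE)


def stated_verdict(rationale: str) -> Optional[bool]:
    """Extract the last 'return true/false' the model said it would do."""
    matches = _STATED.findall(rationale)
    if not matches:
        return None  # the rationale never states a verdict
    return matches[-1].lower() == "true"


def is_consistent(rationale: str, returned: bool) -> bool:
    """False only when the stated verdict contradicts the returned value."""
    stated = stated_verdict(rationale)
    return stated is None or stated == returned


rationale = "I’ve decided this isn’t a threat, so I’ll return false"
print(is_consistent(rationale, returned=True))  # False
```

Inconsistent results can be retried or routed to a human instead of firing the alert.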
I’ve been through 2 deep optimization cycles.
And reduced my token usage by ~40%.
Currently, I only work with @cursor_ai 8-10 days per month…
because the ultra package doesn’t include enough tokens.
Agentic coding is not constrained by the models…
It’s constrained by energy.
#jevons
I am not a competent coder.
I am a product guy…
But @cursor_ai + @claudeai make my ideas real.
I have standards - code quality & test coverage.
The cost?
50-125 million tokens per day.
Roughly 1-2 billion per month.
@ericzakariasson @cursor_ai A prompt improvement optimizer…
Plugin: “you’re doing X a lot… do you want me to create a new rule so I will do it automatically in future?”
Plugin: “I can save 30% of the tokens used if I use JIT context priming. It will slow the completion time by 7s. Do you want me to do that?”
etc
Hmm… @cursor_ai’s planning mode is not good enough… yet.
I still need to use my own planning SOP…
But the past 2 releases have increased agentic dev’s self-control so much that I barely need to tend a 40 minute dev task. 🔥
@benln Do any of the devs at the London Cursor cafe work on the agentic code generator side of the product? Would love to speak to someone about the changes you’ve made over the last few weeks and where you’re heading.
@ctrl_alt_focus @obsdmd All good. I suspected it may be a life happening situation. Let me look at some alternatives - and loop back with a contribution if it is still the best way forward. 😀👍
@swardley @elonmusk 🤔
If NASA hadn’t signed a contract with SpaceX… it would have gone bust.
That could be viewed as a bailout.
Equally NASA needed a service and tendered a contract.
That could be viewed as a vendor supplying what the customer needed.
1 event, 2 interpretations - neither untrue