Claude doesn't understand rg.
It used `rg -rn X ...` which replaces all of "X" with "n".
Then it tried to explain how it understands everything, while literally chasing ghosts.
It should probably go and learn from composer.
Upset about GitHub's recent enshitification?
Cursor has been acting up in diverse ways 3 days in a row (different each day), to the point of unusability...
I wrote a harness that introduced its own DSL with algorithmic agent definitions as first-class citizens.
Language is interpreted by an agent. Works but waiting 2 minutes for a few lines of code is a bit much.
Time to try out @resonatehqio (cc @AndaristRake, its time!)
Claude is disappointingly bad at GIT-FU.
Problem: Find the latest upstream rebase SHA for our runtime forks.
Results:
* ChatGPT comes up with the correct solution and fast. Proposes simple programs to do it.
* Claude doesn't get there at all.
Of course swe-bench was part of Claude's training. If that is not a significant indicator of how meaningless swe-bench is these days I don't know what is🤓
Schroedinger's Claude: Understanding everything, but also absolutely not, at the same time.
> "OK. Now I'm clear on the full chain. Let me trace the exact paths:"
2026 prediction:
Agentic coders will need to become experts in problem decomposition.
Those who try to solve everything in a single prompt will not thrive.
Hot take: `Extended thinking` pollutes the context window.
Example:
* Ask Claude Opus 4.5 to generate some specific code.
* It generates a `thinking` event with *all* the code.
* Then it generates an exact duplicate of that very same code in a separate `thought` event.
Yesterday, I asked @AndaristRake if he has a suggestion as to how to raise a TypeScript error if some function does not enumerate all relevant combinations of a union.
Today, I wake up to a PR for my code and a PR for the typescript compiler (https://t.co/n5DONxipE6) 😁👏
@XiangruTang Love to see it. Fault/code/feature localization is one of the big unsolved problems and its not getting enough attention. Also agreeing with @acrognali about the benchmark. But, are you going to share `Loc-Bench`? I did not see a link/reference in the paper.
📣 News: 3 new *big* `Code Agent` competitors appeared just this week, yet none of them can accurately tell me how my code executed 🤔
* Claude Code: https://t.co/dICOANAwey
* Copilot Agents: https://t.co/cl3hxVdifv
* Gemini Code Assist: https://t.co/MoLcMcCIsZ
Trying to learn about GPRO vs. critic model-based (e.g. PPO) policy optimization. I asked both 4o and DeepSeek (online version).
1. DeepSeek started out with more concise explanations.
2. But when asking for examples, DeepSeek chokes, while 4o provides examples no problem.
@kevinkern 1. Its pretty cool that Gemini already has planning built into their web offering. Thanks for sharing. Gpt and everyone else will probably/hopefully offer that as an option as well.
2. Note that 4o has tools and web access. o1 is supposed to also get that, soon™
Prediction: In 2025, coding agents and assistants like OpenHands (@allhands_ai), Devin (@cognition_labs), @cursor_ai (and many more) will start using traditional debuggers. Not sure how well that will go🤔