attention may be all you need to follow orders. but useful work often starts with the weird thing in your peripheral vision, where nobody told you to look. coding agents still feel stuck at "attention is all you have". solving this will unlock so much!
at first the fear was machines not obeying us. alignment papers everywhere. now the awkward part is noticing how perfect obedience makes them dumber. our prompts are the ceiling.
maybe the software engineer job is now less "cook the thing" and more "help build the thermometer, smoke alarm, and taste test so the agent knows when the thing is bad". once that feedback loop works, get out of the kitchen and let it cook.
software needed to be very configurable because bespoke was expensive.
@antirez's ds4 feels like the future. you look at the giant settings panel and wonder why sacrifice ux and perf if building the exact niche thing is much cheaper now?
consequence: discoverability gets harder!
all the "software engineers become taste/architecture/high-level design people" stuff feels like carving out a comfy, hopeful, human-only corner.
the job is nastier: keep asking what intellectual work got cheap enough to reliably delegate. whatever survives is your job now.
Codex: kept trying to design my in-app agent like it was a security incident waiting to happen. keep it boxed in, put the intelligence in code, etc.
Also Codex: loose in my repo in yolo mode, coding for hours and behaving like a responsible adult
@Steve_Yegge I suspect these orchestrators cannot be successfully turned into products. It will always be better to build your own, because they are a reflection of the way you think. You may even build a new one per project. You're reducing the gap between *your* mind and the computer.
I see a lot of complexity involved in making multiple agents work in parallel on the same repo (e.g., git worktrees).
An indie dev with a portfolio of projects/repos can easily skip that complexity, assigning one agent to work on each repo at a time. Parallelism without the pain.
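A minimal sketch of the one-agent-per-repo idea, under loud assumptions: `run_agent` is a hypothetical callable standing in for however you launch an agent, and tasks are plain `(repo, task)` pairs. Tasks for the same repo run one after another; different repos run in parallel, so no two agents ever touch the same working tree.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def group_by_repo(tasks: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build one ordered task queue per repo."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for repo, task in tasks:
        queues[repo].append(task)
    return dict(queues)


def run_portfolio(
    tasks: Iterable[Tuple[str, str]],
    run_agent: Callable[[str, str], str],  # hypothetical: (repo, task) -> summary
) -> Dict[str, List[str]]:
    """Run at most one agent per repo at a time; repos proceed in parallel."""
    queues = group_by_repo(tasks)

    def drain(repo: str) -> List[str]:
        # Sequential inside a repo, so there is nothing to merge or lock.
        return [run_agent(repo, task) for task in queues[repo]]

    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        futures = {repo: pool.submit(drain, repo) for repo in queues}
        return {repo: future.result() for repo, future in futures.items()}
```

No worktrees, no branch juggling: the repo boundary does the isolation for you.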
LLMs good at tool calling are so much more effective at practical long horizon tasks than stronger models that are better oracles but poor at tool calling.
This is like how sufficiently intelligent chads, when resourceful, absolutely mog IQ-maxing virgins that hate using tools.
An unexpected side effect of using coding agents: shipping so much, so fast, that you feel you're losing your grip on the product.
The rocket is changing so fast that very few parts of it feel like good old friends.
Recent acquaintances everywhere.
Of course, you might no longer need that feeling of familiarity if your silicon co-worker can provide definite answers to any product question by interacting with it directly, rather than relying on memory.
Wild.
@parkerconrad said that the LLMs' ability to read is much more significant than their ability to write. The more I use coding agents, the more accurate this proves to be.
ML nerds, what are you waiting for to make attention work outside of the context window?
Why do LLMs need a hint for remembering what they already know?
@karpathy This needs an AI gatekeeper, something that lets actionable and clear queries through, but redirects others to a clarification chat with AI first.
Otherwise, it's an instant denial of service attack on the CEO.
I made this: https://t.co/BkPbg2s3Ry
@GergelyOrosz This is probably a consequence of what @karpathy mentioned recently: code is, in a lot of cases, better than prose as prompt, because it has higher information density. This works outside migrations too!
https://t.co/JR9fzQM0tj
Continuing the journey of optimal LLM-assisted coding experience. In particular, I find that instead of narrowing in on a perfect one thing my usage is increasingly diversifying across a few workflows that I "stitch up" the pros/cons of:
Personally the bread & butter (~75%?) of my LLM assistance continues to be just (Cursor) tab complete. This is because I find that writing concrete chunks of code/comments myself and in the right part of the code is a high bandwidth way of communicating "task specification" to the LLM, i.e. it's primarily about task specification bits - it takes too many bits and too much latency to communicate what I want in text, and it's faster to just demonstrate it in the code and in the right place. Sometimes the tab complete model is annoying so I toggle it on/off a lot.
Next layer up is highlighting a concrete chunk of code and asking for some kind of a modification.
Next layer up is Claude Code / Codex / etc, running on the side of Cursor, which I go to for larger chunks of functionality that are also fairly easy to specify in a prompt. These are super helpful, but still mixed overall and slightly frustrating at times. I don't run in YOLO mode because they can go off-track and do dumb things you didn't want/need and I ESC fairly often. I also haven't learned to be productive using more than one instance in parallel - one already feels hard enough. I haven't figured out a good way to keep CLAUDE[.]md good or up to date. I often have to do a pass of "cleanups" for coding style, or matters of code taste. E.g. they are too defensive and often over-use try/catch statements, they often over-complicate abstractions, they overbloat code (e.g. nested if-then-else constructs when a list comprehension or a one-liner if-then-else would work), or they duplicate code chunks instead of creating a nice helper function, things like that... they basically don't have a sense of taste. They are indispensable in cases where I inch into a more vibe-coding territory where I'm less familiar (e.g. writing some rust recently, or sql commands, or anything else I've done less of before). I also tried CC to teach me things alongside the code it was writing but that didn't work at all - it really wants to just write code a lot more than it wants to explain anything along the way. I tried to get CC to do hyperparameter tuning, which was highly amusing. They are also super helpful in all kinds of lower-stakes one-off custom visualization or utilities or debugging code that I would never write otherwise because it would have taken way too long. E.g. CC can hammer out 1,000 lines of one-off extensive visualization/code just to identify a specific bug, which gets all deleted right after we find it.
It's the code post-scarcity era - you can just create and then delete thousands of lines of super custom, super ephemeral code now, it's ok, it's not this precious costly thing anymore.
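To make the "cleanup pass" above concrete, here is a made-up before/after in Python. Neither function comes from a real agent transcript; they are illustrative stand-ins for the over-defensive, over-nested style versus the list comprehension and one-liner if-then-else mentioned earlier.

```python
# "Before": the kind of output that tends to need a cleanup pass (illustrative).
def positive_squares_bloated(values):
    result = []
    try:
        for v in values:
            if v is not None:
                if v > 0:
                    result.append(v * v)
                else:
                    pass
            else:
                pass
    except Exception:
        # Blanket catch: hides real bugs (e.g. a string in the list) behind [].
        return []
    return result


# "After": same result on well-formed input, as one list comprehension.
def positive_squares(values):
    return [v * v for v in values if v is not None and v > 0]


# One-liner if-then-else instead of a four-line if/else block.
def parity(n):
    return "even" if n % 2 == 0 else "odd"
```

The two versions agree on well-formed input; they differ on bad input, where the short one raises instead of silently returning an empty list, which is usually what you want while debugging.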
Final layer of defense is GPT5 Pro, which I go to for the hardest things. E.g. it has happened to me a few times now that I / Cursor / CC are all stuck on a bug for 10 minutes, but when I copy paste the whole thing to 5 Pro, it goes off for 10 minutes but then actually finds a really subtle bug. It is very strong. It can dig up all kinds of esoteric docs and papers and such. I've also used it for other meatier tasks, e.g. suggestions on how to clean up abstractions (mixed results, sometimes good ideas but not all), or an entire literature review around how people do this or that and it comes back with good relevant resources / pointers.
Anyway, coding feels completely blown open with possibility across a number of "kinds" of coding and then a number of tools with their pros/cons. It's hard to avoid the feeling of anxiety around not being at the frontier of what is collectively possible, hence random Sunday shower of thoughts and a good amount of curiosity about what others are finding.