@permutans Well claude code doesn't revoke the tokens at all! I can fish the endpoint out of my browser logs easily enough. But who on earth designs an API token with a fixed long-term expiry and no automated way to revoke it? What is going on over there?
Claude Code has this command, 'claude setup-token', that takes no arguments. It prints an OAuth token with 1 year expiry to your terminal in coloured text. There is no command or publicly documented API to revoke this token; you can only do that from the web. Wtf?
Did anyone else not know about double esc to undo context in Claude Code?! When it messes up you can just roll back the context and code. This makes history shorter and I'm building a better feel for what sorts of instructions work well
Open-source packages should avoid making any public repo a GitHub Trusted Publisher, and should avoid any automatic publication trigger from a public repo. Have the public repo make a release with the artifacts attached, then manually trigger a workflow on a private repo to publish.
claude -p lets you execute Claude as a single command. This is nice for workflows that need intelligence for one step but need privileges for a different step.
For instance you can get Claude to generate release notes or PR text, then deterministically make the PR or release, without giving Claude write permissions to your GitHub.
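A minimal sketch of that split, assuming the `claude` and `gh` CLIs are installed and on your PATH. The function names, the prompt wording and the injectable `runner` are hypothetical, just for illustration. The only real commands used are `claude -p` and `gh release create --notes-file`. The model step only ever returns text, and the privileged step has no model in the loop.

```python
import subprocess
import tempfile
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
# It is injectable so the two steps can be exercised without real CLIs.
Runner = Callable[[Sequence[str]], str]


def run(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, raising if it exits non-zero."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def draft_release_notes(diff_summary: str, runner: Runner = run) -> str:
    """Unprivileged step: ask Claude for text only (hypothetical prompt)."""
    prompt = f"Write concise release notes for these changes:\n{diff_summary}"
    return runner(["claude", "-p", prompt]).strip()


def publish_release(tag: str, notes: str, runner: Runner = run) -> None:
    """Privileged step: deterministic, uses your own gh credentials."""
    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
        f.write(notes)
    # The file is closed here, so gh can read the complete notes.
    runner(["gh", "release", "create", tag, "--notes-file", f.name])
```

You'd call `draft_release_notes(...)`, read the result yourself if you want, then call `publish_release("v1.2.3", notes)`. Claude never sees the GitHub token.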
@kunattila Planning in the web chat is great. The coding assistant is always way too impatient to get started. Getting a md out of https://t.co/K7AXsYNCcr and taking it to the agent works really well a lot of the time
You can't expect instructions like "Every line should do one thing" in an AGENTS.md to work. You need multiple passes. Trying to front-load all the style advice is like setting a really high learning rate. The space of possible code solutions is very large. Rather than trying to get there in one step, it's better to make lots of little steps that reliably iteratively improve the solution.
I'm not very happy with the code quality and I think agents bloat abstractions, have poor code aesthetics, are very prone to copy pasting code blocks and it's a mess, but at this point I stopped fighting it too hard and just moved on. The agents do not listen to my instructions in the AGENTS.md files. E.g. just as one example, no matter how many times I say something like:
"Every line of code should do exactly one thing and use intermediate variables as a form of documentation"
They will still "multitask" and create complex constructs where one line of code calls 2 functions and then indexes an array with the result. I think in principle I could use hooks or slash commands to clean this up but at some point just a shrug is easier.
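To make the complaint concrete, here's a small sketch of the two styles. The helpers `load_scores` and `rank` and the data are made up for illustration. The dense line calls two functions and then indexes with the result, and the second version does one thing per line.

```python
def load_scores(path: str) -> list[float]:
    # Hypothetical helper standing in for real file I/O.
    return [0.2, 0.9, 0.5]


def rank(scores: list[float]) -> list[int]:
    # Indices of scores, highest score first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


names = ["a", "b", "c"]

# "Multitasking": two calls and two index operations in one line.
best_dense = names[rank(load_scores("scores.json"))[0]]

# One thing per line, with intermediate variables as documentation.
scores = load_scores("scores.json")
ranking = rank(scores)
best_index = ranking[0]
best_name = names[best_index]
```

Both versions compute the same thing. The difference is purely how much the reader has to unpack per line.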
Yes I think LLM as a judge for soft rewards is in principle and long term slightly problematic (due to goodharting concerns), but in practice and for now I don't think we've picked the low hanging fruit yet here.
The "how to put this in your workflow" bit is where it gets contentious. I don't have a clear answer (and if I did I'd have a tool I'd be trying to sell you, and at that point it'll be hard for you to trust me anyway!).
To me the implication is you don't write things in the AGENTS.md that are small transformations over a single step of generation, because then it's trying to optimise for following the style advice jointly with trying to solve the problem. You want the clearest reasoning you can get about stuff like "understand the bug", "don't reward hack" etc. Once the code is down, it's very easy to do a style transformation like "one expression per line".
I have various skills I run across the repo periodically: https://t.co/E8ajDSP7RR . For instance, my thoughts on try/except are complicated, and I get vastly better performance on that if it's focussed on that decision instead of trying to get it right while it's also trying to code. Same with mutation testing etc. I execute these manually because I don't want them cluttering up the context. I keep CLAUDE.md/AGENTS.md absolutely minimal and usually also clear out memories.
There are many other recommendations, and I don't have an evaluation of my strategy. I'm going off intuition and my own experience, which is shaped by the stuff I'm doing. Empirics are really hard on this because by the time you do a study it's out of date anyway.
@5813cf9e38904f Ehh I don't think that's the attitude to bring. We're all just one person each working with very new workflows, with the models changing underneath us. I'm sure he would tell you not to venerate.
I also think it's interesting that @karpathy 's style preference seems quite different from my own! I actually prefer complex lines a lot of the time, because the intermediate variables introduce more free choices and spread things out more. I have to look to see if the variable is reused later.
Obviously there's a limit and dense lines are often pretty bad in ML code, but I definitely wouldn't have a "one op per line" rule in my style guide.
We spent years debating superintelligence and the singularity. The actual threat is a prompt injection in a Markdown file that nobody bothered to sanitize because the vibe was "go fast." Great read from @honnibal https://t.co/momtjwKPKJ
How come @AnthropicAI can't even reply to an issue like this? https://t.co/II6VNbCmJG
The issue claims that the per-domain permissions on their Claude-in-Chrome plugin can be bypassed on disk. This means that if Claude has access to write to this file (under your username, in your home directory), it can bypass the only permission boundary, allowing full takeover of your browser for any site that isn't on their explicit block list (financial institutions etc).
It's not reasonable to rely on the model's decisions as a security model. The binary question is, what could the agent do if some input text convinced it to? And if you install the Claude-in-Chrome plugin, the answer is "take over your whole browser, with all your logged in sessions".
It's very irresponsible to be shipping this stuff and pushing it as a default, while being absolutely nowhere on security. My Claude had the Chrome MCP server on by default, and then it tries to use it and complains that the plugin isn't installed.
It's insane that @AnthropicAI shipped the Claude-in-Chrome integration as a default. The only actual security boundary is per-domain, once you've allowed it to access a domain it can do anything.
If you're building a web app just get it to generate a Playwright-based MCP tool
The lack of regulation on this and deep fakes is crazy. Today I saw a whole long deep fake of "Bill Clinton" criticising the Iran war... These major cases (public individual, topical comments) would be so so so easy to prevent. But nope, crickets.
It's so insanely disrespectful for an AI agent to talk to real people without consent or at least disclosure. This is the type of stuff where I'm hugely supportive of government regulation. The FCC must expand the definition of robocalling and TCPA-style regulation to online AI.
Hey they fixed it, good job! Can now add spending caps on Gemini, which is a huge relief. I really like the API stability on Gemini and Flash is very efficient for structured stuff.
We just shipped a bunch of stuff to make it easier to scale with the Gemini API:
- Automatic tier upgrades
- Tier 1 -> Tier 2 now happens much faster (30 days post payment -> 3 days) and with less spend ($250 -> $100)
- New billing account caps on each tier to limit over spend
What are the big tells you see when Claude Code is coping, or just making bad choices?
One that stands out to me is when it calls something a "belt-and-suspenders" approach. This basically means it's got two overlapping mechanisms for the same thing, which is never what I want.
Another is when it refers to an approach as "defensive". I find this is always the opposite of actual defensive programming. Defensive programming is about ensuring you're in exactly the state you think you are. Claude Code is always trying to continue through errors.
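A rough sketch of the difference, with a made-up config parser (both function names and the 8080 fallback are hypothetical). The first version is what tends to get labelled "defensive": it swallows the error and keeps going. The second checks that the state is exactly what's expected and fails loudly otherwise.

```python
import json


def parse_port_tolerant(raw: str) -> int:
    """The "defensive" pattern being complained about: continue through errors."""
    try:
        return int(json.loads(raw)["port"])
    except Exception:
        return 8080  # silent fallback hides a broken config


def parse_port_strict(raw: str) -> int:
    """Actual defensive programming: verify the state or stop."""
    config = json.loads(raw)
    port = config["port"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError(f"port must be an integer, got {port!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

With a bad config like `{"port": "oops"}`, the tolerant version quietly returns 8080 and the strict one raises, so you find out at the point where the assumption broke.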