@CryptoCurrentYT your latest, unusually long, almost hour-long video was good! Way more emotional and hilarious than most of your videos that I've seen. I want to say a huge THANK YOU for what you share week after week in your videos!
@odysseus0z Great insight!
Two more metrics worth checking:
- Files with the most churn and the number of their dependencies. Reveals the real pain points.
- The structure of module dependencies: are they connected like a tree, or more like a dense net with almost all-to-all dependencies?
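A rough sketch of how both metrics could be computed, assuming you already have per-file commit counts and a module dependency map. The function names, the scoring formula (churn times one plus the number of outgoing dependencies), and the density measure are my own assumptions for illustration, not an established standard:

```python
from typing import Dict, List, Set, Tuple


def rank_hotspots(
    churn: Dict[str, int], deps: Dict[str, Set[str]]
) -> List[Tuple[str, int]]:
    """Rank files by churn * (1 + number of files they depend on).

    Hypothetical scoring: a file that changes often AND depends on
    many other files is assumed to be a likely pain point.
    """
    scores = {f: n * (1 + len(deps.get(f, set()))) for f, n in churn.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def dependency_density(deps: Dict[str, Set[str]]) -> float:
    """Share of possible directed edges that actually exist.

    With n modules, a tree has n - 1 edges, so its density shrinks
    as n grows; an all-to-all net has a density close to 1.0.
    """
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(targets) for targets in deps.values())
    return edges / (n * (n - 1))
```

The churn counts could come from your version control history and the dependency map from an import scanner; both inputs are left to the caller here.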
@Steve_Yegge to your main point, an alternative can be:
- allow ppl to propose startup ideas within the parent company
- parent company invests as seed/angel in the best ideas, and the link between the parent and the startup ends there
- the rest is a more or less known blueprint for startups
@Steve_Yegge > as AI rewrites how you build software, the org [of teams] has to shift to match
💯 you can't just build software in teams the way we did for the last decade; we can't structure software the way we did, either; relying on a library ecosystem vs. writing your own, and so on...
@steipete @luongnv89 Do we still need GitHub issues and pull requests in 2026?
- Code review can run locally or in agent/cloud env.
- Issue tracking can live in ISSUES.md or Linear.
- CI/CD is already moving to tools like Blacksmith.
Feels like we're keeping GitHub workflows mostly out of habit.
Code complexity is still a problem — even with super capable AIs.
The winning strategy is not “use better agents.”
It’s:
Design systems where agents are forced to produce simple code.
If you don’t enforce simplicity, you won’t get it — no matter how smart the agent is.
If you use AI for coding, enforce this or regret it:
strict module boundaries
complexity budgets
“remove before add” rule
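One way a complexity budget could be enforced, as a minimal sketch: count branching constructs per function with Python's `ast` module and flag anything over a limit. The `over_budget` name, the list of counted node types, and the default budget of 5 are assumptions for illustration; this is a crude approximation, not a full cyclomatic complexity implementation:

```python
import ast
from typing import Dict

# Node types treated as "branches" in this rough approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)


def over_budget(source: str, budget: int = 5) -> Dict[str, int]:
    """Return {function_name: score} for functions whose score exceeds budget.

    Score = 1 + number of branch nodes found inside the function.
    Branches in nested functions also count toward the outer function.
    """
    tree = ast.parse(source)
    offenders: Dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1 + sum(
                isinstance(child, BRANCH_NODES) for child in ast.walk(node)
            )
            if score > budget:
                offenders[node.name] = score
    return offenders
```

A check like this could run in CI or as a pre-commit step, so an agent's change is rejected when it pushes a function past the budget.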
Coding agents don’t write simple code (yet). They follow incentives.
Set the wrong ones → you scale chaos.
AI-generated code (and human-written code as well) tends to increase complexity.
Because anyone who doesn't have a real “mental model” of your system will:
- patch locally
- ignore boundaries
- stack abstractions
No architecture → entropy wins.
Good documentation has value in all sorts of projects — more in some, less in others.
It is a shame we hide "skills" and "rules" under hidden `.agents/` and `.claude/` directories. Humans will still benefit from this documentation. For a while at least.
@nielstron @tibglo curious if you’re planning another iteration of the “Evaluating AGENTS.md” paper with a setup closer to real-world conditions.
Great experiment, but feels a bit too filtered to generalize beyond the limited usefulness of autogenerated context files.
https://t.co/txeYnUgEHn
I wouldn't delete CLAUDE.md/AGENTS.md.
The paper “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” evaluates agents in a highly controlled, “lab-like” setup: tasks are clarified, missing tests are generated, and environments are normalized before benchmarking.
But the title asks a much broader question. The experiments are limited to a narrow setting: two benchmarks, mostly Python repos, single-sample runs, PR filtering, issue rewriting, and LLM-generated tests where none exist.
The paper says many repositories lacked suitable tests, so it used an LLM to generate them, then manually improved overspecified tests. That makes the benchmark useful, but it also creates room for auxiliary assumptions.
This is not the environment real engineering teams operate in.
The takeaway from the paper should not be to stop using autogenerated agent context files like CLAUDE.md/AGENTS.md.
Instead, I will put effort into making CLAUDE.md/AGENTS.md files useful for the projects I'm working on.
@nielstron @tibglo My bad, it was ambiguous.
Here's the difference:
- Do not create tests and do not modify repositories to help agents
- Do not use benchmarks
+ Humans, e.g. repository owners, evaluate the performance of agents (2 PRs with and without context files per issue, per agent+LLM)
@nielstron @tibglo Brainstorm: Take a set of open-source GitHub repos with context files and 3+ issues each. Use Codex/Claude agents (different models) to resolve the issues, with and without context files. Compare results: fix success rate, steps taken, and cost.
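The comparison could be tallied roughly like this. It is only a sketch: the `Run` record, its field names, and the `summarize` helper are hypothetical, and how "fixed" gets decided (by repository owners, for example) is left outside the code:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable


@dataclass
class Run:
    """One agent attempt at one issue (all fields are illustrative)."""
    repo: str
    issue: str
    agent: str
    with_context: bool
    fixed: bool
    steps: int
    cost_usd: float


def summarize(runs: Iterable[Run]) -> Dict[bool, Dict[str, float]]:
    """Group runs by with_context and report the three compared metrics."""
    all_runs = list(runs)
    summary: Dict[bool, Dict[str, float]] = {}
    for flag in (True, False):
        group = [r for r in all_runs if r.with_context == flag]
        if not group:
            continue
        summary[flag] = {
            "success_rate": sum(r.fixed for r in group) / len(group),
            "mean_steps": mean(r.steps for r in group),
            "mean_cost_usd": mean(r.cost_usd for r in group),
        }
    return summary
```

Keying the summary on `with_context` keeps the two conditions side by side; splitting further by agent or model would be a small extension.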
I made a typo. I should have written:
"The takeaway from the paper: stop using autogenerated agent context files like CLAUDE.md/AGENTS.md."
But on second thought:
The takeaway: autogenerated agent context files like CLAUDE.md/AGENTS.md are worse than ones created manually.