a semi-nomadic software developer with an interest in ice hockey, craft beers, and tech in general. originally from pittsburgh. same handle on "all 3."
41% of all code shipped in 2025 was AI-generated or AI-assisted. The defect rate on that code is 1.7x higher than human-written code. And a randomized controlled trial found that experienced developers using AI tools were actually 19% slower than developers working without them.
Devs have always written slop. The entire software industry is built on infrastructure designed to catch slop before it ships. Code review, linting, type checking, CI/CD pipelines, staging environments. All of it assumes one thing: the person who wrote the code can walk you through what it does when the reviewer asks.
That assumption held for 50 years. It broke in about 18 months.
When 41% of your codebase was generated by a machine and approved by a human who skimmed it because the tests passed, the review process becomes theater. The reviewer is checking code neither of them wrote. The linter catches syntax, not intent. The tests verify behavior, not understanding.
The old slop had an owner. Someone could explain why temp_fix_v3_FINAL existed, what edge case it handled, and what would break if you removed it. The new slop has an approver. Different relationship entirely.
Arvid’s right that devs wrote bad code before AI. The part he’s missing: the entire quality infrastructure of software engineering was designed around a world where the author and the debugger were the same person. That world ended last year and nothing has replaced it yet.
🚨BREAKING: Alibaba tested AI coding agents on 100 real codebases, spanning 233 days each.
the agents failed spectacularly.
turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI collapses.
SWE-CI is the first benchmark that measures long-term code maintenance instead of one-shot bug fixes.
each task tracks 71 consecutive commits of real evolution.
75% of AI models break previously working code during maintenance.
only Claude Opus 4 stays above 50% zero-regression rate. every other model accumulates technical debt that compounds over iterations.
here's the brutal part:
- HumanEval and SWE-bench measure "does it work right now"
- SWE-CI measures "does it still work after 6 months of changes"
agents optimized for snapshot testing write brittle code that passes tests today but becomes unmaintainable tomorrow.
Alibaba built EvoScore to weight later iterations heavier than early ones. agents that sacrifice code quality for quick wins get punished when consequences compound.
the AI coding narrative just got more honest: most models can write code. almost none can maintain it.
@liz my MbP is an Intel '19, i've definitely noticed lots more lag in unlock and icon re-caching with macOS 26. but i do appreciate the extra level of greyscale i can have in the OS chrome. but i'll probably pull the trigger on a new one this year anyways.
Real nerds write _everything_ in markdown by default, regardless of whether it will be rendered, knowing that the target audience can be trusted to mentally render it while reading.
@SwarmApp also, not perf related, but i went to a sporting event recently and the “you’ve seen Team A x times and Team B y times” had the x/y values swapped.
@AndrewSolender 4 or 1 are my choices. if you're taking 5 with nobody in 4, you're a jerk blocking a seat, and if you're taking it with someone already there, most people lean away and give you space anyhow.