@MatthewBerman There's something to be said about quality over volume. The gap may close over time, but if there's one thing I'll grant incumbent engineers, it's that elegant software architecture is not going to be a strong suit of agents limited to holding ~50KB at a time.
@MatthewBerman @ThePrimeagen @ThePrimeagen is one of those guys I occasionally check in with for an idea of what a non-trivial percentage of engineers are probably thinking.
He's not who I go to for a vision of what tomorrow is about to look like.
@rezoundous For large-scale workloads that would otherwise span weeks when you need it in days, or months when you need it in weeks, yes. There are some rare cases when durability may be the requirement.
Stakeholder: "Why is it so complicated?"
*20% error rate is manageable if you have validation layers, review workflows and controls between AI output and your books. It’s not manageable without them. But the answer here isn’t to avoid AI in accounting. It’s to be deliberate about where the model sits in the workflow. There’s a meaningful difference between AI that drafts and surfaces, sitting inside a system with deterministic validation, audit trails and exception handling built in as core features, versus AI bolted onto a legacy system or accessed raw through an API with none of that infrastructure around it.*
https://t.co/vSAHg51yEy
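To make the "deterministic validation, audit trails and exception handling" idea a bit more concrete, here's a minimal sketch of a gate that sits between an AI-drafted journal entry and the books. Everything in it (`Line`, `Ledger`, `validate`, `submit`, the account names) is hypothetical and made up for illustration, not taken from any real accounting product:

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Line:
    """One line of a drafted journal entry (hypothetical shape)."""
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


@dataclass
class Ledger:
    posted: list = field(default_factory=list)      # entries that passed validation
    exceptions: list = field(default_factory=list)  # entries held for human review
    audit_log: list = field(default_factory=list)   # every decision, accepted or not


def validate(entry, known_accounts):
    """Run deterministic checks; return a list of problems (empty means OK)."""
    problems = []
    if not entry:
        problems.append("empty entry")
    for line in entry:
        if line.account not in known_accounts:
            problems.append(f"unknown account: {line.account}")
        if line.debit < 0 or line.credit < 0:
            problems.append(f"negative amount on: {line.account}")
    total_debit = sum((line.debit for line in entry), Decimal("0"))
    total_credit = sum((line.credit for line in entry), Decimal("0"))
    if total_debit != total_credit:
        problems.append("debits do not equal credits")
    return problems


def submit(ledger, entry, known_accounts):
    """Post the AI draft only if it validates; otherwise route it to review."""
    problems = validate(entry, known_accounts)
    if problems:
        ledger.exceptions.append((entry, problems))
        ledger.audit_log.append(("rejected", problems))
        return False
    ledger.posted.append(entry)
    ledger.audit_log.append(("posted", []))
    return True
```

The point of the sketch is only that the model's output never reaches `posted` directly: a failed check lands in `exceptions` for a person to look at, and both outcomes leave a trace in `audit_log`.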
🤯BREAKING: Alibaba just proved that AI Coding isn't taking your job, it's just writing the legacy code that will keep you employed fixing it for the next decade. 🤣
Passing a coding test once is easy. Maintaining that code for 8 months without it exploding? Apparently, it’s nearly impossible for AI.
Alibaba tested 18 AI agents on 100 real codebases over 233-day cycles. They didn't just look for "quick fixes"—they looked for long-term survival.
The results were a bloodbath:
75% of models broke previously working code during maintenance.
Only Claude Opus 4.5/4.6 maintained a >50% zero-regression rate.
Every other model accumulated technical debt that compounded until the codebase collapsed.
We’ve been using "snapshot" benchmarks like HumanEval that only ask "Does it work right now?"
The new SWE-CI benchmark asks: "Does it still work after 8 months of evolution?"
Most AI agents are "Quick-Fix Artists." They write brittle code that passes tests today but becomes a maintenance nightmare tomorrow. They aren't building software; they're building a house of cards.
The narrative just got honest: Most models can write code. Almost none can maintain it.
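For anyone wondering what a "zero-regression rate" could mean in practice, here's a rough sketch. It assumes a run counts as a regression when any test that passed before the change fails (or disappears) after it; that definition and the function names are my assumptions, not the benchmark's actual scoring code:

```python
def has_regression(before, after):
    """True if any test that passed before no longer passes after.

    `before` and `after` map test names to pass/fail booleans.
    A test missing from `after` is treated as failing (assumption).
    """
    return any(
        passed and not after.get(name, False)
        for name, passed in before.items()
    )


def zero_regression_rate(runs):
    """Fraction of maintenance runs that broke no previously passing test.

    `runs` is a list of (before, after) test-result pairs.
    """
    if not runs:
        raise ValueError("need at least one run")
    clean = sum(1 for before, after in runs if not has_regression(before, after))
    return clean / len(runs)
```

Under this reading, a snapshot benchmark only looks at `after`, while a maintenance benchmark compares `after` against `before` on every step.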
This is genuinely interesting. Replaces a lot of old apps that have been difficult to find or to configure as easily:
- default: 80% screen brightness from sunrise to sunset. 40% screen brightness from sunset to sunrise, and 80% media volume.
- Whenever I am inside a theater at GPS location 📌, dim phone to 10% and turn notifications off.
- whenever I reach the office, set brightness to 60% and notifications on vibrate only.
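A rough sketch of how those three rules might resolve into settings. The `Context` and `Settings` shapes, the place labels, and the fixed sunrise/sunset hours are all hypothetical stand-ins, and I'm assuming the 80% media volume applies all day:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    hour: int                    # local hour, 0-23
    place: Optional[str] = None  # hypothetical geofence label: "theater", "office"
    sunrise: int = 6             # assumed fixed hours, for illustration only
    sunset: int = 18


@dataclass
class Settings:
    brightness: int                      # percent
    media_volume: Optional[int] = None   # percent
    notifications: Optional[str] = None  # None means "leave unchanged"


def resolve(ctx):
    """Apply the default rule first, then let a location rule override it."""
    daytime = ctx.sunrise <= ctx.hour < ctx.sunset
    settings = Settings(brightness=80 if daytime else 40, media_volume=80)
    if ctx.place == "theater":
        settings.brightness = 10
        settings.notifications = "off"
    elif ctx.place == "office":
        settings.brightness = 60
        settings.notifications = "vibrate"
    return settings
```

For example, `resolve(Context(hour=21, place="theater"))` gives 10% brightness with notifications off, while the same hour with no place falls back to the 40% night default.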
@MatthewBerman And it’s only the beginning. This is when meditation will be crucial for helping us ground ourselves. We haven’t evolved to process all this information so quickly.
@MatthewBerman Primeagen shared his thoughts on YouTube on just this phenomenon recently. Just pulled yet another all-nighter myself, racing to get the next build done.
I believe OpenClaw provided the right formula: a customizable experience, plus persistence and ownership over local deliverables, reframing it as a collaborative personal advocate, doer and coach, not a tool. In a word, the re-embodiment of Personal Computing, restored and directionally opposite to the cloud centralization that often takes agency away from consumers. This actually gives some back.
@MatthewBerman Ever consider two Claude Max accts on separate email addresses? May sound ridiculous, but I wonder how the math shakes out vs relying on API spend.