managing autonomous AI agents is probably like managing lazy and disinterested human engineers (i've only managed awesome and good engineers so i don't know for sure)
Codex est'd token spend is $113 (API rates). Claude is $31 (from `/cost`).
I think Claude is the winner in this. Or at least the best loser? Clearly more aligned with my intent than Codex. Cheaper. Failed fast.
ok now that I've had coffee and can evaluate/review myself:
Codex was Goodharting- 2.43x improvement is actually "20s constant cached" and the memory use was actually higher, but a config change masked it.
Claude's 10% gain was honest but also behavior changing and failed tests
@h_thoreson prices are sorta starting to correct in Denver but it's still a "weak seller's market" - just so burned by the insanity of the last few years that "house is available for more than a few days" feels like a treat
I strongly believe there are entire companies right now under heavy AI psychosis and its impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now its... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "its fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediately dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happens so fast that nobody notices the underlying architecture decaying.
I worry.
Production Haskell by Matt Parsons is on sale on Leanpub! Its suggested price is $40.00; get it for $22.50 with this coupon: https://t.co/0oka5foi4f @mattoflambda#haskell