I've been calling this "thought-terminating flattery", and it's quite annoying:
Me: <adds random suggestion>
Claude: "that's a better frame than anything I've said." <proceeds to just explain what I suggested and stop>
Shit AIs say... "the issue is subtle: __.lint mutates, we should just check and include any edits then rerun verification, instead of downgrading to <other tool>"
I didn't realize downgrading verification steps was merely a "subtle" issue, haha.
@fwbrasil Scala "should" become a stronger choice in the age of agentic coding, as in it has a lot of choices which seem, to me, to "automate" rigor. Not sure if it's (the ecosystem/community) taking advantage of that enough.
AI has made math exploration infinitely more fun for me. I also find different LLM+"harness" systems have very different flavors for doing so, which is definitely the more interesting situation to be in.
Is the natural proofs barrier the most insane "skill issue" statement to date or what? They really went full Rick Sanchez with "we have a mathematical proof of your inadequacy".
@fwbrasil I have a guide/macro system for have the agent catch/verify subagent outputs because it hates doing that by default. Probably a poor man's version, but works.
I strongly believe there are entire companies right now under heavy AI psychosis and its impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now its... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "its fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediately dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happens so fast that nobody notices the underlying architecture decaying.
I worry.
We asked a dozen DevTool founders from companies like @RevenueCat, @greptile, @firecrawl, @infisical, @ollama, @resend, @mintlify, @UnslothAI, @porterdotrun, and @recallai, about the state of AI agents and the future of software engineering.
In this episode of Founder FAQ, we covered everything from agents as customers and the end of coding, to advice for founders starting out and what they're most excited about going forward. Their answers might surprise you.
00:00 – Meet the Founders
03:00 – Building for Agents First
04:22 – Biggest Early Mistakes
07:15 – Do Founders Still Write Code?
09:22 – Most Unexpected AI Discoveries
12:09 – What's Underrated Right Now
14:38 – Predictions & What's Next
@garrytan Not saying your test suite is bad at all, but I find that AI almost always writes very questionable tests seemingly aiming to increase test case count. And the actual test code itself could also usually be ~30% lower even without culling useless/redundant cases.
We're open-sourcing Cua Driver - our new macOS driver that lets any agent (Claude Code, Codex, your own loop) drive any app in the background, with true multi-player and multi-cursor built-in.
1/8
@GergelyOrosz I have a hook auto-blocking rm -rf with a reason + allowable safe (soft) rm, so the agent can continue work but actually get to think a second time.
@garrytan I'm not sure that's exactly true. I feel like it made the weights of certain skills more extreme, which I suppose is still a change, but not super different.
Shit LLMs say... 8 parallel reads, best-case latency: 8x sequential read. Hilarious! ...until you realize it wasted the next 30k tokens with that assumption in sentence #0 😅