It's still incredible that the best coding models sometimes use terrible coding practices. Both GPT 5.5 and Opus 4.8 often guess at fallback values. Here's an example output from GPT 5.5 (~25 LOC). The LiveKit docs (which GPT 5.5 has access to via MCP) show that each of these values can be accessed through a single key, so this can be reduced to 3 lines.
Always a good reminder to slow down and review the code.
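The original output isn't reproduced here, so the snippet below is a hypothetical Python illustration of the pattern. It is not GPT 5.5's actual code and not LiveKit's API: the payload keys (`room_name`, `participant_identity`, `track_sid`) and the default values are assumptions chosen for illustration. It contrasts probing several guessed spellings with reading one documented key per value.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# HYPOTHETICAL key names throughout; not taken from the LiveKit docs.


@dataclass(frozen=True)
class SessionInfo:
    room_name: str
    participant_identity: str
    track_sid: str


def parse_guessing(payload: Mapping[str, Any]) -> SessionInfo:
    """Anti-pattern: probe several plausible spellings, then invent a default."""

    def pick(*candidates: str, default: str) -> str:
        for key in candidates:
            value = payload.get(key)
            if value:
                return str(value)
        return default  # silently masks a missing or renamed field

    return SessionInfo(
        room_name=pick("room_name", "roomName", "room", default="unknown-room"),
        participant_identity=pick("participant_identity", "identity", "user", default="anonymous"),
        track_sid=pick("track_sid", "trackSid", "sid", default=""),
    )


def parse_direct(payload: Mapping[str, Any]) -> SessionInfo:
    """Preferred: one documented key per value.

    A missing key raises KeyError immediately instead of producing a guessed value.
    """
    return SessionInfo(
        room_name=payload["room_name"],
        participant_identity=payload["participant_identity"],
        track_sid=payload["track_sid"],
    )
```

With a payload that uses an unexpected spelling and omits `track_sid`, the guessing version quietly returns an empty `track_sid`, while the direct version raises `KeyError` right away, which is usually what you want when the docs say the key exists.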
We've seen many examples of AI agents doing terrible things like deleting production databases. I call this the "Murphy's Law of AI": what an agent can do, it will do.
This raises the obvious question: "what should an AI agent be allowed to do?" However, this misses an important nuance: AI agents act on behalf of a specific individual (or company). The better question is "what should an AI agent be allowed to do for {x} individual?"
The answer is that it depends on the situation. There are three types of agent access you should think about:
- User delegated = the agent inherits the user's permissions. This is best when the agent is not shared with others and acts as a personal assistant.
- Agent owned = the agent has its own permissions, essentially acting like a unique employee. This is best when the agent performs tasks in the background (e.g. cron jobs).
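- User/Agent intersection = the agent receives the intersection of its permissions and the user's permissions, so it can only do what both it and the current user are allowed to do. This is best when an agent is shared across users. E.g. a legal research agent where each lawyer has access to different cases: the agent only touches cases that the lawyer it's working for can access.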
I break this down further in the article.
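A minimal Python sketch of how the three access types could resolve to an effective permission set. The mode names mirror the list above; the permission strings (`read:case-A`, etc.) and the function names are hypothetical, and a real system would likely check per-resource permissions through a policy engine rather than flat string sets.

```python
from enum import Enum
from typing import AbstractSet, FrozenSet, Optional


class AccessMode(Enum):
    USER_DELEGATED = "user_delegated"  # agent acts with the user's permissions
    AGENT_OWNED = "agent_owned"        # agent has its own, employee-like permissions
    INTERSECTION = "intersection"      # agent may only do what BOTH it and the user may do


def effective_permissions(
    mode: AccessMode,
    agent_perms: AbstractSet[str],
    user_perms: Optional[AbstractSet[str]] = None,
) -> FrozenSet[str]:
    """Resolve what the agent may do right now (hypothetical helper)."""
    if mode is AccessMode.AGENT_OWNED:
        # Background work (e.g. cron jobs): no user is involved.
        return frozenset(agent_perms)
    if user_perms is None:
        raise ValueError(f"{mode.value} access requires the acting user's permissions")
    if mode is AccessMode.USER_DELEGATED:
        return frozenset(user_perms)
    # Shared agent: set intersection, never union.
    return frozenset(agent_perms) & frozenset(user_perms)


def can(
    mode: AccessMode,
    action: str,
    agent_perms: AbstractSet[str],
    user_perms: Optional[AbstractSet[str]] = None,
) -> bool:
    return action in effective_permissions(mode, agent_perms, user_perms)
```

For the legal research example, an agent with access to cases A, B and C working for a lawyer who can see cases A and D ends up with `read:case-A` only. Using the intersection (rather than the union) is what keeps the shared agent from leaking case B or C to that lawyer.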
All of these new layers of agentic programming (loops on top of agents, nested sub-agents) feel kind of like exotic derivatives:
- Complex: no one knows exactly what they are doing
- Illiquid: few people have adopted them
- Pricing difficulty: you stand to burn a lot of money/tokens, and the underlying value is difficult to predict
I think there are two different models:
1) Legal product (like contract review) with human (lawyer) verification
2) Law firm that uses AI tools
The first treats lawyers as fungible. The second treats AI as fungible.
You can tell which one you're dealing with by checking whether the website shows the partners at the firm.
Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models.
Saves us millions of $ and we're actually seeing an *increase* in performance on many core use cases. Transformative for the business.
I've been complaining about this too! It's gotten really bad recently. I've had to report dozens of email domains as phishing, but @gmail doesn't automatically block or send these emails to spam even after I report them.
I've had to set up automatic rules to delete these emails because I was getting dozens a day. These emails are so easy to identify as spam!
The new Google "Modern Web Guidance" is a joke. The user experience "mega skill" is a single skill file with:
- 8 guides, all with broken links
- 3 of the 8 guides are related to scrollbars...
These are the most random UX guides ever. I don't even think you could prompt Claude or Codex to write a skill this bad.
https://t.co/2xQ4U2RJw8
In my experience, it's a "tragedy of the AI commons". Our competitors pitch our customers on fantastic new AI capabilities (whether they're real doesn't matter), so we have to do the same.
Internally, it's hard to tell whether using AI tools for coding and other activities provides an advantage. IMO, a big difference between a staff engineer and a junior engineer is that a staff engineer has had to live with the consequences of their coding decisions for over a year. It's too early to say whether companies are truly accelerating their engineering or creating a "slop bomb" that will bite them much later.
For the haters in the comments, you can just one-shot the agent swarm using Wispr Flow to pipe into Claude mobile connected to your Mac mini through Claude remote control. You could even run a locally hosted open-source model if you're concerned about cost. Just connect that to your pi agent and add tools for image parsing and generation.
@chamath Or have @jason set up an OpenClaw agent to run this analysis overnight for $400 in API costs. Probably best to add your credit card credentials in case your Claw needs to use external API services