I went from defusing bombs in the Navy to helping enterprises not blow up their AI deployments.
Now: GM at Tribe AI (@tribe_ai).
Writing about AI security, enterprise risk, and why most "AI strategies" are just slide decks.
Builders and operators, my DMs are open.
Elon Musk codified the order at Tesla: automation is the last of five steps, because you can't automate a process that hasn't been questioned and simplified first. AI coding tools make step 5 feel like step 1. Founders are paying for that inversion with the customer conversations they skip.
The first AI layoffs aren't internal employees. They're BPO contracts. Goldman Sachs and Barclays each run 30k+ offshore workers in India. Next-gen AI service companies are targeting those budgets first. Internal headcount comes later.
Claude Code's TodoWrite required reminders every 5 turns for early models. Stronger models read them as rigid obligations, not guidance. Performance dropped. The team replaced it. A tool that helped at one capability level was actively constraining at the next.
OpenAI's Codex team hit 3.5+ PRs per engineer per day. Human review can't maintain architecture at that throughput. Their fix: custom linters that encode invariants and validate dependency directions. Enforcement by machine, not negotiation in PR comments.
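Below is a minimal sketch of a dependency-direction linter. It assumes a Python codebase under a hypothetical `app` package with hypothetical layers `types → config → repo → service → ui`. It walks each module's imports with the standard `ast` module and flags any import that points to a layer further right. The layer names and package layout are illustrative, not the Codex team's actual rules.

```python
import ast
from typing import List, Optional

# Hypothetical layer order for a package named `app`: a module may import from its
# own layer or any layer to its left, never from a layer to its right.
LAYERS = ["types", "config", "repo", "service", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def layer_of(module: str) -> Optional[str]:
    """Map a dotted path like 'app.service.billing' to its layer; None if outside the layering."""
    parts = module.split(".")
    if len(parts) >= 2 and parts[0] == "app" and parts[1] in RANK:
        return parts[1]
    return None


def check_dependency_directions(module: str, source: str) -> List[str]:
    """Return one message per absolute import that points 'upward' in the layer order."""
    src_layer = layer_of(module)
    if src_layer is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]  # relative imports (level > 0) are skipped for brevity
        else:
            continue
        for target in targets:
            dst_layer = layer_of(target)
            if dst_layer is not None and RANK[dst_layer] > RANK[src_layer]:
                violations.append(f"{module}:{node.lineno} imports {target} ({src_layer} -> {dst_layer})")
    return violations


src = "from app.ui.views import render\nimport app.types.money\n"
print(check_dependency_directions("app.repo.orders", src))
# ['app.repo.orders:1 imports app.ui.views (repo -> ui)']
```

Run in CI, a check like this fails the build. Nobody has to argue the point in a PR comment.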
Build vs. buy assumes the thing needs to exist. That's the mistake. Right order: delete it? If not, buy cheap SaaS? Only then: build? AI coding tools made building frictionless. That makes the delete question more important, not less.
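The ordering can be written as a tiny decision function. This is a sketch of the sequence described above. The function and parameter names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    DELETE = "delete"
    BUY = "buy"
    BUILD = "build"


def build_vs_buy(needs_to_exist: bool, cheap_saas_fits: bool) -> Decision:
    """Ask the questions in order; building is the fallback, not the default."""
    # 1. Delete: if the workflow doesn't need to exist, nothing else matters.
    if not needs_to_exist:
        return Decision.DELETE
    # 2. Buy: an inexpensive off-the-shelf tool beats custom code you must maintain.
    if cheap_saas_fits:
        return Decision.BUY
    # 3. Build: only when both earlier questions come back "no".
    return Decision.BUILD
```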
Gokul Rajaram scores AI-era durability by moat count: 4+ out of 8 and you're secure. Atlassian has data, workflow, ecosystem, network. Score around 4. https://t.co/zwZFO9nWdu has workflow only. Score 1. Both stocks down 75%. Different actual exposure.
Apideck logged a 43-tool MCP setup consuming 143k of 200k tokens before the agent ran a task. Anthropic rebuilt the same setup with CLI wrappers: 2k tokens. The MCP schema tax scales linearly with tool count. CLI invocations pay only at call time.
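Here is the arithmetic behind the schema tax as a short sketch. The token counts are the ones reported above. The per-tool figure is just their average, and treating schema cost as uniform per tool is a simplifying assumption.

```python
# Figures reported in the post; the per-tool average is derived from them.
CONTEXT_WINDOW = 200_000
MCP_SETUP_TOKENS = 143_000   # 43 MCP tools, schemas loaded before any task
CLI_SETUP_TOKENS = 2_000     # same capabilities exposed as CLI wrappers
N_TOOLS = 43

TOKENS_PER_TOOL = MCP_SETUP_TOKENS / N_TOOLS  # average schema cost, ~3,326 tokens


def mcp_upfront(n_tools: int, tokens_per_tool: float = TOKENS_PER_TOOL) -> float:
    """Every tool schema is in context before the first call, so cost grows linearly with tool count."""
    return n_tools * tokens_per_tool


def remaining_fraction(upfront: float, window: int = CONTEXT_WINDOW) -> float:
    """Share of the context window left for the task itself."""
    return (window - upfront) / window


print(round(TOKENS_PER_TOOL))                # 3326
print(remaining_fraction(MCP_SETUP_TOKENS))  # 0.285
print(remaining_fraction(CLI_SETUP_TOKENS))  # 0.99
print(mcp_upfront(86) > CONTEXT_WINDOW)      # True: doubling the tool count overflows the window
```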
Anthropic's agent eval guidance: grade each quality dimension with a separate LLM judge. Strong LLM judges hit 80-90% agreement with human evaluators. Providing example inputs for each score level stabilizes calibration, similar to rater training. Isolation is what makes model-as-judge reliable.
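Below is a minimal sketch of per-dimension judging. Each quality dimension gets its own prompt containing only its rubric and one example per score level. The `judge` argument is a hypothetical stand-in for whatever model call you use; no real LLM API is assumed. The dimension names and rubrics are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dimension:
    name: str
    rubric: str
    anchors: Dict[int, str]  # score level -> example that earns that score (calibration examples)


def build_judge_prompt(dim: Dimension, task: str, output: str) -> str:
    """One prompt per dimension: the judge sees only this rubric and its anchored examples."""
    examples = "\n".join(f"Score {score}: {example}" for score, example in sorted(dim.anchors.items()))
    return (
        f"Grade ONLY the dimension '{dim.name}'.\n"
        f"Rubric: {dim.rubric}\n"
        f"Examples per score level:\n{examples}\n\n"
        f"Task: {task}\nOutput: {output}\n"
        "Reply with a single integer score."
    )


def grade(dims: List[Dimension], task: str, output: str, judge: Callable[[str], str]) -> Dict[str, int]:
    """Call the judge once per dimension, in isolation, and parse the integer it returns."""
    return {d.name: int(judge(build_judge_prompt(d, task, output)).strip()) for d in dims}


dims = [
    Dimension("correctness", "Does the answer solve the task?", {1: "wrong result", 5: "fully correct"}),
    Dimension("concision", "Is the answer free of filler?", {1: "three paragraphs of preamble", 5: "one direct sentence"}),
]
fake_judge = lambda prompt: "5" if "'correctness'" in prompt else "2"  # stand-in for a real model call
print(grade(dims, "Add 2 and 2", "4", fake_judge))  # {'correctness': 5, 'concision': 2}
```

Keeping each dimension in its own call stops a strong score on one axis from leaking into the others.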
API key distribution for AI tools breaks down at scale. Keys get shared, and audit trails show only aggregate usage. Gateways tied to an identity provider (Okta, Azure AD) fix this by attaching every request to a user at the network layer. That makes role-based model access and instant revocation possible.
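Below is a minimal in-memory sketch of what an identity-aware gateway checks on each request. It assumes the identity provider has already verified the user's token and passes along its claims (`sub` is the standard OpenID Connect subject claim). The `role` claim, role names, and model names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical role -> allowed-model mapping; real roles would come from IdP groups.
ROLE_MODELS: Dict[str, Set[str]] = {
    "engineer": {"model-large", "model-small"},
    "analyst": {"model-small"},
}


@dataclass
class Gateway:
    revoked: Set[str] = field(default_factory=set)
    audit: List[Tuple[str, str, str]] = field(default_factory=list)  # (user, model, decision)

    def authorize(self, claims: Dict[str, str], model: str) -> bool:
        """`claims` is the identity the IdP asserted; token verification is assumed to happen upstream."""
        user = claims["sub"]
        if user in self.revoked:
            decision = "denied:revoked"
        elif model not in ROLE_MODELS.get(claims.get("role", ""), set()):
            decision = "denied:role"
        else:
            decision = "allowed"
        self.audit.append((user, model, decision))  # every request is attributable to a user
        return decision == "allowed"

    def revoke(self, user: str) -> None:
        """Takes effect on the user's next request; no shared key needs rotating."""
        self.revoked.add(user)


gw = Gateway()
alice = {"sub": "alice@example.com", "role": "engineer"}
bob = {"sub": "bob@example.com", "role": "analyst"}
print(gw.authorize(alice, "model-large"))  # True
print(gw.authorize(bob, "model-large"))    # False (role)
gw.revoke("alice@example.com")
print(gw.authorize(alice, "model-large"))  # False (revoked)
```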
Claude Code started with RAG for context. The team replaced it with Grep, letting Claude search and build context itself. Performance improved. A year later, Claude does nested searches across multiple file layers to find exactly what it needs. Model capability changes what the right tool looks like.
The Claude Code team described auto-memory as 'barely net positive.' The core issue: it sometimes recalls things incorrectly. That unpredictability forces users to verify everything the agent remembers before acting. The verification overhead cancels the efficiency gain.
Microsoft analyzed 1,200 sites: Copilot converts at 17x the rate of traditional search, Perplexity at 7x. Adobe: AI traffic drives 32% better revenue per visit. When someone clicks through from ChatGPT, they've completed the decision process inside the tool. Pure bottom-of-funnel.
Most agents in production run under shared service accounts. When a breach happens, no one can tell which agent caused it. Each agent in production needs a unique machine identity, a central registry, and a kill switch. That's what makes an audit trail possible.
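Below is a minimal in-memory sketch of the three pieces. Each agent gets a unique ID, a central registry maps each ID to an owner, and a kill switch is checked before every action. The class and method names are hypothetical. A production system would back this with a real identity provider and durable audit storage.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AgentRecord:
    agent_id: str
    owner: str      # the human or team accountable for this agent
    purpose: str
    enabled: bool = True


@dataclass
class AgentRegistry:
    agents: Dict[str, AgentRecord] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)  # (agent_id, action) audit trail

    def register(self, owner: str, purpose: str) -> str:
        """Issue a unique machine identity instead of reusing a shared service account."""
        agent_id = f"agent-{uuid.uuid4()}"
        self.agents[agent_id] = AgentRecord(agent_id, owner, purpose)
        return agent_id

    def kill(self, agent_id: str) -> None:
        """Kill switch: disable one agent without touching any other."""
        self.agents[agent_id].enabled = False

    def act(self, agent_id: str, action: str) -> bool:
        """Check the registry before every action and record who did what."""
        record = self.agents.get(agent_id)
        if record is None or not record.enabled:
            self.log.append((agent_id, f"BLOCKED {action}"))
            return False
        self.log.append((agent_id, action))
        return True


reg = AgentRegistry()
triage = reg.register("alice", "ticket triage")
billing = reg.register("bob", "refunds")
reg.act(triage, "read_ticket")
reg.kill(triage)
reg.act(triage, "read_ticket")   # blocked, and the log says exactly which agent tried
reg.act(billing, "refund")       # unaffected
```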
SWE-agent documented agents thrashing: broad searches that lost track of goals, context filling with noise until progress stopped. Their fix: cap tool output at 50 matches. One design decision, one of the highest-leverage changes in the paper.
79% of orgs running AI agents have governance gaps. The bottleneck is accountability. Business defines use cases but skips risk appetite. Security sets policy but not controls. Platform ends up holding the bag. Shadow AI fills the rest.
AI coding agents make Step 5 feel like Step 1. Musk's order: question requirements, delete steps, simplify, accelerate, then automate. Coding tools are so fast that founders skip straight to build. That's automated procrastination, not progress.
Two agents both solve the task. One takes 4 steps and 8 seconds. The other takes 6 steps and 14 seconds. That's 75% more latency for the same result. Correctness is table stakes in model selection. Measuring ideal trajectories is how you actually compare.
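Here is a small sketch of the comparison using the numbers above. It treats correctness as a gate and uses the shortest successful run as the reference trajectory. That choice of reference is an assumption; a hand-written ideal path would slot in the same way.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Run:
    name: str
    solved: bool
    steps: int
    seconds: float


def overhead(run: Run, ideal: Run) -> Dict[str, float]:
    """Relative extra steps and latency versus the reference trajectory (0.0 = matches it)."""
    return {
        "extra_steps": run.steps / ideal.steps - 1,
        "extra_latency": run.seconds / ideal.seconds - 1,
    }


a = Run("agent_a", True, 4, 8.0)
b = Run("agent_b", True, 6, 14.0)
# Correctness is a gate: only runs that solved the task are compared on efficiency.
solved = [r for r in (a, b) if r.solved]
ideal = min(solved, key=lambda r: (r.steps, r.seconds))
print(overhead(b, ideal))  # {'extra_steps': 0.5, 'extra_latency': 0.75}
```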
A real deployment with 43 tools consumed 30k-100k tokens in schema overhead before the agent ran a single task. Apideck documented one case where MCP ate 143k of a 200k context window. The agent's reasoning budget is mostly gone before it starts.
Goldman and Barclays each run 30k+ offshore workers. AI services are targeting those BPO budgets first: same quality, 20-30% cheaper. Internal headcount is second. Layoffs are third. The sequencing matters for how orgs model AI's labor impact.
Build vs. buy has a third option: delete. Most workflows don't need to exist in the first 12-18 months. Manual at 70% is often good enough pre-PMF. The question founders skip: does this workflow need to exist at all?