Good observability is a drug.
We’re using Sentry + Axiom now.
Sentry is wild for engineering observability, especially with OpenTelemetry auto-correlation. You can go from “something broke” to the exact trace/error/user/session very fast.
Axiom is the business/event layer for us: runs, missions, tool calls, approvals, user behavior, workspace-level activity.
For agentic products, this is not nice-to-have. Without observability, every issue looks like “AI is unreliable.”
With observability, it becomes: token expired, tool failed, model drifted, user changed prompt, approval blocked, API rate-limited, workflow recovered.
Different game.