I’ve been building centralized systems where evals are reinforced by traces for this reason. Claude/Codex get you a long way, but employee-level customization is just a new way for processes to get obfuscated over time.
How do you turn agent traces into an improvement flywheel?
Excited to share Insights Generator (IG) — new @scale_AI / @ScaleAILabs research that finds behavioral patterns and bugs in agent traces.
Engineers & coding agents using IG achieved 30+% gains on agent benchmarks.
🧵