Bay to Breakers is the city's biggest footrace...
And Judgment took it over. ๐
30 of us. 7.46 miles.
This is the Summer of Judgment.
Text "JL" to (628) 888-7594 to get on the list for our next event.
Bay to Breakers is the city's biggest footrace...
And Judgment took it over. ๐
30 of us. 7.46 miles.
This is the Summer of Judgment.
Text "JL" to (628) 888-7594 to get on the list for our next event.
Our internal data shows Claude is accelerating AI developmentโa possible path to recursive self-improvement, or AI autonomously building a more capable successor.
Itโs happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
We built Agent Judge to evaluate long-horizon agents.
As agents take on longer tasks, the evidence needed to evaluate them gets buried across tool calls, retries, logs, database updates, and final outputs.
Evaluating these agents requires investigating the trajectory, not just judging the final answer.