We’re launching @JudgmentLabs today and announcing $32M in funding.
As AI agents take on more of the work that creates economic value, they generate massive amounts of production data: the clearest record of how they behave with users, software, and the real world.
Judgment builds infrastructure for improving AI agents from production data.
The core idea: long-horizon agent evals should be done by agents, not simple LLM judges.
Agent Judge searches trajectories, verifies stateful actions, and adapts rubrics from production feedback.
https://t.co/ntsT7agNuF
We built Agent Judge to evaluate long-horizon agents.
As agents take on longer tasks, the evidence needed to evaluate them gets buried across tool calls, retries, logs, database updates, and final outputs.
Evaluating these agents requires investigating the trajectory, not just judging the final answer.
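To make the trajectory point concrete, here is a minimal sketch, not Judgment's implementation. Every name in it (`Step`, `Trajectory`, `judge_final_answer`, `judge_trajectory`, the `issue_refund` action) is a hypothetical stand-in. It shows a case where the final output claims success while the recorded database write failed, so only a check that reads the trajectory catches the problem.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str   # hypothetical step types, e.g. "tool_call" or "db_update"
    name: str   # hypothetical action name
    ok: bool    # whether the step succeeded


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)
    final_output: str = ""


def judge_final_answer(traj: Trajectory) -> bool:
    """Baseline check: looks only at the text of the final output."""
    return "refund issued" in traj.final_output.lower()


def judge_trajectory(traj: Trajectory) -> bool:
    """Also requires evidence in the steps that the claimed write succeeded."""
    claimed = judge_final_answer(traj)
    verified = any(
        s.kind == "db_update" and s.name == "issue_refund" and s.ok
        for s in traj.steps
    )
    return claimed and verified


# The agent says the refund went through, but the database update failed.
traj = Trajectory(
    steps=[
        Step("tool_call", "lookup_order", True),
        Step("db_update", "issue_refund", False),
    ],
    final_output="Refund issued to your card.",
)

final_only = judge_final_answer(traj)
with_trajectory = judge_trajectory(traj)
```

Here `final_only` passes while `with_trajectory` fails, because the only evidence of the failure sits in the steps rather than in the answer.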
Agent behavior changes as models, tools, products, and user workflows change.
That means the rubric used by the judge has to improve from production data, so it keeps evaluating the behaviors that matter.
Rubric Builder turns feedback into concrete rubric updates.
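One way to picture feedback becoming a rubric update is the sketch below. It is an assumption for illustration, not how Rubric Builder actually works. The function `propose_rubric_updates`, the tag names, and the `min_count` threshold are all hypothetical. The idea: if reviewers keep flagging a behavior the rubric does not cover yet, propose it as a new criterion.

```python
from collections import Counter


def propose_rubric_updates(
    rubric: set[str], feedback_tags: list[str], min_count: int = 2
) -> list[str]:
    """Return tags seen at least min_count times that the rubric lacks."""
    counts = Counter(feedback_tags)
    return sorted(
        tag for tag, n in counts.items() if n >= min_count and tag not in rubric
    )


# Hypothetical rubric criteria and production feedback tags.
rubric = {"answers_the_question", "cites_sources"}
feedback = [
    "confirms_before_deleting",
    "cites_sources",
    "confirms_before_deleting",
    "polite_tone",
]

updates = propose_rubric_updates(rubric, feedback)
```

With this input, only `confirms_before_deleting` is proposed: it appears twice and is not in the rubric, while the other tags are either already covered or seen only once.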
The Redpoint InfraRed 100 is now live.
These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on.
Congratulations to this year's honorees!
Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 https://t.co/Y1y94ZwI5B
Great inference requires a great model
Great models require great data
Great data requires capturing what actually happens in production
Enjoyed chatting with the @JudgmentLabs team about everything from agents to GTM strategies (ice cream is surprisingly high ROI)
only in SF: a @JudgmentLabs wrapped van handing out free ice cream & the flavors are named after the tech companies they parked it in front of @0xhappier