Judgment Labs @judgmentlabs - Twitter Profile

Pinned Tweet

Judgment Labs

@JudgmentLabs

24 days ago

Today is a special day.

Alex Shan

@alexshander03

24 days ago

We’re launching @JudgmentLabs today and announcing $32M in funding. As AI agents take on more of the work that creates economic value, they generate massive amounts of production data: the clearest record of how they behave with users, software, and the real world. Judgment builds infrastructure for improving AI agents from production data.

212

1K

157

355

4M

33

303

35

73

2M

JudgmentLabs retweeted

Shun

@22uenos

8 days ago

Had a lot of fun making these visuals :) Made in @paper with Claude

6

130

7

145

66K

Judgment Labs

@JudgmentLabs

8 days ago

The core idea: long-horizon agent evals should be done by agents, not simple LLM judges. Agent Judge searches trajectories, verifies stateful actions, and adapts rubrics from production feedback. https://t.co/ntsT7agNuF

1

28

2

4K

Judgment Labs

@JudgmentLabs

8 days ago

We built Agent Judge to evaluate long-horizon agents. As agents take on longer tasks, the evidence needed to evaluate them gets buried across tool calls, retries, logs, database updates, and final outputs. Evaluating these agents requires investigating the trajectory, not just judging the final answer.

13

65

15

31

72K

Judgment Labs

@JudgmentLabs

8 days ago

Agent behavior changes as models, tools, products, and user workflows change. That means the rubric used by the judge has to improve from production data, so it keeps evaluating the behaviors that matter. Rubric Builder turns feedback into concrete rubric updates. https://t.co/5woijOkm25

1

20

2

0

540

Judgment Labs

@JudgmentLabs

9 days ago

Thrilled to be recognized by @Redpoint as one of the most promising private AI Infrastructure companies. More exciting news to come! https://t.co/z8AHK4rytf

Redpoint @Redpoint

9 days ago

The Redpoint InfraRed 100 is now live. These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on. Congratulations to this year's honorees! Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 https://t.co/Y1y94ZwI5B

20

296

50

226

138K

3

32

7

2

3K

JudgmentLabs retweeted

Judgment Labs

@JudgmentLabs

18 days ago

@tbpn thanks for having our team on! P.S. it's Judgment 🧡

7

60

11

4

256K

JudgmentLabs retweeted

Enyu

@0xhappier

19 days ago

@JudgmentLabs

4

37

5

2

1K

Judgment Labs

@JudgmentLabs

20 days ago

Scoops will start flying @ 1pm (500 Marina Blvd) Tomorrow @Baytobreakers Finish Line

The Flavors:
- Fudgement @JudgmentLabs
- Berry Brex-fast @brexHQ
- Modal Green Tea @modal
- Claude au Lait @claudeai
- Vercel Road @vercel
- Ando Apple Pie @andocorporation

Judgment Labs

@JudgmentLabs

20 days ago

Summer of Judgment! 2 days left for free ice cream...

6

69

10

6

314K

4

30

5

3

3K

JudgmentLabs retweeted

Philip Kiely

@philipkiely

21 days ago

Great inference requires a great model

Great models require great data

Great data requires capturing what actually happens in production

Enjoyed chatting with the @JudgmentLabs team about everything from agents to GTM strategies (ice cream is surprisingly high ROI) https://t.co/DTSUy0ryPZ

2

40

3

4K

JudgmentLabs retweeted

Emily Lonetto

@EmilyLonetto

21 days ago

Naturally Casper needed to check out @JudgmentLabs in South Park

1

19

1

0

680

JudgmentLabs retweeted

Brex

@brexHQ

21 days ago

@alexshander03 @lightspeedvp Brex-fast is the most important meal of the day

2

18

2

0

2K

Judgment Labs

@JudgmentLabs

22 days ago

what handsome guys

Aaron Makelky

@theaaron

22 days ago

only in SF: a @JudgmentLabs wrapped van handing out free ice cream & the flavors are named after the tech companies they parked it in front of @0xhappier