Braintrust @braintrust - Twitter Profile

Production traces capture where your AI falls short and what users are trying to do. Building evals with that data is how you catch failures earlier and decide what to ship next.

Braintrust is leading a workshop on how to:
- Use the patterns Braintrust surfaces automatically
- Turn them into a labeled eval dataset
- Run the same workflow every time a new pattern shows up

Braintrust

@braintrust

1 day ago

Read more → https://t.co/slGEfkJ3ZD

Braintrust

@braintrust

1 day ago

Vibes-based testing and manual review don't scale.

Automated evals are easy to set up and can make an immediate impact on AI development speed. Learn about three automated approaches to get started quickly with evals: LLM judges, heuristics, and comparative evals. https://t.co/dPJLYZZ6if

braintrust retweeted

Ankur Goyal

@ankrgyl

2 days ago

https://t.co/LPoal6Z2sw

Braintrust

@braintrust

2 days ago

This is the next chapter of Braintrust: active observability. We work behind the scenes to find answers to questions before you have to ask them. Trace everything → https://t.co/KmbQs1DgGq

Braintrust

@braintrust

2 days ago

Topics is now GA on all plans. Continuously find the patterns worth investigating across your production traffic.

Braintrust

@braintrust

2 days ago

Topics reconstructs conversational threads, runs the right model at the right cost, stores vectors for on-demand clustering, and surfaces the output in a UI built for humans. https://t.co/TCiqD0AZ50

Braintrust

@braintrust

5 days ago

Read more → https://t.co/X9tKecAoPb

Braintrust

@braintrust

5 days ago

Loop can create and manage dataset snapshots, tag them with environments, and prompt you to save before making changes. Your AI agent handles dataset versioning so you can focus on building better evals. https://t.co/hnTYxOEGqm

Braintrust

@braintrust

6 days ago

Braintrust presented on these challenges at @aidotengineer Europe. Watch the full session → https://t.co/kuv5FfsC3A

Braintrust

@braintrust

6 days ago

Most traditional enterprises gave responsibility for AI to their ML team, but the model providers own the data pipeline. What's left is prompt engineering, context management, distributed systems, and evals, which require a diverse set of teams to get right.

Braintrust

@braintrust

7 days ago

Thanks to @Redpoint and congratulations to all the companies included on the 2026 InfraRed 100.

Braintrust

@braintrust

7 days ago

Read more → https://t.co/11VKdUnfhC

Braintrust

@braintrust

8 days ago

Without validation of what good looks like, it's impossible to judge whether AI quality is improving or regressing.

Human expertise turns production traces into golden datasets that improve over time. https://t.co/FDh5JwSEkR

Braintrust

@braintrust

8 days ago

@AleksaMiti1 Thank you for your kind words. Please visit https://t.co/Zy2K3r4CGt and use the code "thank-you" for some swag from our team.

Braintrust

@braintrust

8 days ago

@daRubberDuckiee Join us → https://t.co/g6iAm7gf8i

Braintrust

@braintrust

8 days ago

Most AI failures don’t appear in testing. They show up later in support tickets, vague feedback, and production traces that are hard to interpret.

Braintrust's @darubberduckiee leads a workshop on using Topics to uncover those patterns, turn them into evals, and investigate regressions before they become bigger issues.
