TensorZero @TensorZero - Twitter Profile

Pinned Tweet

2 months ago

We’re building TensorZero Autopilot, an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. It dramatically improves the performance of LLM agents on every single benchmark we’ve tried. Read more below.

https://t.co/v3EROx98CQ

1

34

6

7

9K

TensorZero retweeted

Michelle

@michellehui

about 7 hours ago

it was a full house last night! (pun intended) inaugural @TensorZero ai poker night co-hosted with @BessemerVP

0

4

2

0

95

TensorZero retweeted

Gabriel Bianconi

@gabrielbianconi

about 7 hours ago

🃏 Great time last night at @TensorZero's inaugural AI Poker Night co-hosted with @BessemerVP!

0

1

0

83

TensorZero @TensorZero

17 days ago

https://t.co/w5R9npHQcl

0

117

TensorZero @TensorZero

17 days ago

You might be overpaying 5.3x+ for Claude Opus 4.7! Our CEO @gabrielbianconi found that on tool-heavy workloads, you're paying 5.3x more for Claude Opus 4.7 than for GPT 5.4. The usual way to compare providers is cost per million tokens. But different providers use different tokenizers, and we found that the same input produces wildly different token counts. Read our blog post below to find out how you are being charged and how to know what you're actually paying for. 👇

1

3

1

0

1K
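
The tokenizer point in the tweet above can be made concrete with a little arithmetic. The sketch below is illustrative only: the provider names, prices, and token counts are made-up assumptions, not figures from TensorZero's analysis. It shows that the price-per-million-tokens ratio and the actual cost ratio diverge once the same input tokenizes to different lengths.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical pricing and token count for one provider (all values assumed)."""
    name: str
    usd_per_million_input_tokens: float  # assumed list price, not a real quote
    input_tokens: int  # tokens this provider's tokenizer produced for the same input


def request_cost(q: Quote) -> float:
    """Cost in USD of sending the input once."""
    return q.input_tokens * q.usd_per_million_input_tokens / 1_000_000


def cost_ratio(a: Quote, b: Quote) -> float:
    """How many times more expensive provider a is than provider b for this input."""
    return request_cost(a) / request_cost(b)


# Made-up numbers: same input text, different tokenizers, different list prices.
provider_a = Quote("provider_a", usd_per_million_input_tokens=10.0, input_tokens=3_000)
provider_b = Quote("provider_b", usd_per_million_input_tokens=5.0, input_tokens=2_000)

# Naive comparison: list price only.
price_ratio = (
    provider_a.usd_per_million_input_tokens / provider_b.usd_per_million_input_tokens
)
# Effective comparison: list price multiplied by the tokens actually billed.
effective_ratio = cost_ratio(provider_a, provider_b)

print(f"list-price ratio: {price_ratio:.1f}x")
print(f"effective cost ratio: {effective_ratio:.1f}x")
```

With these assumed inputs, the list price suggests a 2x gap while the billed cost differs by 3x, because provider_a's tokenizer produces 1.5x as many tokens for the same text.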

TensorZero @TensorZero

22 days ago

https://t.co/RDr0bdPbut

0

101

TensorZero @TensorZero

22 days ago

LLM evaluators are often noisy and weakly correlated with real-world outcomes. Noisy evaluators have limited value for production decisions that hinge on judging a single output (e.g. guardrails). However, even (very) noisy evaluators can reliably tell you which agent is better on average, meaning they can still help you pick the best variant to deploy and improve it over time. Learn how ↓

2

4

2

0

472
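
The claim about noisy evaluators in the tweet above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not TensorZero's method: the two success rates, the 30% flip probability, and the sample size are all hypothetical. It models an evaluator that returns the wrong verdict on a fixed fraction of outputs, then compares its per-output accuracy with its ability to rank two agents by average score.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

FLIP_PROB = 0.3  # assumed: the evaluator reports the wrong verdict 30% of the time


def true_outcome(success_rate: float) -> bool:
    """Sample whether one agent output is actually good."""
    return rng.random() < success_rate


def noisy_verdict(outcome: bool) -> bool:
    """Return the evaluator's verdict, flipped with probability FLIP_PROB."""
    return (not outcome) if rng.random() < FLIP_PROB else outcome


def evaluate(success_rate: float, n: int) -> tuple[float, float]:
    """Return (mean evaluator score, fraction of outputs judged correctly)."""
    score = 0
    agree = 0
    for _ in range(n):
        outcome = true_outcome(success_rate)
        verdict = noisy_verdict(outcome)
        score += verdict
        agree += verdict == outcome
    return score / n, agree / n


N = 5_000  # assumed number of evaluated outputs per agent
mean_a, acc_a = evaluate(0.50, N)  # hypothetical agent A: 50% true success rate
mean_b, acc_b = evaluate(0.70, N)  # hypothetical agent B: 70% true success rate

print(f"per-output accuracy: A={acc_a:.2f}, B={acc_b:.2f}")
print(f"mean evaluator score: A={mean_a:.3f}, B={mean_b:.3f}")
```

Under these assumptions any single verdict is wrong about 30% of the time, which is poor for a guardrail. The mean score still ranks agent B above agent A, because the noise shrinks the gap between the averages without reversing it, and the averages themselves stabilize as N grows.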

TensorZero @TensorZero

23 days ago

Michelle Hui is joining us with a focus on developer relations. She recently graduated from Cornell with BS & MS degrees in computer science, during which she organized large tech events, conducted ML research, and held product internships (Alphabet / Wing, UN). Welcome to the team, @michellehui!

2

18

2

1

3K

TensorZero @TensorZero

30 days ago

https://t.co/laYFFISKV4

0

144

TensorZero @TensorZero

30 days ago

"If your security relies on your code being obfuscated, you're telling yourself a fake story."
Our CTO Viraj Mehta (@thebigmehtaphor) chats about AI scaling, open source, and being a technical founder following his PhD in Reinforcement Learning at CMU.
In our in-depth conversation, we discuss:
🔸 What to look for in engineering hires when coding is largely solved
🔸 How AI has increased the hiring bar and what you can do to meet it
🔸 Why you should build in Rust to make your code more secure and error-free
🔸 Why open source is still an important business model
🔸 Why AI is still scaling... and there's continued "juice to squeeze" in AI
Full conversation in the thread ↓

2

3

1

911

TensorZero @TensorZero

about 1 month ago

TensorZero is committed to open source. We sit down with our CTO Viraj (@thebigmehtaphor). Takeaways:
1. Thinking closed-source code == security is a LIE. You still need to build from first principles and write fundamentally secure code.
2. Open source keeps more eyes on your code, ultimately finding more bugs and vulnerabilities.
3. Open-source code lets developers and their agents understand your codebase and integrate with it better. Closed source restricts users to the documentation.
@michellehui

2

3

0

228

TensorZero @TensorZero

about 2 months ago

https://t.co/F6j7Ei1UxG

0

13

4

1

1K

TensorZero @TensorZero

2 months ago

Learn more: https://t.co/76TPdgxoWz

0

2

0

212

TensorZero @TensorZero

2 months ago

Can an automated AI engineer autonomously debug and optimize an LLM pipeline in 5 minutes? Last night, ours did: it cut errors in ~half during its first live demo. TensorZero Autopilot (our automated AI engineer) analyzed hundreds of historical LLM traces to identify failure modes, tuned the prompt, and verified improvements with an LLM judge — autonomously, in <5 minutes. With more time, it can do much more: from model selection to fine-tuning to adaptive experimentation, TensorZero Autopilot dramatically improves the performance of LLM agents across diverse tasks. Learn more below ↓

1

7

4

3

3K

TensorZero @TensorZero

2 months ago

TensorZero Autopilot is powered by our open-source LLMOps platform that unifies an LLM gateway, observability, optimization, evaluation, and experimentation. The open-source project is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today. https://t.co/8eOLetv0DY

0

1

0

1

210

TensorZero @TensorZero

2 months ago

Read more about this work on the TensorZero Blog: https://t.co/76TPdgxoWz

1

2

0

1

216
