UpTrain @UpTrainAI - Twitter Profile

over 2 years ago

There have been numerous informal observations about prompt drift in Large Language Models (LLMs), with the most notable case being GPT-4 showing signs of laziness, especially for coding tasks, by the end of the previous year. Discussions on Twitter also hint at a decline in Claude Sonnet’s effectiveness over the past few days. Given the closed-source nature of these models, it's impossible to know what happens behind the scenes, and most often these drifts go unnoticed until they get flagged by the community.

Today, a structured approach to tracking these shifts in the model’s performance is lacking. So we at @UpTrainAI decided to undertake this as a community initiative to monitor prompt drift and identify any regressions systematically. You can check it out and learn more about our methodology here: https://t.co/rjP1ON4Ynd

While building this out was fun, performance monitoring presents many challenges, notably how to do this efficiently (from a cost perspective) and yet get good, stable results. We made a slew of improvements to get the standard deviation down to acceptable levels while using GPT-3.5 and running on as few as 25 data points.

Looking ahead, we plan to enlarge our benchmarking dataset as well as include additional models (e.g., Claude 3).
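To make the "stable results on a small dataset" point concrete, here is a minimal sketch of how run-to-run noise could be compared against a change in score. It is not UpTrain's published methodology: the function names, the z-statistic, and the threshold of 3.0 are all assumptions chosen for illustration. Each "run" is assumed to be the mean score over a small benchmark (for example 25 data points), repeated a few times to estimate the standard deviation.

```python
from math import copysign, inf, sqrt
from statistics import mean, stdev


def run_score(example_scores):
    """Aggregate one benchmark run: the mean score over its data points."""
    return mean(example_scores)


def regression_z(baseline_runs, current_runs):
    """z-statistic for the change in mean run score (negative means worse).

    Each argument is a list of run-level scores; at least two runs per list
    are needed so that a standard deviation can be estimated.
    """
    diff = mean(current_runs) - mean(baseline_runs)
    # Standard error of the difference between two independent means.
    se = sqrt(stdev(baseline_runs) ** 2 / len(baseline_runs)
              + stdev(current_runs) ** 2 / len(current_runs))
    if se == 0:
        # No run-to-run noise was observed, so any difference is decisive.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def has_regressed(baseline_runs, current_runs, z_threshold=3.0):
    """Flag a drift only when the drop is large relative to the noise.

    z_threshold is a hypothetical cut-off, not a value from the source.
    """
    return regression_z(baseline_runs, current_runs) < -z_threshold
```

The lower the standard deviation across repeated runs, the smaller the drop that can be flagged, which is why reducing it matters when the dataset is small and every run costs money.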

UpTrainAI retweeted

LlamaIndex 🦙

@llama_index

over 2 years ago

Open-source evaluation with @UpTrainAI!

GenAI applications are complex and unpredictable, so you need to run evaluations to know whether the changes you make are improving your outcomes. UpTrain is a way to move beyond "vibes"-based evaluation.

Check out their guest blog post: https://t.co/i25r59L1us
And our docs: https://t.co/Tx7areX7UD
And their docs: https://t.co/tnTknFf2oK
Their announcement tweet: https://t.co/TsXGMbG95P

UpTrainAI retweeted

Sourabh Agrawal @SourabhAgr03

over 2 years ago

We are excited to announce the @llama_index <> @UpTrainAI integration!

It’s been months in the making, but we wanted to deliver something of real value to our community. Evaluations are not just about computing a final score for your application but about getting actionable insights on where things are going wrong and how to improve performance.

With this integration, you can evaluate all individual components of your RAG pipeline, such as retrieval, reranking, sub-query, etc., and get deep insights into where your LlamaIndex pipelines need improvements, all with a single line of code.

At UpTrain, we are building the gold standard of LLM evaluations with high-quality scores that learn your preferences.

• Evaluate different aspects of your application with 20+ preconfigured checks.
• A high degree of customisation allows you to modify eval prompts, choose the evaluator LLM or create your own checks.
• Experiment with prompts, LLMs, embedding models, RAG modules, etc.
• Do root cause analysis to find failure modes and hidden patterns.
• And finally, interactive dashboards to visualise results and do side-by-side comparisons [More coming soon].

Check out the blog: https://t.co/1gekXxhro1
Check out UpTrain: https://t.co/nRmptEjNWc

It was great fun collaborating with the LlamaIndex team - @ravithejads @seldo @jerryjliu0!

@shikha_xyz
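The idea of scoring each pipeline component separately, rather than only the final answer, can be illustrated with a small sketch. This is plain Python and does not use the UpTrain or LlamaIndex APIs: the function name, the component names, and the scores are all hypothetical, and the scores are assumed to lie in [0, 1] with higher being better.

```python
from collections import defaultdict
from statistics import mean


def weakest_component(rows):
    """Find the pipeline stage with the lowest average score.

    rows is a list of dicts, one per evaluated query, each mapping a
    component name (hypothetical labels such as "retrieval") to its score.
    Returns the weakest component and the per-component averages.
    """
    by_component = defaultdict(list)
    for row in rows:
        for name, score in row.items():
            by_component[name].append(score)
    averages = {name: mean(scores) for name, scores in by_component.items()}
    worst = min(averages, key=averages.get)
    return worst, averages


# Made-up scores for two queries, purely for illustration.
example_rows = [
    {"retrieval": 0.9, "reranking": 0.4, "response": 0.8},
    {"retrieval": 0.8, "reranking": 0.5, "response": 0.7},
]
worst_stage, stage_averages = weakest_component(example_rows)
```

A per-component breakdown like this points at the stage to fix first, which a single end-to-end score cannot do.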

UpTrainAI retweeted

langfuse.com

@langfuse

over 2 years ago

Evals are fast becoming one of Langfuse's most adopted features after core observability. When logging a lot of production usage to Langfuse, teams start layering model-based evals on top of the manual checks and reviews to scale their evaluation.

UpTrain @UpTrainAI

over 2 years ago

@rawert @SourabhAgr03 @shikha_xyz @MarcKlingen @MDeichmann 😎

UpTrain @UpTrainAI

over 2 years ago

UpTrain 🤝 @langfuse integration

Now, you can seamlessly track the quality, latency and cost of your LLM applications, all in one place.

Read more about it: https://t.co/CalystsCy6
Link to the tutorial: https://t.co/tEPzLsKyO8

UpTrain @UpTrainAI

over 2 years ago

UpTrain v0.6.5 is out

UpTrain @UpTrainAI

over 2 years ago

Link for the integration tutorial: https://t.co/Pr08qzj6Ai

UpTrain @UpTrainAI

over 2 years ago

With the @UpTrainAI 🤝 @anyscalecompute integration, you can now use open-source LLMs like Mistral 7B, Llama2 (7B, 13B, 70B, CodeLlama), etc., hosted on Anyscale's endpoints to evaluate your LLM applications with UpTrain.

UpTrainAI retweeted

The developersIndia Community @devsinindia

over 2 years ago

🚨 Clueless about the LLM ecosystem? Join us for an exciting session about LLMs, RAG & much more with @SourabhAgr03, CEO @UpTrainAI Full Announcement: https://t.co/i3fH4BgI4k

UpTrain @UpTrainAI

over 2 years ago

A Chevy dealer's chatbot agrees to sell a Tahoe for $1! This is a classic example of jailbreaking an LLM system, and of why an evaluation tool is needed. Check out many such tidbits and more in our chat with @qdrant_engine here: https://t.co/VSadJjosVh

UpTrainAI retweeted

Qdrant

@qdrant_engine

over 2 years ago

🚀 Elevate your LLM game with another Vector Space Talk this week!

Discover the intricacies of using LLM as a judge in evaluating applications with @SourabhAgr03, CEO & Co-Founder at UpTrain AI. 🤯

📅 Date: Feb. 8, 2024
🕒 Time: 5:00 pm CET
🌐 Link: https://t.co/gzm6QPpZJz

UpTrain @UpTrainAI

over 2 years ago

Check out the tutorial here: https://t.co/D5I0mxenTb

UpTrain @UpTrainAI

over 2 years ago

Customisation capabilities of UpTrain 🚀 We've made it easier for developers to customise the evaluation process.

UpTrainAI retweeted

Y Combinator

@ycombinator

over 2 years ago

.@UpTrainAI (YC W23) is a full-stack LLMOps platform to evaluate, experiment, monitor, and test LLM applications. It is open-source, enabling customization, and can be self-hosted to satisfy your data governance needs. https://t.co/bwJoY1tiLK

UpTrainAI retweeted

Sourabh Agrawal @SourabhAgr03

over 2 years ago

Exciting news to start the day! @UpTrainAI has been featured in @ycombinator's Top Generative AI Startups 2024 🚀 We're excited to continue pushing the boundaries of generative AI and making a difference in the industry. 💪 Check out OSS here - https://t.co/LGsdR0ZN8u

UpTrainAI retweeted

Sourabh Agrawal @SourabhAgr03

over 2 years ago

🚀 It was great fun integrating SPADE, a novel framework for synthesizing LLM evaluations, with @UpTrainAI. Big shoutout to the authors: @sh_reya, Haotian Li, Parth Asawa, @MadelonHulsebos, Yiming Lin, J.D. Zamfirescu, @hwchase17, Will Fu-Hinthorn, Aditya Parameswaran, @sirrice

UpTrainAI retweeted

Adam / AJ Chan

@itsajchan

over 2 years ago

What a great experience collaborating with @SourabhAgr03 and the team @UpTrainAI on this blog post, where we break down what you can do to evaluate your RAG pipelines when building with vector databases and LLMs. The UpTrain team are A+ players and I'm so glad we met in 2023! If you're building RAG pipelines, I'd love to get your feedback! You can check out the blog post here: https://t.co/t19TdF4IbX

UpTrainAI retweeted

Sourabh Agrawal @SourabhAgr03

over 2 years ago

Starting 2024 with a b[ang]log 💥 We recently wrote a blog, in collaboration with the @weaviate_io team, on the power of Retrieval Augmented Generation (RAG) to overcome the limitations of Language Models!
