Evidently AI

@EvidentlyAI

Open source ML and LLM evaluation 📊, testing 🚦 and monitoring 📈 GitHub:

Joined February 2020

211 Following

2.5K Followers

2.3K Posts

Pinned Tweet

Evidently AI @EvidentlyAI

over 1 year ago

3️⃣ 2️⃣ 1️⃣ Our free course on LLM evaluations for AI product teams starts today!
🎥 7 days of byte-sized videos into your inbox
⭐️ Certificate upon completion
👩‍💻 No coding skills required
👩‍🎓 500+ students have signed up
You can still join the course 👇 https://t.co/Go2bNYJXCR

3

7

1

5

2K

Evidently AI @EvidentlyAI

about 1 month ago

How Zalando builds a search quality assurance framework with LLM-as-a-judge: https://t.co/lLu13C28eK

0

0

0

0

91

Evidently AI @EvidentlyAI

2 months ago

📌 In case you missed it: How to evaluate an AI agent? Follow the tutorial as we:
1️⃣ Build an AI agent,
2️⃣ Create a test dataset,
3️⃣ Assess responses and tool choice,
4️⃣ Track the agent’s behaviour.
Follow the tutorial from our LLM evals course: https://t.co/lkoEhBBdGC

0

1

0

1

169

Evidently AI @EvidentlyAI

2 months ago

A Friday ML use case 📕 📚 From the database of 800 ML & LLM systems: https://t.co/jJoUj6MfFZ How Uber improves driver availability at airports: Estimated time-to-request model, Earnings-per-hour prediction, and Driver-deficit forecasting. https://t.co/c3fwqIduGx

0

0

0

0

127

Evidently AI @EvidentlyAI

2 months ago

🦾 More AI agents aren’t always better. Google evaluated 180 agent setups and found multi-agent systems help with parallel tasks but can hurt sequential ones. The work also proposes a model to predict optimal agentic designs. https://t.co/ODbRtyGPui

0

1

0

3

89

Evidently AI @EvidentlyAI

2 months ago

📌 In case you missed it: Let’s test your RAG system! Follow the tutorial as we:
1️⃣ Build a RAG system,
2️⃣ Generate test data,
3️⃣ Evaluate answers for correctness and faithfulness.
Watch the tutorial from our LLM evals course: https://t.co/HuU5TWk0HZ

0

0

0

0

117

Evidently AI @EvidentlyAI

2 months ago

A Friday ML use case 📕 📚 From the database of 800 ML & LLM systems: https://t.co/jJoUj6MfFZ How GoDaddy built Lighthouse, an internal AI analytics platform: prompt engineering framework, model orchestration, solution architecture, and use cases. https://t.co/fil15hoXPi

0

0

0

1

73

EvidentlyAI retweeted

Nnenna 👩🏽‍💻✨

2 months ago

(policyNIM oss tool) preflight command is working. when I provide a coding task, it kicks off a search through indexed policies to determine which rules are relevant for implementation. @nvidia for embedding w/ @OpenAI + @lancedb for vector storage. eval command is also working. using @EvidentlyAI for running eval suite.

1

4

2

0

426

Evidently AI @EvidentlyAI

2 months ago

🚦 Meta’s “Agents Rule of Two”
According to Meta, AI agents should satisfy at most two of these conditions per session to reduce prompt-injection risk:
- Handle untrusted inputs
- Access sensitive data
- Change state / act externally
https://t.co/Zdb6rHtj3i

0

0

0

0

49
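
The rule in the post above can be read as a simple per-session check. The sketch below is only an illustration of that reading: the class name `SessionCapabilities`, its field names, and the function `satisfies_rule_of_two` are hypothetical and do not come from Meta or from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical description of what an agent may do in one session."""
    handles_untrusted_inputs: bool
    accesses_sensitive_data: bool
    changes_state_externally: bool


def satisfies_rule_of_two(caps: SessionCapabilities) -> bool:
    """Return True when at most two of the three conditions are enabled."""
    enabled = sum(
        [
            caps.handles_untrusted_inputs,
            caps.accesses_sensitive_data,
            caps.changes_state_externally,
        ]
    )
    return enabled <= 2


# An agent that reads untrusted web pages and private data, but cannot act externally.
read_only_agent = SessionCapabilities(True, True, False)
# An agent with all three capabilities enabled in the same session.
unrestricted_agent = SessionCapabilities(True, True, True)

print(satisfies_rule_of_two(read_only_agent))
print(satisfies_rule_of_two(unrestricted_agent))
```

A check like this could run when a session is configured, so that a setup enabling all three conditions is flagged before the agent starts.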

Evidently AI @EvidentlyAI

3 months ago

📌 In case you missed it: How do you know if your RAG works? You need to check:
✅ Can it find the right information?
✅ Is the final answer complete, relevant, and free of hallucinations?
Watch the intro to RAG evaluation from our LLM evals course: https://t.co/e80MQr7ent

0

1

0

0

164

Evidently AI @EvidentlyAI

3 months ago

A Friday ML use case 📕 📚 From the database of 800 ML & LLM systems: https://t.co/jJoUj6MfFZ How DoorDash improves its RecSys using LLMs to bridge behavioral silos in multi-vertical recommendations. https://t.co/3bSxC7qPTG

0

0

0

0

59

Evidently AI @EvidentlyAI

3 months ago

💭 Can AI systems introspect? Anthropic’s new research suggests Claude models can sometimes identify and describe their own internal states. It’s still unreliable, but marks a step toward more transparent AI reasoning. https://t.co/hEhV9xBy87

0

0

0

0

46

Evidently AI @EvidentlyAI

3 months ago

📌 In case you missed it: Can LLMs write engaging tech tweets? Follow the tutorial as we:
1️⃣ Build a tweet generator,
2️⃣ Score its outputs with custom LLM judges,
3️⃣ Improve the results with prompt iteration.
Watch the tutorial from our LLM evals course: https://t.co/VsNXVdZNc6

1

2

0

3

172

Evidently AI @EvidentlyAI

3 months ago

A Friday ML use case 📕 📚 From the database of 800 ML & LLM systems: https://t.co/jJoUj6MfFZ How Shopify transformed its product classification system from basic categorization to an AI-driven framework using Vision Language Models. https://t.co/6gY2GtTY9v

0

2

1

1

71

Evidently AI @EvidentlyAI

3 months ago

📚 Context is everything. OpenAI shares how it built an in-house data agent that answers complex questions in minutes. It uses 6 layers of context:
- Table metadata
- Human annotations
- Codex enrichment
- Company knowledge
- Memory
- Runtime context
https://t.co/vrjw4XDktt

0

1

0

3

113

Evidently AI @EvidentlyAI

3 months ago

📌 In case you missed it Are LLMs good for classification tasks? We built an LLM-based classifier for a travel support chatbot and compared its performance to a classic ML model. Watch the tutorial from our LLM evals course: https://t.co/6EayS9lThw

0

2

1

2

156

Evidently AI @EvidentlyAI

3 months ago

A Friday ML use case 📕 📚 From the database of 800 ML & LLM systems: https://t.co/jJoUj6MfFZ How Wayfair built Wilma, a customer service agent copilot: workflow, prompt templates, and the copilot’s evolution. https://t.co/LHLeOsMEFd

0

0

0

1

81

Evidently AI @EvidentlyAI

3 months ago

🤖 How to develop and deploy chatbots at scale? DoorDash shares how they created a simulation platform and evaluation flywheel, allowing them to test chatbots with fast feedback loops and without production risk. https://t.co/BS9bufAiXr

0

1

1

0

60

Evidently AI @EvidentlyAI

3 months ago

📌 In case you missed it: How to create an LLM judge that aligns with human labels:
- Define criteria
- Create test dataset
- Run evaluation prompt to see if the judge aligns with your labels
- Evaluate the judge
Watch the video from our LLM evals course: https://t.co/d3fe8a8yBY

1

1

0

1

169
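
One way to check whether a judge aligns with human labels, as the steps in the post above describe, is to compare the two label lists directly. The post does not name a metric, so the choice of raw agreement and Cohen's kappa here is an assumption, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def agreement_rate(human: list[str], judge: list[str]) -> float:
    """Share of examples where the judge label equals the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human: list[str], judge: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(human)
    observed = agreement_rate(human, judge)
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    labels = set(human) | set(judge)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(human_counts[l] * judge_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical test dataset: human labels versus LLM judge labels.
human_labels = ["good", "bad", "good", "good"]
judge_labels = ["good", "bad", "bad", "good"]

print(agreement_rate(human_labels, judge_labels))
print(cohens_kappa(human_labels, judge_labels))
```

Raw agreement is easy to read but can look high when one label dominates; kappa discounts that chance agreement, which is why both are computed in this sketch.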

Evidently AI @EvidentlyAI

3 months ago

A Friday ML use case 📕 📚 From the database of 800 ML & LLM systems: https://t.co/jJoUj6MfFZ How Wayfair uses AI agents to automatically triage support tickets: agents vs. workflows and a hybrid approach. https://t.co/pRigeuGbZx

0

0

0

0

81

Evidently AI @EvidentlyAI

3 months ago

🔎 Scaling catalog attribute extraction with multi-modal LLMs Instacart shares how it built PARSE, a self-serve multi-modal LLM platform for structured product attribute extraction from text and images at scale 👇 https://t.co/3CKzlFLhlD

0

2

1

1

122
