Vibbs Dod

@Vibbsdod

Problem Solver | Student of Life | Check out my blog

Joined June 2013

268 Following

61 Followers

297 Posts

Vibbs Dod

@Vibbsdod

14 days ago

@ClaudeDevs This is so messed up. Paying a premium for access to frontier models means nothing, I guess.

879

Vibbs Dod

@Vibbsdod

30 days ago

Opus 4.8 is here... No wonder 4.7 has been behaving so badly for the last couple of days. Let's run all the benchmarks again and see where we land with this one @AnthropicAI #opus

Vibbs Dod

@Vibbsdod

about 2 months ago

In Retrieval‑Augmented Generation, evaluating the retriever is non‑negotiable. I measured Context Precision@5 at 0.62 and still got hallucinations because the LLM was fed irrelevant docs. Good retrieval = better answers.
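For readers who want to reproduce a number like this, here is a minimal sketch of plain precision@k. It assumes you have human relevance labels per query; the document IDs are made up, and some evaluation frameworks define "context precision" as a rank-weighted variant rather than this simple ratio.

```python
from typing import Sequence, Set


def precision_at_k(retrieved_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved documents that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved_ids)[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k (not by len(top_k)) so a short result list is penalized.
    return hits / k


# Hypothetical labels for one query: 3 of the 5 retrieved chunks are relevant.
retrieved = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d4"}
print(precision_at_k(retrieved, relevant, k=5))
```

Averaging this over a labeled query set gives one retriever score to track alongside answer quality.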

Vibbs Dod

@Vibbsdod

about 2 months ago

Automated metrics like BLEU or ROUGE are handy for quick checks, but they punish creativity. My summarizer once got a ROUGE‑L of 0.42 while users rated it 4.7/5 for usefulness. Don't let surface similarity dictate your success.
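To see why surface overlap can undersell a good paraphrase, here is a rough sketch of ROUGE-L as an F1 score over the longest common subsequence of whitespace tokens. It is a simplified illustration, not a replacement for a maintained ROUGE package (no stemming, no sentence splitting), and the example sentences are made up.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on lowercased whitespace tokens (simplified)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# A faithful paraphrase with no shared words scores zero.
print(rouge_l_f1("profits rose sharply this quarter", "earnings jumped a lot in Q3"))
print(rouge_l_f1("the cat sat on the mat", "a cat was sitting on the mat"))
```

The first pair means roughly the same thing yet shares no tokens, which is the failure mode described above.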

Vibbs Dod

@Vibbsdod

about 2 months ago

LLM‑as‑a‑Judge sounds clever until you let the model grade its own work. I tried using GPT‑4 to score GPT‑4 outputs and it consistently gave itself 4.5‑plus. Always use a different model or configuration as the judge.
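One way to enforce that rule in an eval harness is a small guard that refuses to pair a generator with an identical judge. This is only a sketch: `ModelConfig`, `pick_judge`, and the model names are hypothetical and not tied to any provider SDK.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelConfig:
    name: str
    temperature: float = 0.0
    system_prompt: str = ""


def pick_judge(generator: ModelConfig, candidates: Sequence[ModelConfig]) -> ModelConfig:
    """Prefer a judge with a different model name; fall back to a different configuration."""
    for candidate in candidates:
        if candidate.name != generator.name:
            return candidate
    for candidate in candidates:
        if candidate != generator:  # same model, different temperature or prompt
            return candidate
    raise ValueError("every candidate judge is identical to the generator")


generator = ModelConfig(name="model-a")  # placeholder names
judge = pick_judge(generator, [ModelConfig(name="model-a"), ModelConfig(name="model-b")])
print(judge.name)
```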

Vibbs Dod

@Vibbsdod

2 months ago

Most LLM teams ship blind. I once pushed a prompt change after eyeballing three outputs. Production broke, and I had no baseline to blame the prompt, the model, or the retriever. If you can't measure, you can't fix.

Vibbsdod retweeted

Zain Shah

@zan2434

2 months ago

Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)

29K

25K

Vibbs Dod

@Vibbsdod

3 months ago

If you're building RAG, your cost model has at least 4 factors:
1. LLM generation cost (query volume x avg tokens x price/token)
2. Embedding cost (document count x avg tokens x price/token, plus re-embedding cadence)
3. Vector DB cost (storage + query volume)
4. Monitoring cost (flat fee per user per month)
Most project proposals I've reviewed include only 1-2 of these factors.
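The four factors can be sketched as a small monthly estimator. Every field name and every number in the example is an illustrative assumption, not real pricing for any vendor; re-embedding cadence is modeled as "full-corpus re-embeds per month", and query-time embedding cost is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RagCostInputs:
    queries_per_month: int
    avg_tokens_per_query: int          # prompt + completion tokens combined
    llm_price_per_million: float       # blended USD per 1M tokens
    document_count: int
    avg_tokens_per_document: int
    embedding_price_per_million: float
    reembeds_per_month: float          # 1.0 = whole corpus re-embedded monthly
    vector_storage_per_month: float    # flat USD storage charge
    vector_price_per_1k_queries: float
    monitoring_fee_per_user: float
    monitoring_users: int


def monthly_rag_cost(c: RagCostInputs) -> Dict[str, float]:
    """Return a per-factor monthly cost breakdown in USD, plus a total."""
    breakdown = {
        "llm": c.queries_per_month * c.avg_tokens_per_query * c.llm_price_per_million / 1_000_000,
        "embedding": c.document_count * c.avg_tokens_per_document
        * c.embedding_price_per_million / 1_000_000 * c.reembeds_per_month,
        "vector_db": c.vector_storage_per_month
        + c.queries_per_month / 1_000 * c.vector_price_per_1k_queries,
        "monitoring": c.monitoring_fee_per_user * c.monitoring_users,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Placeholder inputs; substitute your own volumes and vendor prices.
example = RagCostInputs(100_000, 1_500, 2.0, 50_000, 800, 0.02, 1.0, 70.0, 0.10, 20.0, 5)
print(monthly_rag_cost(example))
```

With these placeholder inputs the non-LLM factors add up to a noticeable share of the total, which is the part a generation-only estimate leaves out.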

Vibbs Dod

@Vibbsdod

3 months ago

First comment: Built a free tool that shows you this trade-off across 12 models for your specific use case → https://t.co/sAUd8xP4Iv

Vibbs Dod

@Vibbsdod

3 months ago

The model decision and the cost decision are not separate.
GPT-4.1: $2/M input, $8/M output
GPT-4.1 Mini: $0.40/M input, $1.60/M output
5x cost difference. For most classification tasks: identical quality.
"Use the best model" is not a cost strategy. "Use the cheapest model that meets quality requirements" is.
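A short sketch of that rule, using the per-million prices quoted in the post. The workload size, the quality scores, and the helper names are hypothetical placeholders; quality should come from your own eval set, not from this example.

```python
from typing import Dict, Tuple

# (input USD per 1M tokens, output USD per 1M tokens), as quoted in the post above.
PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
}


def monthly_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """USD per month for a workload with fixed token counts per request."""
    price_in, price_out = PRICES[model]
    return requests * (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def cheapest_meeting_quality(costs: Dict[str, float], quality: Dict[str, float], threshold: float) -> str:
    """Pick the lowest-cost model whose measured quality clears the threshold."""
    eligible = [m for m in costs if quality.get(m, 0.0) >= threshold]
    if not eligible:
        raise ValueError("no model meets the quality threshold")
    return min(eligible, key=lambda m: costs[m])


# Hypothetical classification workload: 1M requests, 500 input and 20 output tokens each.
costs = {m: monthly_cost(m, 1_000_000, 500, 20) for m in PRICES}
measured_quality = {"gpt-4.1": 0.95, "gpt-4.1-mini": 0.94}  # made-up eval accuracy
print(costs)
print(cheapest_meeting_quality(costs, measured_quality, threshold=0.90))
```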

Vibbs Dod

@Vibbsdod

3 months ago

AI Projects Aren’t Expensive Because of One API Call. They’re Expensive Because of the Full System.

The pattern I've seen repeat across 6 or 7 AI project kickoffs this year alone:
1. Team proposes an AI feature
2. Stakeholder asks "what will this cost?"
3. Engineer says "depends on usage, but it's cheap, LLM APIs are pennies per call"
4. Feature ships
5. Invoice arrives

The "pennies per call" calculation forgot: embeddings, vector database storage and queries, monitoring, guardrails, infrastructure, and human review. At any real scale, those "pennies" become thousands. I got tired of watching teams discover this at invoice time. So I built something that shows them the number before they commit.

What "It's Just API Calls" Misses

Real AI systems have more cost components than the LLM API.

Embeddings. If you're building RAG, you're embedding every document and every query. text-embedding-3-small is still priced at $0.02 per million tokens, which sounds negligible in isolation. But re-indexing, chunking strategy, retrieval volume, and downstream infra are where teams start to feel the real system cost.

Vector database. Managed vector storage is rarely just “set and forget.” Pinecone, Weaviate, Qdrant Cloud, and hosted pgvector all have different cost curves once document volume, throughput, replication, and reliability requirements increase.

Monitoring and evals. Production systems need observability. That can mean trace tooling, eval pipelines, retention, alerting, and team seats. Useful, necessary, and often omitted from the first estimate.

Guardrails. Safety checks add latency and operational complexity, and depending on your stack they may add cost too. Teams usually notice this only after they move beyond a demo.

Human review. The moment a workflow needs QA, approvals, or escalation, AI cost stops being just API cost. It becomes workflow cost.

What I Built

6 project templates (chatbot, RAG knowledge base, content generation, code assistant, data analysis, custom). 12 LLM options with current pricing. Embeddings, vector databases, monitoring, guardrails, human review, all configurable. Set your monthly query volume. Set your average token counts. Get an instant cost breakdown with optimization tips: "Switch from GPT-4o to GPT-4o-mini for batch classification, saves $X at your scale." Email yourself the full report. No account, no data sent to any server.

The Number That Changes the Conversation

The most useful thing about having a cost estimate before a project starts isn't the number itself. It's what the number does to the conversation. "It's cheap, just API calls" is an answer that ends discussion. "$3,200/month at current scale, dropping to $1,100 if we use GPT-4o-mini for batch jobs and cache the embeddings" is an answer that starts engineering decisions. That's the conversation I want clients and juniors to be having. Not after the invoice. Before the commit.

Try it: https://t.co/GgyKam0aZT

What's the biggest AI cost surprise you've encountered in a project?

#AI #CloudCosts #MachineLearning #AITools #ProductEngineering


Vibbs Dod

@Vibbsdod

4 months ago

The architecture pattern nobody talks about: Cache-Augmented Generation. Small, static corpus (< 5,000 documents, rarely changes)? Skip the vector DB. Load everything into context at startup. Zero retrieval latency. Zero retrieval failure. Simpler system with fewer moving parts. Sometimes the right answer is "less architecture." What's the simplest AI architecture you've shipped that actually worked well in production?
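A minimal sketch of the "load everything into context at startup" idea, assuming the whole corpus fits in your model's context window. The function names, the character budget, and the sample documents are hypothetical, and the actual model call is deliberately left out.

```python
from typing import Dict


def build_static_context(documents: Dict[str, str], max_chars: int = 400_000) -> str:
    """Concatenate the whole corpus once, at startup, in a stable order."""
    parts = [f"[{doc_id}]\n{text.strip()}" for doc_id, text in sorted(documents.items())]
    context = "\n\n".join(parts)
    if len(context) > max_chars:
        # Crude guard; a real system would count tokens with the model's tokenizer.
        raise ValueError("corpus too large for the context budget; consider retrieval instead")
    return context


def build_prompt(static_context: str, question: str) -> str:
    """Every request reuses the same document prefix; only the question changes."""
    return (
        "Answer using only the documents below.\n\n"
        f"{static_context}\n\n"
        f"Question: {question}"
    )


# Built once at startup, then reused for each incoming question.
DOCS = {"faq-1": "Refunds take 5 business days.", "faq-2": "Support is open 9 to 5."}
STATIC_CONTEXT = build_static_context(DOCS)
print(build_prompt(STATIC_CONTEXT, "How long do refunds take?"))
```

Keeping the document prefix byte-identical across requests may also let a provider's prompt caching reuse it, where that feature is available.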

Vibbs Dod

@Vibbsdod

4 months ago

https://t.co/aY9hR8k6Hq

Vibbs Dod

@Vibbsdod

4 months ago

https://t.co/742zF3co3N

Vibbs Dod

@Vibbsdod

4 months ago

@GergelyOrosz It does feel like the other services have been heavily impacted since Claude for Government launched!

10K

Vibbs Dod

@Vibbsdod

4 months ago

@claudeai what's happening, buddy? Why does it feel like the other services have been heavily impacted since Claude for Government launched? #Claude https://t.co/aXsksoZT1p

184

Vibbs Dod

@Vibbsdod

4 months ago

@claudeai what happened? It seems like many requests are failing with internal server errors. Hope nobody is hacking you!

Vibbs Dod

@Vibbsdod

4 months ago

https://t.co/pxNcwDzOEn

Vibbs Dod

@Vibbsdod

5 months ago

https://t.co/vyugdteMza

Vibbs Dod

@Vibbsdod

10 months ago

I am having a hard time carrying what I've learned in enterprise work over to open source. Starting today, I will try to set aside dedicated time every week for this. Scaling GenAI solutions at the enterprise level is an interesting problem.
