Back to building a SaaS 🚀 • Advisor & co-founder @electricbeastco • Founded Paw API testing tool (exit in 2020) • #Biotech#PrecisionFermentation enthusiast
@anakin@MistralAI@OpenAI@AnthropicAI But just much cheaper. I’m using batch API (categorization work), and something that is very poorly documented by @MistralAI yet awesome, is that I get prompt caching discounts on top of the discount for batch inference.
Again, evals allowed me to optimize for input caching.
@anakin I can confirm that it has changed the way I build prompts and test models.
As an example, I realized that @MistralAI Mistral Small model was offering (for my use case) similar performance than @OpenAI 4o-mini (coming from my legacy code) and even @AnthropicAI Haiku.
The QR code industry is full of liars
"free forever" that costs $29/month
"no watermarks" that plaster your codes
"unlimited" with daily limits
interfaces that look like 2003
got tired of it. built the alternative.
no tricks. no ugly UI. no bs.
https://t.co/PNlk7jx7gU
@RnaudBertrand 100%. But to be clear, Sarkozy put France back in NATO’s integrated military command. France always has been in NATO itself, even under De Gaulle.
1/🇫🇷14 ans. Visa valide.
Venue passer des vacances, elle finit enfermée à l'aéroport par décision du @Interieur_Min...qui sera jugée illégale par le juge.
Motif du refus d'entrée : elle a mal répondu aux questions de la PAF… posées sans interprète.
https://t.co/RRU8xp4jnM
So many answers to my post are of the type "we're owning China by banning foreign students to Harvard", "we get nothing for educating foreigners", "with this we'll teach Americans and not our competitors", etc.
It's astonishing to see how ignorant Americans can be with regards to the source of their power. And also, judging by the intellectual caliber of these responses, it demonstrates in itself that if you limit U.S. universities to American students only, the standards will crater.
I hate to break it to you but this move will not even remotely "own China". Quite the contrary this is one of the biggest self-own in American history.
The whole challenge of competing with the U.S. is that you're not only competing with Americans but with the collective brainpower of 140+ countries concentrated in American institutions. Including, as is often the case, competing with your own country's brightest minds who joined the "opposite camp" as it were.
Furthermore, even if they go back home after their studies, these students often become part of their countries' elites and maintain lifelong networks, friendships, and cultural affinities with America, ensuring that when they're making decisions as CEOs, ministers, or judges, they have an instinctive understanding of and sympathy for the U.S.'s perspective.
Want to throw all that away? Be my guest. But I'll end this post with the concluding sentence of historian Arnold Toynbee in his 12-volume "A Study of History" where he writes: "Civilizations are not murdered. Instead, they take their own lives."
Toute la haine déversée sur nos compatriotes musulmans me ramène à celle déversée sur mes aïeux. Je voudrais leur exprimer toute ma solidarité. Notre fraternité sera notre salut.
Les héritiers de Drumond et Maurras sont nos ennemis communs.
People who don't use AI to code, do research, summarize documents, and even produce content in 2025 are like people who refused to use the internet in 2000. It's a self-inflicted disadvantage.
Batch processing by @GroqInc makes so much sense. I run LLM completions from a queue service, which anyways takes days to process. It just makes sense that inference providers take the whole batch and process it when they have more available resources.
@jeffr_yyy Hey Jeffrey! Confident AI & DeepEval look awesome, that's exactly what I've been looking for.
Quick questions:
- Are you supporting multimodal (vision)? In prompts and dataset?
- It would be awesome if Human Feedback would support custom React components to display
Thx!
This is exactly what I was looking for 👏
We truly live in a different world. I'm building an AI product, realize I need a tool to test the results of the LLMs, I use @perplexity_ai to find the right answer, it directly me to @confident_ai
🚨 95% of LLM evaluations fail to deliver value—why? 🤔
Because most teams are unknowingly evaluating the wrong thing.
Typical LLM metrics sound great:
- Correctness: "Did the model get the facts right?" ✅
- Answer Relevancy: "Did it directly answer the question?" 🎯
- Faithfulness: "Did it avoid hallucinations?" 🔎
- Tonality: "Did it match the desired voice?" 🗣️
But here's the issue:
Your LLM doesn't exist simply to be correct, relevant, or faithful. It exists to deliver ROI—reducing customer support costs, saving analyst hours, or increasing customer satisfaction. 📈💸🤑🤑
Metrics must correlate to real-world outcomes. Your test case passing rates should confidently predict tangible business impact—more ticket resolutions, reduced internal workload, increased efficiency.
When you build this metric-to-outcome connection, evaluations finally mean something. Improvements in your LLM’s performance become improvements in your business metrics.
How do you do this right (@confident_ai )?
👉 Humans-in-the-loop, with metric alignment.
- Curate just 25–50 human-labeled "good" or "bad" real-world OUTCOMES.
- These aren't metric scores—these are OUTCOMES like support tickets being resolved or closing your LLM app in frustration. Whatever your product KPI is, you know it better than me.
- Figure out the set of metrics would produce a test result-outcome correlation through trial and error.
Forget synthetic data. Forget vanity metrics. If your evaluation data doesn't represent real users and real outcomes, you're evaluating in the dark. 🕶️
We unpacked exactly how to establish these connections clearly, practically, and repeatedly in our latest guide:
👉 The Ultimate LLM Evaluation Playbook:https://t.co/ijAEbXdJqO
OpenAI announced a partnership with Estonia to roll out ChatGPT Edu in all secondary schools, starting with 10th and 11th graders by September 2025, as part of the country’s AI Leap 2025 initiative to provide free AI tools and teacher training