Building LLM systems felt more like praying to the AI gods than engineering.
But all that changed when I learned about eval-driven development.
This Thursday (9/25) at 11AM CST, I’ll be sitting down with @HamelHusain for a live podcast to discuss evaluation frameworks and failure analysis in LLM systems.
We’ll cover:
- What is an AI eval, and why does it matter?
- How to run error analysis for open-ended tasks
- Should you rely on LLM judges or code-based evals?
- How to debug and improve your system once you have evals in place
... and more
Join us live and ask Hamel your AI eval questions!
👉 Register here: https://t.co/TrH6do8AK5
Just watched @ShawhinT's YT video on "ML Foundations for AI Engineers".
All the concepts are so beautifully explained with various examples. A must watch.
Building software on top of (unpredictable) LLMs can feel more like praying to the ML gods than engineering 🙏
But it doesn’t have to be that way…
In a recent blog post, I showed how error analysis resolves the guesswork by giving us a systematic way to identify and improve the most significant failures from an LLM system.
👉 Check it out here: https://t.co/2378mE7Tdm
--
Shout out to @HamelHusain and @sh_reya, who showed me the light when it comes to LLM error analysis in their fantastic AI evals course :)
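For anyone new to the idea, here is a rough sketch of the tallying step in error analysis: review a batch of traces, tag each with the failures you notice, then rank the tags by how often they show up. The trace data, tag names, and the `rank_failure_modes` helper are all made up for illustration; this is not code from the blog post or the course.

```python
from collections import Counter

# Hypothetical hand-written annotations: one list of failure tags per reviewed trace.
annotated_traces = [
    {"id": 1, "failures": ["wrong_tone"]},
    {"id": 2, "failures": []},
    {"id": 3, "failures": ["hallucinated_fact", "wrong_tone"]},
    {"id": 4, "failures": ["hallucinated_fact"]},
    {"id": 5, "failures": ["wrong_tone"]},
]


def rank_failure_modes(traces):
    """Return (tag, trace_count, share_of_traces), most frequent tag first."""
    # Count each tag at most once per trace so repeated tags don't inflate it.
    counts = Counter(tag for t in traces for tag in set(t["failures"]))
    total = len(traces)
    return [(tag, n, n / total) for tag, n in counts.most_common()]


ranking = rank_failure_modes(annotated_traces)
for tag, n, share in ranking:
    print(f"{tag}: {n} traces ({share:.0%})")
```

The top of the ranking is where the next fix (or the next eval) would likely pay off most.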
Over the last 2 yrs, I've helped 35+ companies improve their AI products.
I distilled my approach into this guide, which covers error analysis, synthetic data, eval-human alignment, involving domain experts, optimizing # of experiments & more (1/5)
https://t.co/Kn3hyq5zf2
5) Seek sophistication
Data science has endless interesting technical ideas that can be applied to endless applications. Exploring these is fun and a big reason why I love data science.
However, as an entrepreneur, I must fight this urge and instead bias toward simple techniques and solutions.
4) You’re done when you deploy
As a data scientist, once the project was built and deployed, I could relax.
As an entrepreneur, this is only the start. After deploying, I need to let people know about it and (hopefully) get them to pay for it.
3) Building (rather than buying) just because I can
Before, I had a tendency to avoid paid software tools/services when I felt I could implement them myself.
Now, I bias toward paying for solutions that can help me move faster.
2) Building for “prod” with 0 users
When working at an established company, ML projects might immediately impact thousands of customers and millions of dollars, so you have to build for scale from the start.
However, as an entrepreneur, you’re more often seeking validation, which means speed and iteration are much more important than scalability.