At #AgentCon today.
Building AI agents is becoming easy.
Building reliable AI agents is becoming the challenge.
Skills, memory, evaluation, observability, and governance are quickly becoming essential parts of the stack.
USB-C standardized how devices connect.
MCP (Model Context Protocol) may do the same thing for AI agents.
The real shift isn’t the protocol.
It’s standardized AI-to-tool interaction.
https://t.co/hJXqyifC6B
Your Playwright tests aren’t flaky.
They’re context-starved.
Debugging fails because we don’t reconstruct what actually happened.
I wrote about building a Context Engine that fixes this 👇
https://t.co/76XUZOFp3D
1/
Your LLM isn’t “random.”
It’s changing in ways your tests will never catch.
Hallucinations, drift, cost spikes, slowdowns —
all happen after deployment.
Most teams never see it coming.
4/
So I built a small, open-source LLM Observability Starter Kit + wrote a clear, beginner-friendly breakdown of how to actually monitor AI behavior.
If you’re QA, SDET, or building with AI —
this is the missing layer.
🔍 Blog:
https://t.co/HbrodSCzsD
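To make "monitoring AI behavior" concrete, here is a minimal sketch of the idea, not the Starter Kit's actual code. It wraps each LLM call and records latency, a rough token count, and an estimated cost. Everything named here (`LLMMonitor`, `CallRecord`, `fake_llm`, the price, and the latency threshold) is hypothetical and chosen for illustration. Token counts are approximated by splitting on whitespace.

```python
import time
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List


@dataclass
class CallRecord:
    prompt: str
    response: str
    latency_s: float
    tokens: int       # rough estimate: whitespace-separated words
    cost_usd: float


@dataclass
class LLMMonitor:
    # Hypothetical price and threshold; tune them for your own model.
    usd_per_1k_tokens: float = 0.002
    max_latency_s: float = 2.0
    records: List[CallRecord] = field(default_factory=list)

    def call(self, llm: Callable[[str], str], prompt: str) -> str:
        """Run one LLM call and record latency, size, and estimated cost."""
        start = time.perf_counter()
        response = llm(prompt)
        latency = time.perf_counter() - start
        tokens = len(prompt.split()) + len(response.split())
        cost = tokens / 1000 * self.usd_per_1k_tokens
        self.records.append(CallRecord(prompt, response, latency, tokens, cost))
        return response

    def summary(self) -> dict:
        """Aggregate the numbers you would chart or alert on."""
        if not self.records:
            return {"calls": 0}
        return {
            "calls": len(self.records),
            "avg_latency_s": mean(r.latency_s for r in self.records),
            "total_tokens": sum(r.tokens for r in self.records),
            "total_cost_usd": sum(r.cost_usd for r in self.records),
            "slow_calls": sum(r.latency_s > self.max_latency_s for r in self.records),
        }


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model client so the sketch runs offline.
    return "echo: " + prompt


monitor = LLMMonitor()
monitor.call(fake_llm, "What is observability?")
monitor.call(fake_llm, "Summarize this ticket")
print(monitor.summary())
```

Swap `fake_llm` for your real client and ship `summary()` to whatever dashboard you already use. Tracking these numbers over time is what surfaces cost spikes and slowdowns after deployment.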
LLM Evaluation is Broken - Here’s How to Fix It
1/
Most teams testing LLMs are still doing:
❌ manual prompts
❌ random test cases
❌ “looks good to me”
LLMs don’t work like normal software: the same prompt can produce different outputs.
They need new testing methods.
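One way to move past "looks good to me" is sketched below. It is an illustration only, not the method from the blog. It uses a fixed set of cases, an automatic check per case, and several runs per prompt so you get a pass rate instead of a single verdict. `EvalCase`, `evaluate`, `fake_llm`, and the keyword check are all hypothetical names and choices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    prompt: str
    must_include: List[str]  # substrings a passing answer should contain


def passes(case: EvalCase, output: str) -> bool:
    """Case-insensitive keyword check; a deliberately simple scoring rule."""
    text = output.lower()
    return all(term.lower() in text for term in case.must_include)


def evaluate(llm: Callable[[str], str], cases: List[EvalCase], runs: int = 3) -> Dict[str, float]:
    """Return the pass rate per prompt over several runs."""
    rates = {}
    for case in cases:
        passed = sum(passes(case, llm(case.prompt)) for _ in range(runs))
        rates[case.prompt] = passed / runs
    return rates


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model; one answer is wrong on purpose.
    answers = {
        "Capital of France?": "The capital of France is Paris.",
        "2 + 2?": "That would be five.",
    }
    return answers.get(prompt, "")


cases = [
    EvalCase("Capital of France?", ["paris"]),
    EvalCase("2 + 2?", ["4"]),
]
results = evaluate(fake_llm, cases)
for prompt, rate in results.items():
    print(f"{rate:.0%}  {prompt}")
```

With a real model, repeated runs matter because outputs vary. A prompt that passes 2 of 3 times is a finding that a single manual check would miss.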
6/
If you’re a QA engineer, SDET, or AI dev —
LLM evaluation is the most important skill of 2025.
I wrote a full deep-dive blog covering everything:
👉 https://t.co/v4U94LzKFS