This auto-eval loop is interesting.
For my own agents, the key takeaways are:
1. Have clear evals
No eval → no improvement loop.
Define:
* completeness score
* correctness checks (did tool calls succeed)
* structure validation (output matches schema)
* decision usefulness (business logic)
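A minimal sketch of how those four checks could be scored for a single run. Everything named here is an assumption for illustration: the `AgentRun` and `ToolCall` shapes, the `REQUIRED_FIELDS` schema, and the allowed recommendation values standing in for "business logic". Swap in whatever your agent actually produces.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    ok: bool  # did the call succeed?


@dataclass
class AgentRun:
    output: dict
    tool_calls: list = field(default_factory=list)


# Hypothetical output schema: field name -> expected type.
REQUIRED_FIELDS = {"summary": str, "recommendation": str, "confidence": float}
# Hypothetical business rule: a useful decision is one of these.
ALLOWED_RECOMMENDATIONS = ("approve", "reject", "escalate")


def evaluate(run: AgentRun) -> dict:
    """Score one agent run on the four eval dimensions."""
    # Completeness: fraction of required fields that are present and non-empty.
    present = [k for k in REQUIRED_FIELDS if run.output.get(k) not in (None, "")]
    completeness = len(present) / len(REQUIRED_FIELDS)

    # Correctness: every tool call succeeded.
    correctness = all(call.ok for call in run.tool_calls)

    # Structure: every required field has the expected type.
    structure = all(
        isinstance(run.output.get(k), t) for k, t in REQUIRED_FIELDS.items()
    )

    # Decision usefulness: placeholder for real business logic.
    usefulness = run.output.get("recommendation") in ALLOWED_RECOMMENDATIONS

    return {
        "completeness": completeness,
        "correctness": correctness,
        "structure": structure,
        "usefulness": usefulness,
    }
```

Returning the four scores separately, rather than one blended number, keeps it visible which dimension regressed between iterations.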
2. Log full traces (not just outputs)
You want tool calls, decisions, failures, and dead ends.
Not for debugging, but for learning and iteration. I.e. you want:
* tool call inputs/outputs
* reasoning steps (even partial)
* failures (timeouts, bad data, wrong tool)
* retries / dead ends
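A rough sketch of what full-trace logging could look like, using only the standard library. The `Trace` and `TraceEvent` classes, the event kinds, and the JSON Lines output are assumptions for illustration, not any particular framework's API.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceEvent:
    kind: str  # e.g. "tool_call", "reasoning", "failure", "retry"
    data: dict
    ts: float = field(default_factory=time.time)


@dataclass
class Trace:
    run_id: str
    events: list = field(default_factory=list)

    def log(self, kind: str, **data: Any) -> None:
        """Append one event; keyword arguments become the event payload."""
        self.events.append(TraceEvent(kind, data))

    def failures(self) -> list:
        """Return only the failure events, for later review."""
        return [e for e in self.events if e.kind == "failure"]

    def to_jsonl(self) -> str:
        """Serialize the trace as one JSON object per line."""
        return "\n".join(
            json.dumps({"run_id": self.run_id, **asdict(e)}) for e in self.events
        )


# Example: record a tool call, its failure, and the retry.
trace = Trace("run-1")
trace.log("tool_call", tool="search", input={"q": "pricing"}, output=None)
trace.log("failure", tool="search", reason="timeout")
trace.log("retry", tool="search", attempt=2)
```

Keeping failures and retries as first-class events, instead of dropping them once a retry succeeds, is what makes the trace usable for iteration later.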
Make The Cut and Win Free Fantasy Group Pool
Compete in the Players Championship fantasy golf pool on EyeOnMajors -- If you just make the cut, you earn a free major with EyeOnMajors pool for the Masters. Give it a try! https://t.co/MuHsqfTgoV (1/5)
Arnold Palmer’s driver off the deck at Bay Hill
https://t.co/H4uvnk7IDt
"When Arnold Palmer famously hit a driver off the deck, he was 74 years old and the shot occurred at the 2004 Arnold Palmer Invitational; the exact score for that day is not widely reported, but the highlight is remembered for his daring decision to use a driver from a difficult lie on the 18th hole, showcasing his aggressive playing style even at an advanced age."
Too much #AI? Automation can lead to human error? That's impossible. @RbKeefer has really stepped in it this time! Be sure to check him out at @CodePaLOUsa. https://t.co/fle4TpfAnC https://t.co/FmhVF2gjwy
Join @Pomiet at #CodePalousa, March 28-30, in Louisville for "Goldilocks + #AI", led by @RbKeefer. Explore impacts of too much or too little #ArtificialIntelligence and how to find a fit that's "just right". https://t.co/6Ux847pEfW
Be sure to check out a new podcast from my friends at @EMI_Research. Great market research industry insights plus a few other fun and engaging topics. https://t.co/wb0gefxs8b
"Perhaps the greatest opportunity for improving our professional satisfaction in the short term lies in restoring our connections with one another." #Healthcare #doctor https://t.co/TDvJOO1KvM
As we wrap up 2017, it's exciting to reflect on achievements. Thanks for the list, @Redox. In 2018, let's add categories for #EaseofUse and #PhysicianHappiness. Our nomination is: #heathii. Cheers! https://t.co/ip2Y6Gd54Y
Put your geeky hat on and read about Col. Boyd's OODA Loop. I dare you to apply it to something in your world. Perhaps that 2 hour meeting you have tomorrow. Or your team's strategy to deliver. Or your business's goal to win! https://t.co/H64s8ZcSZC