@svpino I don’t think you understand how harnesses work. Claude, Codex, etc. are post-trained on tool calls, and this is just a generic harness. Just use deep agents.
@LLMJunky Yeah, I created a post-skill feedback skill (a mouthful). I attach it to the end of every skill so it reflects on itself and on any assumptions it had to make because the skill lacked the right context, then saves the feedback to a table for me to review.
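A rough sketch of what the "save the feedback to a table" step could look like, assuming a local SQLite database. The table name `skill_feedback`, its columns, and the helper names are all hypothetical, made up for illustration; the actual setup may look quite different.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per assumption the agent had to make.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS skill_feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            skill_name TEXT NOT NULL,
            assumption TEXT NOT NULL,
            missing_context TEXT NOT NULL,
            created_at TEXT NOT NULL
        )
        """
    )
    conn.commit()


def record_feedback(conn, skill_name, assumptions):
    """Store (assumption, missing_context) pairs reported after a skill run.

    Returns the number of rows written.
    """
    now = datetime.now(timezone.utc).isoformat()
    rows = [(skill_name, a, m, now) for a, m in assumptions]
    conn.executemany(
        "INSERT INTO skill_feedback "
        "(skill_name, assumption, missing_context, created_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def feedback_for(conn, skill_name):
    # Fetch everything recorded for one skill, oldest first, for manual review.
    cur = conn.execute(
        "SELECT assumption, missing_context FROM skill_feedback "
        "WHERE skill_name = ? ORDER BY id",
        (skill_name,),
    )
    return cur.fetchall()
```

The idea would be that the feedback skill asks the model to list its assumptions in a structured form, and a tool call like `record_feedback` writes them out so recurring gaps in a skill's context show up on review.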
@VladBarash Yeah, that is exactly it. The hardest part is how you come up with the rubric. You have to ask yourself "what is your goal?" If it's agent optimization with objective measures, i.e. an agentic benchmark eval but just for your metrics, I would use autoresearch.
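To make "rubric with objective measures" concrete, here is a minimal sketch of a weighted rubric over a single agent run. The criteria, weights, and run fields (`passed`, `steps`, `max_steps`) are assumptions chosen for illustration, not anything taken from autoresearch or a specific benchmark.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    score: Callable[[dict], float]  # expected to return a value in [0, 1]


def rubric_score(run: dict, rubric: List[Criterion]) -> float:
    """Weighted average of criterion scores, each clamped to [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    weighted = sum(
        c.weight * min(1.0, max(0.0, c.score(run))) for c in rubric
    )
    return weighted / total_weight


# Hypothetical rubric: mostly task success, partly staying under a step budget.
EXAMPLE_RUBRIC = [
    Criterion("task_success", 0.6, lambda r: 1.0 if r["passed"] else 0.0),
    Criterion("step_budget", 0.4, lambda r: 1.0 - r["steps"] / r["max_steps"]),
]
```

Picking the criteria and weights is the hard part mentioned above; once they are written down like this, the score is a plain number an optimization loop can compare across runs.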