we formalized this further for long-horizon research tasks, specifically ones where multiple goodness-of-fit metrics need to be considered and optimized.
one thing we realized is that the loop is most valuable at the accept/reject boundary.
a global aggregate score can improve while the result moves in the wrong local direction, so the external loop has to decide: did the agent actually find a better solution, or did it just find a local tradeoff that makes the headline metric look better?
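to make the accept/reject boundary concrete, here is a minimal sketch of one possible gate. it is not the paper's method: the function name `accept_candidate`, the metric names, the mean as the aggregate, the `tolerance` parameter, and the "higher is better" convention are all assumptions for illustration.

```python
from typing import Dict


def accept_candidate(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Hypothetical accept/reject gate (illustration only).

    Assumes every metric is "higher is better" and that both dicts
    share the same keys. Accepts only if the aggregate (here, the mean)
    improves AND no individual metric drops by more than `tolerance`.
    """
    baseline_mean = sum(baseline.values()) / len(baseline)
    candidate_mean = sum(candidate.values()) / len(candidate)
    aggregate_improved = candidate_mean > baseline_mean

    # Per-metric check: catches a headline gain bought with a local regression.
    no_local_regression = all(
        candidate[name] >= baseline[name] - tolerance for name in baseline
    )
    return aggregate_improved and no_local_regression


# Made-up numbers, purely for illustration.
baseline = {"fit_a": 0.70, "fit_b": 0.60}
tradeoff = {"fit_a": 0.90, "fit_b": 0.50}  # mean goes up, fit_b gets worse
better = {"fit_a": 0.75, "fit_b": 0.62}    # both metrics improve

tradeoff_accepted = accept_candidate(baseline, tradeoff)
better_accepted = accept_candidate(baseline, better)
```

in this sketch the `tradeoff` candidate is rejected even though its mean score is higher, because one metric regressed, while the `better` candidate is accepted. a real loop would likely need per-metric directions and tolerances rather than a single shared threshold.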
the paper explores this on a mechanistic ecology model used in NASA’s Carbon Monitoring System.
https://t.co/B0jjONCvMt