@cryptocode24 @METR_Evals It solves the problem in an unintended way - e.g. when asked to give a 6-word short story, it instead googles famous stories and uses one it found.
@m_bourgon Hey, appreciated; agree that reply / reply-all semantics would be nice to have built in - @steipete can weigh in - but if you’re using an agent, using send with reply-to-message-id is not too bad. Or having to specify subject.
@levie Honestly completely disagree - it’s way more likely that we kick the humans out of the loop - why do we need to wait on human triage if the fix is trivial and low risk?
@Altimor Super interesting - the V2 confuses me slightly - did you manage to get the “tool search” to ignore the previous context and existing system prompts when making a decision? Do you then roll back the now-extended prefix with the final decision inline?
@dqnamo @Vercantez Agent inside the sandbox gives you the most self-modification power. Agent outside the sandbox gives you the most control and easier reasoning about security boundaries.
No idea which is better yet - I’ve done one of each and liked/disliked both.