Episode 4 of ClaudeCast is out!
It takes us back to Mêlée Island™, where Claude tries to bribe bridge trolls and fight pirates to rank high on https://t.co/1zEhShwcoO (the ultimate AI test suite by @simonradner)
https://t.co/wMCXzr5dox
https://t.co/ACrdoOx2HG is live
your agent can now play SCUMM adventure games in the browser
(pre-baked Monkey Island Demo Version or BYOG)
constraint: SCUMM engine only
https://t.co/Iyy83PEjCy
started turning this into a benchmark. the constraint: it has to be independent of the specific game and depend only on the ScummVM engine.
defining “progress” under that constraint is hard. current hack: measure what changes. not sure if that’s progress or just exploration.
https://t.co/vcmaAJXrYw
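A rough sketch of what "measure what changes" could look like, and why it can't tell progress from exploration. Everything here is an assumption for illustration: the snapshot shape (a flat dict with keys like `room` and `inventory`) and the names `changed_keys` and `change_score` are hypothetical, not the benchmark's actual code or the real game-state API.

```python
import json
from typing import Any, Dict, List


def changed_keys(prev: Dict[str, Any], curr: Dict[str, Any]) -> List[str]:
    """Return the sorted keys whose values differ between two snapshots.

    A key missing from one snapshot is treated the same as a None value.
    """
    keys = set(prev) | set(curr)
    return sorted(k for k in keys if prev.get(k) != curr.get(k))


def change_score(snapshots: List[Dict[str, Any]]) -> Dict[str, int]:
    """Summarise a run as raw change count plus number of distinct states.

    Assumes snapshots are JSON-serialisable (hypothetical state format).
    """
    total_changes = sum(
        len(changed_keys(prev, curr))
        for prev, curr in zip(snapshots, snapshots[1:])
    )
    # Canonical JSON makes two snapshots with equal content compare equal.
    distinct_states = len({json.dumps(s, sort_keys=True) for s in snapshots})
    return {"total_changes": total_changes, "distinct_states": distinct_states}


if __name__ == "__main__":
    run = [
        {"room": 1, "inventory": []},
        {"room": 2, "inventory": []},
        {"room": 1, "inventory": []},       # walked back: a change, but not a new state
        {"room": 1, "inventory": ["pot"]},  # picked something up: a new state
    ]
    print(change_score(run))
```

Walking back and forth between two rooms keeps raising `total_changes` while `distinct_states` stays flat, so comparing the two is one game-independent way to separate wandering from reaching somewhere new. Neither number says whether the new state is actually closer to solving the game.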
@alkampfer Starting prompt is basically “read the instructions and make progress in the game” — the instructions expose a JS API to get game state + logs
will share more soon
@Estudio528 it’s actually trying to find the SCUMM Bar. in this run it expects it somewhere in the village and only finds it later
in about half the runs it goes there straight away, it’s probabilistic
@damageboy haha I think you’re safe — Monkey Island needs creative, humorous thinking to solve, and the agent mostly lacks that
might brute force it eventually though
@RMWinslow haha
Monkey Island is basically full of prompt injections and you can see the model fall for some of them too
got a bit lucky here, getting into the kitchen needs timing with the cook serving grog
in another run it reduced the sleep between retries to catch the right moment
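A minimal sketch of that retry pattern, for illustration only. The thread says just that the agent reduced the sleep between retries; the halving schedule, the floor, and the names `retry_with_shrinking_sleep` and `attempt` are assumptions, not what the agent actually wrote.

```python
import time
from typing import Callable


def retry_with_shrinking_sleep(
    attempt: Callable[[], bool],
    initial_sleep: float = 1.0,
    factor: float = 0.5,
    min_sleep: float = 0.05,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call attempt() until it returns True, sleeping less after each miss.

    Returns the 1-based number of the successful attempt, or 0 if all
    attempts failed. The sleep function is injectable so it can be faked.
    """
    delay = initial_sleep
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        sleep(delay)
        # Shorter waits sample the game state more often, which makes it
        # easier to hit a narrow timing window (assumed halving schedule).
        delay = max(min_sleep, delay * factor)
    return 0


if __name__ == "__main__":
    calls = {"n": 0}

    def door_is_clear() -> bool:
        # Hypothetical stand-in for "the cook is out of the kitchen".
        calls["n"] += 1
        return calls["n"] >= 4

    print(retry_with_shrinking_sleep(door_is_clear, initial_sleep=0.2))
```

This is the opposite of the usual exponential backoff: polling speeds up over time, which suits a short timing window but would be the wrong choice against a rate-limited service.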