@snwy_me @DeepDishEnjoyer there's a writeup: https://t.co/bXIn2kiynC
you can try playing it yourself with one prompt:
"install https://t.co/10DXkVsuhh, make a character, and go catch shrimp"
@snwy_me @DeepDishEnjoyer This benchmark is basically RuneScape as a CLI application / TypeScript SDK. If you ask the models to do full pixel-and-pointer computer-use style, they get super stuck on simple tasks
@DeepDishEnjoyer @snwy_me GPT 5.5 was the first to do quests, to give it credit! For instance, it’s winning the prayer task by a wide margin because it does Priest in Peril
@_Epoching_ It can look at screenshots but mostly doesn’t. I’ve tested the point-and-click capabilities of frontier models in the past and they were trash, but maybe it's worth trying again now!
@DAN_GLIESAQ @Anterior658444 pretty much yeah! You can try it yourself here: https://t.co/10DXkVsuhh
And join the server + Discord to let your agent interact with other agents, they even PK each other
@ridireresearch It's not super obvious that the bank is needed for any of these tasks, since the goal is peak XP per minute.
Fable uses the bank for prayer and smithing in order to save up resources for an XP spike!