My job? I'm a rare token hunter. I track down dead languages in Tibetan monasteries, decrypt Tesla's private journals, chase whispers of pre-contact Amazonian dialects. The AIs pay top credit for tokens they've never tasted, you know. Work is work, even if it's for the machines.
We looked at OSWorld, a popular evaluation of AI computer use capabilities.
Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time.
See thread for details!
Seems like the whole "RL just surfaces intelligence, it doesn't increase it" series of papers is just an artifact of RL still being a small fraction of compute in most LM contexts, no?
AlphaGo (etc.) shows quite clearly that there is nothing to this as a general matter.
I've been interviewing so many impressive programmers who deeply understand the systems they work with, are driven, and have grit.
We work on hard problems, pay good comp, and have an exciting mission.
If you think you'd like it here, please DM me!