Dear Frontier Lab,
Please saturate all the following benchmarks by the end of the year:
- ARC-AGI-3
- Humanity's Last Exam
- ENIGMAEVAL
- Remote Labor Index
- SimpleBench
- VideoGameBench
- OSWorld
- FrontierMath
- RE-Bench
Thank you!
@peterwildeford I think people will still believe the unhinged false flag because they already believe in way dumber propaganda like the AI water usage panic
@deepestbrew @peterwildeford The difference is you can scale the deployment of mythos, but human labor is inherently scarce. There’s a reason they didn’t find these vulnerabilities before mythos came around
@chenna1985 @spicey_lemonade “untested unreliable products”
It has solved multiple Erdős problems, that seems pretty tested to me
“big unethical corporation”
OpenAI is not unethical, and you aren’t more ethical than they are. I think you automatically associate “big” and “corporation” with evil