Someone should build a computer use benchmark for navigating Chinese gov websites. That'd be a true god-tier AGI bench, a final-boss-level RL env. I just spent 2 hours filing for an ID card replacement:
- Switched between WeChat, Chrome, Safari, and 360 Browser (a native Chinese browser)
- The Beijing municipal gov has a web app, desktop app, and WeChat mini app. I had to bounce between all three to complete different steps.
- 2FA wouldn't send to my US phone number, so I had to enter my mom's number
- Did the facial scan ~20 times because it kept saying it couldn't recognize me under nighttime lighting
- Halfway through I had to turn on a Chinese VPN because some page transitions kept failing.
This will humble the shit out of Mythos
I hope this serves as a wake-up call for sovereign nations everywhere, not just China, that it’s time for every nation to invest in and build its own LLM.
Five years ago, it was rare for escorts to charge more than $1k per hour. Now, a handful of women charge much, much more: $3k, $5k an hour. $23k a day. $30k a weekend.
Inside the shifting economics of intimacy in Silicon Valley:
https://t.co/zmPrr2rkCp
Really like the idea of not staying at the data layer forever. That said, I'm still cautious about how natural the transition from RLaaS to a vertical application player is. By then, you'll likely have strong RL talent, a proprietary harness, and an agent-native alternative to incumbent tools, but selling to vertical enterprises is a very different GTM motion from selling to AI labs.
Rather than the useless 3.5, Deep Think is the only reason I kept a paid tier for Google AI. Now even that isn't working. @GoogleDeepMind @GoogleAI do you not at least feel embarrassed?