Browser agents use computers the same way humans do, unlocking powerful use cases for personal assistants, browsers, and enterprise workflows.
After talking to 20+ founders in the space, we're excited to put out the definitive market map for browser agents.
Why Now? (4/4)
AI-first browsers are poised to disrupt the massive web browser market, with highly anticipated releases like Comet from @perplexity_ai on the way. It remains to be seen how Google integrates Project Mariner and other AI tools into Chrome.
The Theta team started CUB as an internal eval set, but it quickly grew into a full-fledged benchmark over the past month. We're excited to test even more models and frameworks. For more on the benchmark, including examples and a full paper, check out our blog:
https://t.co/VSWaMRdCGT
Computer/browser use agents still have a long way to go for more complex, end-to-end workflows. Actual task completion is far below our reported numbers, because we gave credit for partially correct solutions and for reaching key checkpoints. In total, there were fewer than 10 instances across our thousands of runs where an agent successfully completed a full task.
Theta (@trytheta) allows AI agents to learn from their mistakes in real time. Their memory layer has already improved the accuracy of OpenAI Operator by 43% with 7x fewer steps taken.
https://t.co/9uI9vbSYLs
Congrats on the launch, @RayanGarg, @tsha444, and @_gurvir_!