co-driving RL for Navigator - our SOTA browser use agent - has been a professional highlight for me, have learned so much working on it with @CysuXiao. read the blog post (linked below) for more!
Introducing Yutori Navigator
31 years ago, the modern web era began with Netscape Navigator.
Today, we’re introducing Yutori Navigator — a web agent that autonomously navigates websites on its own cloud browser to complete tasks for you.
Navigator achieves Pareto-domination over Gemini 2.5, Claude 4.5, and OpenAI Operator
• 10%-20% accuracy gains across benchmarks
• 2-3x faster
• Uniformly preferred in head-to-head human-evals
Simply put, the best web agent in the world
We gave some of our partners early access to n1.5 — the most capable computer use model for the web.
It is in production at FAANG scale as we speak, replacing a computer use model from a frontier lab.
If your product can benefit from web automation — extracting structured data from dynamic webpages, filling forms, completing workflows on the web, testing vibe coded web apps — you should try out @yutori_ai's Navigator n1.5!
Save your GPT / Claude / Gemini capacity for something else :)
Introducing Navigator n1.5
The most capable computer-use model for the web.
Pareto-domination: accuracy, latency, cost
• SoTA across all benchmarks
• +5-10% over GPT 5.5, Opus 4.7, n1
• +25% over Gemini
• 2x faster, significantly cheaper
Expanded action space
• UI actions (like n1)
+ JavaScript generation & execution
I don’t like Pulse in ChatGPT Pro. It’s inaccurate, serves a lot of older information and is mostly slop. Which is weird, given how good ChatGPT is w/ search.
Scouts by Yutori (CUA startup) does those things better, imo.
Two updates from Yutori:
1. We benchmarked GPT 5.4 on browser-use tasks
• Matches/slightly-outperforms Opus 4.6 (+0.3%)
• Big jump over previous OpenAI CUAs
2. Latest version of n1
• Outperforms GPT 5.4 and Opus 4.6 (+3%)
• 2.5x faster, 4-5x cheaper.
@_xjdr great post! how much of this would apply to post training? I assume flops section still applies, but router balancing/stability may already be alleviated so not as many tricks would be needed? and the data section isn’t specific to MoEs?
@finbarrtimbers @natolambert another place this comes up is when it *always* uses .get on dicts instead of raw indexing. don’t think it learned that sometimes raising an error is good because it means something is wrong lol
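a minimal sketch of the .get vs raw indexing point, assuming a hypothetical config dict and made-up helper names (none of this is from the thread): the lenient version quietly hands back None, while the strict version fails loudly at the exact place the key is missing.

```python
# Hypothetical config for illustration; "batch_size" was forgotten.
config = {"learning_rate": 3e-4}


def batch_size_lenient(cfg):
    # dict.get returns None for a missing key, so the mistake is hidden
    # here and only surfaces later, somewhere far from the real cause.
    return cfg.get("batch_size")


def batch_size_strict(cfg):
    # Raw indexing raises KeyError right away, pointing at the missing key.
    return cfg["batch_size"]


if __name__ == "__main__":
    print(batch_size_lenient(config))
    try:
        batch_size_strict(config)
    except KeyError as err:
        print(f"missing required key: {err}")
```

.get is still the right call when a key is truly optional and a default makes sense; the issue is reaching for it everywhere, including for keys that must exist.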
there’s a palpable tension in the air as hundreds of AI researchers (including me!) quietly work nights and weekends trying to figure out the “right way” to scale RL
math & code are not the universe
we will not rest until post-training is as clean and elegant as pre-training
@YiTayML agreed! (except blame being on you :))
we were definitely too grounded in the pretrain-finetune and transfer learning paradigm, which was apparent when flan (and instructgpt) came out.
though personally it was a great learning experience for me so early in my career :)
honestly ai is so easy and neural networks are so simple. this was always going to happen to the first intelligent species to come to our planet. we’re about to learn something important about how universes tend to go I think, because I don’t believe we’re in a niche one
@teortaxesTex I hate it too but most of it (besides fraud) is harmless cringe. imo the grifter attitude comes from intense career ambition to make money (good), and shady shortcut-taking mentality being rewarded. I’ve seen both in India, which people should remember is still a very poor country
What a crazy week for AI:
- OpenAI launches SearchGPT
- Meta releases Llama 3.1
- Mistral AI releases Mistral Large 2
- DeepMind AI gets silver medal at Int. Math Olympiad
- Elon announces push to Grok 2&3
Competition is intensifying. The months ahead will be super exciting 🤯