what if your AI agent could call a human as a tool?
wired browser-handoff into a @browser_use agent doing a shopping checkout. when it hits the login wall or card form, it raises its hand. discord ping, I take over, agent resumes.
the agent decides when to ask, not the library. it has a `request_human_help(reason, done_when)` tool. when it hits something it shouldn't do itself (entering credentials, card details), it raises its hand and waits.
I typed nothing outside the login + card form. agent did the rest.
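a minimal sketch of what a tool like `request_human_help(reason, done_when)` could look like. this is not browser-handoff's actual code: only the tool name and its two arguments come from the post. here `done_when` is assumed to be a zero-argument callable, and `notify`, `poll_s`, `timeout_s`, and `HelpResult` are hypothetical names added for illustration (a Discord ping would go where `notify` is).

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class HelpResult:
    """Hypothetical return value telling the agent how the handoff went."""
    reason: str
    completed: bool
    waited_s: float


def request_human_help(
    reason: str,
    done_when: Callable[[], bool],
    notify: Callable[[str], None] = print,  # assumption: swap in a Discord webhook call
    poll_s: float = 0.5,                    # assumption: simple polling, not events
    timeout_s: float = 300.0,
) -> HelpResult:
    """Ping a human, then block until done_when() is true or the timeout passes."""
    notify(f"human help needed: {reason}")
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if done_when():
            # The human finished; hand control back to the agent.
            return HelpResult(reason, True, time.monotonic() - start)
        time.sleep(poll_s)
    # Nobody finished in time; the agent decides what to do next.
    return HelpResult(reason, False, time.monotonic() - start)
```

the agent would call it with something like `request_human_help("login wall", done_when=lambda: logged_in())`, where `logged_in` is whatever check fits the page. the point of the design is that the model picks the moment and the reason, and the tool only handles notify and wait.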
A lot of people have been asking about our harness / approach - some thoughts:
1/ it’s fully open source on github!
2/ it is quite simple - and we think this is where harness engineering is heading. you no longer need elaborate scaffolding to force the model to reason in a prescribed way
3/ we initially included a verifier to check the executor’s work. it ended up being *more* accurate than the benchmark’s grader, but we omitted it (you can't score above the ceiling set by the grader). we have a lot more to say on this.
4/ we were most excited by the performance uplift in sonnet (the lighter model). it reflects a shift toward picking the model at the intelligence/cost pareto max for a task, not just the largest one. sonnet achieved near parity with opus in performance, while costing less than half as much.
@kylejeong @jamesmurdza @daytonaio @browserbase instead of a manual hand-back button, it auto-detects completion via URL, element, content, or even LLM-based conditions - the agent resumes itself 👀
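one way to picture those four kinds of completion checks, as a rough sketch. nothing here is browser-handoff's real API: `Snapshot`, `url_contains`, `element_present`, `content_contains`, `llm_says`, and `any_of` are all hypothetical names, and the LLM check is just an injected `judge` callable so no model provider is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of the page at one moment in time."""
    url: str
    text: str = ""
    selectors: frozenset = frozenset()  # CSS selectors currently present


Condition = Callable[[Snapshot], bool]


def url_contains(fragment: str) -> Condition:
    """Done when the URL contains a fragment, e.g. a post-login path."""
    return lambda snap: fragment in snap.url


def element_present(selector: str) -> Condition:
    """Done when a given element (e.g. a logout button) is on the page."""
    return lambda snap: selector in snap.selectors


def content_contains(phrase: str) -> Condition:
    """Done when the visible text contains a phrase (case-insensitive)."""
    return lambda snap: phrase.lower() in snap.text.lower()


def llm_says(question: str, judge: Callable[[str, str], bool]) -> Condition:
    """Done when an LLM-backed judge answers yes; judge is supplied by the caller."""
    return lambda snap: judge(question, snap.text)


def any_of(*conditions: Condition) -> Condition:
    """Done as soon as any one condition holds."""
    return lambda snap: any(cond(snap) for cond in conditions)
```

a login handoff could then use `any_of(url_contains("/account"), element_present("#logout"))` and evaluate it against a fresh snapshot on every poll, which is what removes the need for a manual hand-back button.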
I made browser-handoff, a tool that lets humans temporarily take over AI browser agents when human input is needed.
Built it while helping @jamesmurdza ship https://t.co/IFndgSwctr: sandboxed agents needed a way to log into Claude.
Demo runs in @daytonaio 👇
@kylejeong @jamesmurdza @daytonaio @browserbase tried it - too laggy to complete a login, inputs barely worked 😅. Director is a product built around browserbase’s stream; browser-handoff is a library you attach to any playwright session
@kylejeong @jamesmurdza @daytonaio @browserbase Cool! Though from what I can tell, Live View is the stream URL, while browser-handoff handles the layer above: trigger detection, pause, notify, wait for completion, then resume. Couldn’t find anything showing that @browserbase does that natively. Happy to be corrected if I’m missing something 👀
X is cool.
but it’s 100x better when your timeline is full of people who code and build things.
if you’re into tech, AI, startups, product, design, development or programming.
say hi 👋
I made an open source version of Claude Code for web. And it’s way better:
Six different agents
Upload and view files
Fully featured terminals
Nested subagents
Easily rebase and merge
It's open source
Every agent and branch runs in a @Daytonaio sandbox.