LLMs are as powerful as their tools and environments.
For computer-use agents, most agents today try to predict click coordinates from screenshots and iterate from there.
There is a much better approach, and I wonder why nobody is using it.
It is accessibility trees. The catch: it is very hard to capture the full tree of an app if its underlying implementation doesn't expose all of it. This is especially true for Electron apps, because everything is embedded under a <div>.
That is why I built agent-desktop, a desktop automation CLI written in Rust. It is extremely fast: a full snapshot takes less than 100ms.
It has 54+ commands and is fully GA on Mac. Windows and Linux are launching soon.
Here's a video showcasing my @openclaw agent controlling my Mac.
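To make the screenshot vs. accessibility tree contrast concrete, here is a rough Python sketch. It is not agent-desktop's code or output format; the `Node` type, the `walk` and `find` helpers, and the toy snapshot are all hypothetical names made up for illustration. The point is only that with a tree, the agent looks an element up by role and label instead of guessing pixel coordinates.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One element in a hypothetical accessibility tree snapshot."""
    role: str                      # e.g. "window", "button", "textfield"
    name: str = ""                 # accessible label, if the app exposes one
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield every node in the tree, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching both role and name, or None."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


# Toy snapshot standing in for what a real accessibility API would return.
snapshot = Node("window", "Settings", [
    Node("group", "Account", [
        Node("textfield", "Email"),
        Node("button", "Save"),
    ]),
    Node("button", "Cancel"),
])

target = find(snapshot, role="button", name="Save")
print(target.role, target.name)  # prints "button Save"
```

The lookup either finds the exact element or returns `None`, so there is no "close enough" click. The flip side is that it only works for elements the app actually exposes in its tree.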
@berenddeboer @giuseppegurgone I've set it up in zsh so that any time a command is run, it is automatically wrapped with sfw and the agent doesn't even know. It just runs bun or npm install, but the command gets rewritten under the hood.
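Roughly, the rewrite rule looks like the Python sketch below. This is an illustration of the idea only, not the actual zsh setup; the `wrap_with_sfw` function and the `WRAPPED_TOOLS` set are hypothetical names, and the exact list of wrapped tools is an assumption based on the two mentioned above.

```python
import shlex

# Assumed set of package managers to wrap; the real list may differ.
WRAPPED_TOOLS = {"npm", "bun"}


def wrap_with_sfw(command: str) -> str:
    """Prefix the command with `sfw` when it starts with a wrapped tool."""
    parts = shlex.split(command)
    if parts and parts[0] in WRAPPED_TOOLS:
        return shlex.join(["sfw", *parts])
    # Anything else passes through untouched.
    return command


print(wrap_with_sfw("npm install left-pad"))  # prints "sfw npm install left-pad"
print(wrap_with_sfw("ls -la"))                # prints "ls -la"
```

Because the rewrite happens in the shell layer, the agent keeps issuing plain install commands and needs no knowledge of the wrapper.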
@hthieblot Been freelancing. Never had a 9-5 (actually didn't get any). Been a founder for 2 years. Made money but barely less than a 9-5er. I can relate to this. Still pushing through. Don't know if it's bravery or anything else. Humble enough to be happy for the provision.
@theo Exactly, it still follows the screenshot approach. I'm building an accessibility tree approach which operates apps with 100% accuracy, check it out!!
https://t.co/EQRWFDzuSu
@TukiFromKL It still follows the screenshot approach, which loses accuracy. I'm building an accessibility tree approach which operates apps with 100% accuracy, check it out!!
https://t.co/EQRWFDzuSu
@Rasmic It still follows the screenshot approach, which loses accuracy. I'm building an accessibility tree approach which operates apps with 100% accuracy, check it out!!
https://t.co/EQRWFDzuSu
@powerhdeleon Exactly, it still follows the screenshot approach, which loses accuracy. I'm building an accessibility tree approach which operates apps with 100% accuracy, check it out!!
https://t.co/EQRWFDzuSu