Here's the latest from @OpenAdaptAI, faster and more robust. No command line required! #AI #agent #OpenAI #GPT4
Free download at https://t.co/cDF7XTzTBc 🚀
We are super excited to release OpenCUA: the first from-0-to-1 framework for computer-use agent foundation models, plus the open-source SOTA model OpenCUA-32B, which matches top proprietary models on OSWorld-Verified. Full infrastructure and data included.
🔗 [Paper] https://t.co/SYEio5ccNJ
📌 [Website] https://t.co/ma6bBuYiNM
🤖 [Models] https://t.co/7TVtIdjkmq
📊 [Data] https://t.co/N6tQQwQkhs
💻 [Code] https://t.co/ihr8TXmG6k
🌟 OpenCUA — comprehensive open-source framework for computer-use agents, including:
📊 AgentNet — first large-scale CUA dataset (3 systems, 200+ apps & sites, 22.6K trajectories)
🏆 OpenCUA model — open-source SOTA on OSWorld-Verified (34.8% avg success, outperforms OpenAI CUA)
🖥 AgentNetTool — cross-system computer-use task annotation tool
🏁 AgentNetBench — offline CUA benchmark for fast, reproducible evaluation
💡 Why OpenCUA?
Proprietary CUAs like Claude or OpenAI CUA are impressive🤯 — but there’s no large-scale open desktop agent dataset or transparent pipeline. OpenCUA changes that by offering the full open-source stack 🛠: scalable cross-system data collection, effective data formulation, model training strategy, and reproducible evaluation — powering top open-source models including OpenCUA-7B and OpenCUA-32B that excel in GUI planning & grounding.
Details of OpenCUA framework👇
🙌 Acknowledgement: We thank @ysu_nlp, @CaimingXiong, and the anonymous reviewers for their insightful discussions and valuable feedback. We are grateful to Moonshot AI for providing training infrastructure and annotated data. We also sincerely appreciate Jin Zhang, Hao Yang, Zhengtao Wang, and Yanxu Chen from the Kimi Team for their strong infrastructure support and helpful guidance. The development of our tool is based on the open-source projects DuckTrack @arankomatsuzaki and @OpenAdaptAI; we are very grateful for their commitment to the open-source community.
Finally, we extend our deepest thanks to all annotators for their tremendous effort and contributions to this project. ❤️
Anybody looking for a GUI+ICL-->MCP library should definitely check out OmniMCP, which puts Microsoft's Omniparser to use in generating GUI tool use APIs. Early days, but pretty neat.
https://t.co/iJkRVDO57B
I prompted @openai's ChatGPT o3-mini-high and @DeepSeek's R1 to implement code for deploying @alibaba_qwen's Qwen2.5-VL.
Both agree that R1's implementation is "more comprehensive" and better "for production systems".
Qwen2.5-VL is the first open source multimodal model that appears to be able to accurately generate bounding box coordinates 🚀
Thank you @Alibaba_Qwen! Excited to integrate this in @OpenAdaptAI
https://t.co/XJwVgm991i
Check out our latest GUI Agent -> UI-TARS 🥳
A vision-language model that surpasses GPT-4o & Claude Computer-Use
Paper, code, model ckpt, desktop APP are now open-sourced~
https://t.co/7umVHrnMds
https://t.co/f4973AmmQh
https://t.co/EXFqSIvRCg
> DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
https://t.co/Vgnovq9hsD
We can run frontier models at home now.
Another day, another breakthrough:
Apply DCT to convert actions into frequency components, quantize them prioritizing low frequencies, then use autoregressive prediction in frequency order (low to high) to generate actions.
From @physical_int. May generalize to @OpenAdaptAI.
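A minimal sketch of that pipeline in pure Python, for anyone who wants to poke at the idea. This is an illustration under assumptions, not @physical_int's actual implementation: the function names, the `scale` quantization factor, and the interleaving across action dimensions are all hypothetical choices made here. The autoregressive model itself is left out; the sketch only produces the low-to-high frequency token order it would predict.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence of floats."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(norm * s)
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += norm * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def encode(chunk, scale=10.0):
    """Turn an action chunk into integer tokens, lowest frequency first.

    chunk: list of timesteps, each a list of action dimensions.
    scale: hypothetical quantization factor (assumption); larger keeps more detail.
    """
    per_dim = [list(d) for d in zip(*chunk)]          # one time series per dimension
    quantized = [[round(scale * c) for c in dct(d)] for d in per_dim]
    n_steps, n_dims = len(chunk), len(per_dim)
    # Frequency 0 for every dimension, then frequency 1, and so on.
    return [quantized[d][k] for k in range(n_steps) for d in range(n_dims)]


def decode(tokens, n_steps, n_dims, scale=10.0):
    """Invert encode(): undo the ordering and scaling, then apply the inverse DCT."""
    per_dim = [[tokens[k * n_dims + d] / scale for k in range(n_steps)]
               for d in range(n_dims)]
    series = [idct(c) for c in per_dim]
    return [list(step) for step in zip(*series)]


if __name__ == "__main__":
    chunk = [[math.sin(0.3 * t), 0.1 * t] for t in range(8)]
    tokens = encode(chunk)
    print(tokens)
    print(decode(tokens, n_steps=8, n_dims=2))
```

For smooth trajectories most high-frequency tokens round to zero, so the information sits at the front of the sequence.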
@hwchase17 With @OpenAdaptAI you start and stop recording demonstrations of repetitive tasks via the tray icon. Show, don't tell. Perform, don't prompt.