Xinyuan Wang @xywang626 - Twitter Profile

27 days ago

Congrats @BowenWangNLP on the great CUA-Gym work! It’s very meaningful for the CUA community to have such a large-scale set of verifiable tasks.

Bowen Wang

@BowenWangNLP

27 days ago

RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algorithm, it is the data. 🐌

🚀 We introduce CUA-Gym: a scalable, lightweight synthesis engine that turns arbitrary task queries into verifiable RLVR data for computer-use agents.

The largest open CUA RLVR dataset to date:
🎯 32,122 verifiable RLVR tasks with programmatic setup scripts + rewards
🌐 110 environments: 16 desktop apps + 94 synthesized mock web apps
🏆 Qwen3.5-based CUA models trained with GSPO reach 72.6% on OSWorld-Verified and 56.6% on WebArena

📄 Paper: https://t.co/cdvHJPzgb1
🏠 Homepage: https://t.co/kvhaOQxNx7
🤗 Dataset: https://t.co/w5vOIRdchR
💻 Codebase: https://t.co/CcRlNTlS1c
🧩 Environments: https://t.co/fNZ6YAI8LD

🧵[1/6]

Xinyuan Wang @xywang626

about 2 months ago

@yihengxu_ Congrats, Yiheng!

xywang626 retweeted

Yu Su

@ysu_nlp

2 months ago

Introducing @NeoCognition, the agent lab for specialized intelligence. Everyone needs experts, but human expertise does not scale. Backed by $40M seed funding, we build self-learning agents that specialize across domains to make expertise abundant.

Xinyuan Wang @xywang626

2 months ago

@ysu_nlp @MingZhong_ @NeoCognition Congrats! Looking forward to the new form of agents!

xywang626 retweeted

Shibo Hao

@Ber18791531

2 months ago

🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search.

Since our first research preview last December, we have expanded the benchmark substantially with community contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs.

Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.

👇More on the website and in the paper

#AI #Agents #LLM #Benchmark #CocoaBench

xywang626 retweeted

Cyandev

@cyandev

4 months ago

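The Spring Festival Gala was so boring that I started tinkering with the computer at my family home and even wrote a chatbot. It feels a bit like Chinese dreamcore.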

Xinyuan Wang @xywang626

5 months ago

K2.5 also achieves #1 on the OSWorld leaderboard — the best open agentic model! Happy to be part of the team! Since OpenCUA was released last August, we’ve seen rapid leaps in agentic foundation models. Exciting to see open agentic models now reaching and even leading.

Kimi.ai @Kimi_Moonshot

5 months ago

Kimi K2.5 tech report just dropped!

Quick hits:
- Joint text–vision training: pretrained with 15T vision-text tokens, zero-vision SFT (text-only) to activate visual reasoning
- Agent Swarm + PARL: dynamically orchestrated parallel sub-agents, up to 4.5× lower latency, 78.4% on BrowseComp
- MoonViT-3D: a unified image–video encoder with 4× temporal compression, enabling 4× longer videos in the same context
- Toggle: token-efficient RL, 25–30% fewer tokens with no accuracy drop

Here's our work toward scalable, real-world agentic intelligence. More details in the report 👉https://t.co/N5pwm0M4jm

Xinyuan Wang @xywang626

5 months ago

Great agentic model!

Kimi.ai @Kimi_Moonshot

5 months ago

🥝Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.

🔹Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup.

🥝K2.5 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
🥝K2.5 Agent Swarm in beta for high-tier users.
🥝For production-grade coding, you can pair K2.5 with Kimi Code: https://t.co/A5WQozJF3s

🔗 API: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/6h2KkoA0xd
🔗 Weights & code: https://t.co/H38KegeDIY

Xinyuan Wang @xywang626

5 months ago

Great RL work!

Zhoujun (Jorge) Cheng

@ChengZhoujun

5 months ago

Pretraining has scaling laws to guide compute allocation. But for RL on LLMs, we lack a practical guide on how to spend compute wisely. We show the optimal compute allocation in LLM RL scales predictably. ↓ Key takeaways below

xywang626 retweeted

DINQ

@dinq_me

5 months ago

You’ve done real work. But most of it is hard to see. DINQ brings your projects, code, and research onto one card. No self-promotion. Just real signals. Build your DINQ → https://t.co/IaEtN0Vkab #DINQ

Xinyuan Wang @xywang626

7 months ago

@kugwzk1 @NeurIPSConf @Alibaba_Qwen @JustinLin610 @Qiuzihanhan Congrats!

Xinyuan Wang @xywang626

7 months ago

@MinChonChiSF Thank you!

Xinyuan Wang @xywang626

7 months ago

Excited that OpenCUA was accepted as a NeurIPS Spotlight! Our poster will be presented in San Diego Poster Session 3 on Thu Dec 4, 11:00 PST. I’ll be in San Diego from Dec 3–5 — happy to chat if you’re around 😀

Xinyuan Wang @xywang626

10 months ago

We are super excited to release OpenCUA, the first from 0 to 1 computer-use agent foundation model framework and open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data.

🔗 [Paper] https://t.co/SYEio5ccNJ
📌 [Website] https://t.co/ma6bBuYiNM
🤖 [Models] https://t.co/7TVtIdjkmq
📊 [Data] https://t.co/N6tQQwQkhs
💻 [Code] https://t.co/ihr8TXmG6k

🌟 OpenCUA: comprehensive open-source framework for computer-use agents, including:
📊 AgentNet: first large-scale CUA dataset (3 systems, 200+ apps & sites, 22.6K trajectories)
🏆 OpenCUA model: open-source SOTA on OSWorld-Verified (34.8% avg success, outperforms OpenAI CUA)
🖥 AgentNetTool: cross-system computer-use task annotation tool
🏁 AgentNetBench: offline CUA benchmark for fast, reproducible evaluation

💡 Why OpenCUA?
Proprietary CUAs like Claude or OpenAI CUA are impressive🤯, but there’s no large-scale open desktop agent dataset or transparent pipeline. OpenCUA changes that by offering the full open-source stack 🛠: scalable cross-system data collection, effective data formulation, model training strategy, and reproducible evaluation, powering top open-source models including OpenCUA-7B and OpenCUA-32B that excel in GUI planning & grounding.

Details of OpenCUA framework👇

Xinyuan Wang @xywang626

7 months ago

NotebookLM is just so good!

xywang626 retweeted

Zhiting Hu

@ZhitingHu

7 months ago

🔥Really excited to see the release of PAN world model, a project I had been working on over the past years. PAN is a general world model capable of simulating physical, agentic, and nested worlds, synthesizing infinite interactive experiences for training AI agents. Building on top of pretrained LLMs and video diffusion models, PAN connects language, perception, action, and latent thoughts, for long-horizon simulation and reasoning. PAN shows overwhelming performance gains over JEPA-2, Cosmos-2, and other prior models. More in the thread👇 ... 1/

Xinyuan Wang @xywang626

7 months ago

The model performance is promising, especially on UI-Vision, and the dataset will be very helpful to the community.😀

Aarash Feizi @aarashfeizi

7 months ago

🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread

xywang626 retweeted

Kimi.ai @Kimi_Moonshot

8 months ago

🚀 Hello, Kimi K2 Thinking!
The Open-Source Thinking Agent Model is here.

🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window

Built as a thinking agent, K2 Thinking marks our latest efforts in test-time scaling: scaling both thinking tokens and tool-calling turns.

K2 Thinking is now live on https://t.co/YutVbwktG0 in chat mode, with full agentic mode coming soon. It is also accessible via API.

🔌 API is live: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/n7xxaszqzF
🔗 Weights & code: https://t.co/4ukcXB0iP6

Xinyuan Wang @xywang626

8 months ago

Guess what, unlabeled YouTube videos can be transformed into Computer Use Agent training data🤩

Dunjie Lu

@DunjieLu1219

8 months ago

📣Introducing VideoAgentTrek: a human-free, web-scale pipeline that turns screen-recorded tutorials into training data for computer-use agents, powered by specially trained VLMs.
🔗 [Website] https://t.co/rxTDwNxgtw
📄 [Paper] https://t.co/SVgjCGUWhF
