Qianhui Wu @5000hui - Twitter Profile

5000hui retweeted

Zhaoyang Wang

@zhaoywang_CS

8 days ago

We’re calling for more reviews now! Join us to build high quality GUI web agent environments! 🤩


Qianhui Wu

@5000hui

27 days ago

🌳 Excited to introduce Orchard! 🚀
🛠️ Orchard-SWE: 67.5% on SWE-bench Verified (30B-A3B, ~3B active)
🖥️ Orchard-GUI: 68.4% avg on WebVoyager / Online-Mind2Web / DeepShop (4B!)
📬 Orchard-Claw: 73.9% pass@3 on Claw-Eval

Wenlin Yao @YaoWenlin

27 days ago

🌳 Introducing Orchard, an open-source agentic modeling framework! 🎉
One thin & cheap sandbox infra powers training recipes across SWE / GUI / personal-assistant agents:

⚙️ Orchard Env: 0.28s exec latency; 100% success @ 1,000 parallel sandboxes 💪
🛠️ Orchard-SWE: 67.5% on SWE-bench Verified (30B-A3B, ~3B active)
🖥️ Orchard-GUI: 68.4% avg on WebVoyager / Online-Mind2Web / DeepShop (4B!)
📬 Orchard-Claw: 73.9% pass@3 on Claw-Eval

🔗 https://t.co/CVZCjANdwz
📦 Code and data are coming soon!

Let's accelerate open agentic AI! 🚀



Qianhui Wu

@5000hui

about 1 month ago

Excited to introduce WebHarbor! 🌟
⛑️ Mirrors real sites into local Docker environments that are stable and RL-ready.
✅ Ships with all 15 WebVoyager sites for reproducible, CAPTCHA-free evaluation.
📢 Scaling to 100+ sites next; calling for contributors!

Zhaoyang Wang

@zhaoywang_CS

about 1 month ago

Introducing WebHarbor ⚓, an open community effort to dock real websites into local, deterministic, and evolving environments for web agent research. 🌐

Come help us build it. 🤝 Contribute new web environments or fix existing ones, and you will be included in the author list! ✍️

🎉 First release: 15 multimodal, high-fidelity environments covering all 643 WebVoyager tasks, with full frontend, backend, database, and auth, all in one lightweight Docker image.

Why? Web agent eval today is broken 😦: reCAPTCHA, geo-blocks, content drift, flaky networks, and login-gated deep features (e.g., account and checkout) that benchmarks can't touch. Live sites can't be reset either, making online agent RL impractical. Again, the bottleneck isn't the agent. It's the environment.

WebHarbor: dock real websites into stable, reproducible local mirrors with sub-second reset.

But here's the key 🌱: you can't clone the entire web upfront, and you don't need to. WebHarbor evolves with the agent: as harder tasks arrive, environments grow to support them. Coding agents (e.g., Claude Code/Codex) build mirrors fast; human reviewers catch the coding agents' hacks (shortcuts, leaks, fake completions).

We need you. 🙌 Help us scale to 100+ and beyond:
🔨 Contribute a new web environment
🐛 Fix or improve existing mirrors
🔍 Audit task fidelity & interaction realism

See more details and join the effort:
- 🏠 Project Page: https://t.co/TEVpIBDcLO
- 💻 GitHub Repo: https://t.co/DLEotmzG7h
- 📝 Contribution Form: https://t.co/HpQzDbWh7U

Let's build the open-source environment infrastructure for GUI web agents! ⚓

Initiating institutions: UNC-Chapel Hill ✖️ Microsoft

#AIAgents #WebAgents #LLM #OpenSource #AgenticAI


Qianhui Wu

@5000hui

about 2 months ago

WebXSkill: Learn executable skills for web agents from synthetic tasks and trajectories! 💡

elvis

@omarsar0

about 2 months ago

Cool new paper from Microsoft on Skill learning for autonomous web agents.


Qianhui Wu

@5000hui

3 months ago

We've released the full package for GUI-Libra! 🌟
📂 Data/Model: https://t.co/SNA93dKvjN
📄 Paper: https://t.co/3ptHVsR0Rr
🌐 Project: https://t.co/7bQkdpWWPW
Happy to hear feedback from the community!

Rui Yang

@RuiYang70669025

3 months ago

Collecting high-quality GUI trajectories for agent training is expensive. But are we fully leveraging the open-source data we already have? 🤔

✨ Introducing GUI-Libra (https://t.co/OVFuSSecTX): an 81K high-quality, action-aligned reasoning dataset curated from open-source corpora, plus a tailored training recipe that combines action-aware SFT with step-wise RLVR-style training (⚠️ partially verifiable rather than fully verifiable!).

Result: stronger native GUI agents on both offline step-wise evaluation and online environments across mobile and web domains.

Takeaway: With careful data curation and a tailored post-training recipe, a small subset of open-source trajectories can still go a long way for training native GUI agents.

Check out our paper (https://t.co/iwYIL95F6h) and code/dataset/model (https://t.co/41T3p8XKnK) for more details. #GUI #agent #VLM



Qianhui Wu

@5000hui

3 months ago

Congrats to the LightMem team! 👏Great to see the continued exploration of topic-based segmentation and lightweight compression for building efficient memory systems for LLMs. Glad that our findings in SeCom and LLMLingua-2 have been useful building blocks for the community. 😀

Ningyu Zhang@ZJU

@zxlzr

4 months ago

We’re thrilled to share that our team’s work LightMem has been accepted to ICLR 2026 🎉

Paper: https://t.co/e7Zhk74fzh

Code: https://t.co/Mlyaxqf9KY

LightMem is a lightweight, modular memory system for LLM agents that enables scalable long-context reasoning and structured memory management across tasks and environments.

Recent updates:

1️⃣ Introduced a comprehensive baseline evaluation framework for benchmarking memory layers (Mem0, A-MEM, LangMem) across datasets like LoCoMo and LongMemEval

2️⃣ Released a demo video showcasing long-context handling, along with tutorial notebooks covering multiple usage scenarios

3️⃣ Enabled multi-tool invocation via MCP Server integration

4️⃣ Added full LoCoMo dataset support and integrated GLM-4.6, achieving strong performance and efficiency with reproducible scripts

5️⃣ Supported local deployment through Ollama, vLLM, and Transformers with automatic model loading

#ICLR2026 #LLM #Agents #MemorySystems #LightMem



Qianhui Wu

@5000hui

5 months ago

🔊 2026 Summer Internship @MSFTResearch Deep Learning Group 🔊 We’re looking for a self-motivated intern with a strong background in ⛑️ building GUI agent environments and/or 🏗️ reinforcement learning. 📩 Interested? Send your CV + a short intro to [email protected]!


Qianhui Wu

@5000hui

7 months ago

🧠 Key ideas: @zhaoywang_CS
• categorized task synthesis from real websites 📷
• online task refinement when the task conflicts with the observation 📷
• offline trajectory refinement to remove noisy steps 🪄

Zhaoyang Wang

@zhaoywang_CS

7 months ago

🚀 New work: SynthAgent, a fully synthetic supervision pipeline for web agents 🤖
We generate high-quality, environment-specific tasks + trajectories to adapt agents to new websites without human effort 🧠🧼

arXiv: https://t.co/6cWiMMghW7
code: https://t.co/fYPwViPXHd



5000hui retweeted

Xiao Yu @ ICLR2026 @xy2437

8 months ago

Why can (V)LM agents ace coding and math, yet struggle so badly in more complex environments like computer or phone use? 🤔 We find that one key factor lies in models' ability to understand and *simulate* the environment’s dynamics, and propose **Dyna-Mind** to address this! 🧵[1/n]



5000hui retweeted

Da Yu @DaYu85201802

9 months ago

✨ Internship Opportunity @ Google Research ✨ We are seeking a self-motivated student researcher to join our team at Google Research starting around January 2026. 🚀 In this role, you will contribute to research projects advancing agentic LLMs through tool use and RL, with the goal of enabling breakthrough applications. We are particularly interested in PhD students with a strong background in these areas. If interested, please send a brief self-introduction and your CV to [email protected]. Looking forward to connecting with talented researchers in this exciting space!


Qianhui Wu

@5000hui

11 months ago

@LiJunnan0409 Awesome work! 🥂 I feel like the design of our GUI-Actor, which can propose multiple candidate regions in one forward pass, combined with a Grounding Verifier could work really well within the 'test-time scaling' framework of GTA1! 😀


Qianhui Wu

@5000hui

11 months ago

@i_Am_Snow_Flake Thanks for sharing! Please check out the demo here: https://t.co/vPc07bS80T


Qianhui Wu

@5000hui

12 months ago

Huge thanks to the @SimularAI team for hosting, and to my amazing collaborators for making this project possible! 🙏 Excited to see where this direction takes us next! 🔗 https://t.co/h7NRbvHR5D

Simular

@SimularAI

12 months ago

Big thanks to Qianhui Wu @5000hui and the team behind “Act Where You See” for sharing their amazing work this week at @SimularAI Seminar! 🧠⚡️ Coordinate-free visual grounding for GUI agents is a huge leap toward human-like interaction. 📎 https://t.co/oRX7zvYLDY #AI #SimularSeminar #GUIAgents #SimularToHuman



Qianhui Wu

@5000hui

about 1 year ago

@touken_titan The key idea of GUI-Actor should also apply in embodied scenarios. We are also thinking about how to adapt it to such scenarios.


Qianhui Wu

@5000hui

about 1 year ago

🚀 Excited to share GUI-Actor, a new approach for GUI grounding! Big thanks to @_akhaliq for featuring our work!

🌐 Project page: https://t.co/nHAq2tWp6q
📜 Paper: https://t.co/LRzQwJkccu

🤔 What's limiting coordinate generation-based GUI grounding?
1️⃣ Weak spatial-semantic alignment
2️⃣ Ambiguous supervision signals
3️⃣ Vision–action granularity mismatch

👀 But think about it: humans don’t calculate precise screen coordinates; we perceive elements and then act directly.

💡 Meet GUI-Actor: a VLM with an attention-based action head that:
✅ Addresses the above limitations
✅ Proposes multiple candidate regions in one pass, enabling flexible downstream strategies
✅ Performs coordinate-free grounding that better mirrors human behavior

➕ We also introduce a grounding verifier to select the most plausible action region, and it can boost other grounding methods too.

🎯 Results? GUI-Actor achieves SOTA on several benchmarks; GUI-Actor-7B even outperforms UI-TARS-72B on ScreenSpot-Pro, all using the same Qwen2-VL backbone.

AK

@_akhaliq

about 1 year ago

Microsoft just dropped GUI-Actor on Hugging Face

Coordinate-Free Visual Grounding for GUI Agents


5000hui retweeted

Jianwei Yang

@jw2yang4ai

about 1 year ago

🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025!
🔗 https://t.co/BolazSxgTb

⭐ We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof. @YunzhuLiYZ, and Prof. @furongh, to talk about exciting research that brings vision to the wild!

🌎 Join top researchers tackling real-world vision challenges, from dynamic environments to embodied agents! See you all at #CVPR2025!

#CVPR2025 #ComputerVision #AI



Qianhui Wu

@5000hui

about 1 year ago

Check out our SeCom and other amazing works!

Microsoft Research

@MSFTResearch

about 1 year ago

In this issue of Research Focus, we examine a new conversation segmentation method that delivers more coherent and personalized agent conversation, and we review efforts to improve MLLMs’ understanding of geologic maps. Check out the latest research: https://t.co/XuYK1ChxBg



Qianhui Wu

@5000hui

over 1 year ago

🚀 Excited to introduce our latest work: Magma - A Foundation Model for Multimodal AI Agents! 🔥 🌐 Project: https://t.co/UgqapTmoOM 📄 Paper: https://t.co/ydIM2wHuGl Check it out and let us know what you think! #AIAgents #Multimodal

Jianwei Yang

@jw2yang4ai

over 1 year ago

Thanks for featuring our work, @arankomatsuzaki! 🔥 Today we are thrilled to announce our MSR flagship project Magma! This is a fully open-sourced project. We will roll out everything (code, model, and training data) over the coming days. Check out our full work here: https://t.co/GL22DQYqLA

To the best of our knowledge, Magma is the first-ever foundation model for multimodal AI agents designed to handle complex interactions for agentic tasks. With a single suite of parameters, Magma achieves state-of-the-art UI navigation and robotics manipulation across both digital and physical environments, while also excelling at generic image and video understanding!


5000hui retweeted

Qian Liu

@sivil_taram

over 1 year ago

Thrilled to share that RegMix has been accepted by #ICLR2025! 🎉 Massive shoutout to the incredible co-authors @xszheng2020 @Muennighoff @GuangtaoZ @LongxuDou @TianyuPang1 Jing Jiang @mavenlin!

🙏 Huge thanks to the ICLR reviewers for helping us improve RegMix! 🌟 Some key improvements from v1:

1️⃣ Expanded experiments to 100 domains, and regression still works extremely well 📈
2️⃣ Evaluated 7B model performance over 100B tokens, and RegMix beats Human consistently 🚀
3️⃣ More results confirming the rank invariance hypothesis across model scales 📉
4️⃣ New insights on using 1B proxy models: they actually show no significant advantage over 1M proxy models 🧠

Code: https://t.co/xqJ8FUNWjf
Paper: https://t.co/KcRhKvQsYa


