Arthur Chen

@arthurchen189

Adapting agents | AI @VectorInst & @UWCheritonCS | Ex-researcher @SFResearch @cerebras

Waterloo, Ontario

Joined September 2022

169 Following

70 Followers

26 Posts

arthurchen189 retweeted

Victor Zhong @hllo_wrld

13 days ago

📣 I am hiring postdoctoral fellows in agentic AI at the R2L Lab, @UWaterloo. Lead your own agenda - systems that read, reason & act. Top-venue publishing, substantial compute, weekly PI 1:1s, and a real path to faculty/industry. Rolling review. Apply 👉 https://t.co/KUYhrKAzsU

1

54

10

28

5K

arthurchen189 retweeted

Victor Zhong @hllo_wrld

20 days ago

Grateful to be among this year's @AmazonScience Research Award recipients! Our project looks at hierarchical memory for scalable, long-horizon AI agents. Thanks to @amazon and to my students and collaborators.

3

42

5

10

9K

arthurchen189 retweeted

2 months ago

Liner is partnering with @spoticlr at #ICLR2026 — supporting Best Paper and Travel Awards for LLM research. And to celebrate, we're giving away: ✈️ Round-trip flights + hotel to #ICML2026 in Seoul 🎁 $300 Liner Credits Follow @search_liner + repost to enter by 4/27. Liner is built for research workflows. Find papers, verify sources, and write with citations in one place. See you in 🇧🇷 and 🇰🇷! @iclr_conf @icmlconf


5

228

238

16

14K

arthurchen189 retweeted

Victor Zhong @hllo_wrld

about 2 months ago

My student @arthurchen189 is on the job market. His research is on sample-efficient test-time adaptation and data synthesis. He has two presentations at #iclr2026. Please reach out to him directly! I’m also happy to provide an intro!

0

26

6

3

3K

arthurchen189 retweeted

Victor Zhong @hllo_wrld

about 2 months ago

R2L Lab will be at #ICLR2026! Check out @arthurchen189's grounded adaptation of agents post-deployment, and @TonyCheng990417's study of compositional generalization through RL! I'll also be there, please email to meet. https://t.co/hmc2kT2ZQ4 https://t.co/mdYQD9DhPp

1

23

4

11

5K

Arthur Chen @arthurchen189

3 months ago

@ArminPCM @SnorkelAI Interested! I work on data synthesis, evaluation, and agent adaptation. Site: https://t.co/0BessrzRut CV: https://t.co/WtIJKxB9Zu

0

1

0

0

300

Arthur Chen @arthurchen189

5 months ago

𝟗/𝟗 [𝐂𝐫𝐞𝐝𝐢𝐭𝐬] Huge thanks to my mentors @LiuZuxin, @JianguoZhang3, and collaborators: @aksh_555, @JYJimLiu, @shelbyh_ai, @silviocinguetta. Special thanks to my advisors @hllo_wrld and @CaimingXiong for their guidance! @SFResearch #Agents #LLM

0

3

0

0

213

Arthur Chen @arthurchen189

5 months ago

Excited to share that "Grounded Test-Time Adaptation for LLM Agents" (GTTA) has been accepted to #ICLR2026! 🎉 LLM agents often fail in novel environments. We enable them to adapt at test time via self-initiated exploration -- no fine-tuning required. 🚀

https://t.co/Zv4XT66MNX

9

41

7

12

5K

Arthur Chen @arthurchen189

5 months ago

𝟖/𝟗 [𝐋𝐢𝐧𝐤𝐬] Dive into the details:
📝 𝐁𝐥𝐨𝐠: https://t.co/7zjluUvQJj
📄 𝐏𝐚𝐩𝐞𝐫: https://t.co/445MnTND2W
💻 𝐂𝐨𝐝𝐞: https://t.co/kP6vtJrOns

0

0

0

1

119

Arthur Chen @arthurchen189

5 months ago

𝟕/𝟗 [𝐓𝐡𝐞 𝐑𝐞𝐬𝐮𝐥𝐭𝐬] 🚀 Results from our evaluation:
• 𝐖𝐞𝐛𝐀𝐫𝐞𝐧𝐚: GPT-4.1 (+NPA) boosts Multi-site success from 2.0% → 23.0%.
• 𝐁𝐅𝐂𝐋𝐯𝟑: GPT-4.1 (+NP) improves from 55.5% → 64.0%.
Consistent gains across benchmarks without training!

https://t.co/iYxhIud919

0

0

0

0

95

Arthur Chen @arthurchen189

5 months ago

𝟔/𝟗 [𝐈𝐦𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬] 💡 Why it matters for Applied AI:
• 𝐃𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭-𝐭𝐢𝐦𝐞 𝐚𝐝𝐚𝐩𝐭𝐚𝐭𝐢𝐨𝐧: Works on unseen enterprise systems.
• 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠-𝐅𝐫𝐞𝐞: No heavy fine-tuning or human annotations.

0

0

0

0

71

Arthur Chen @arthurchen189

5 months ago

𝟓/𝟗 [𝐌𝐞𝐭𝐡𝐨𝐝 𝟐: 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐫𝐢𝐜] 2️⃣ Parametric Adaptation: To handle unexpected syntax, we use online vector updates. This lightweight method biases the model's output distribution to align with the environment's specific format instantly.

https://t.co/8s6zltBTDK

0

0

0

0

73
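Not part of the thread: a minimal sketch of what an "online vector update" could look like, assuming the vector is a per-token bias added to a frozen model's logits and updated with one cross-entropy gradient step per piece of environment feedback. The vocabulary, logits, learning rate, and the `update_bias` helper are all hypothetical; the paper's actual method may differ.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def update_bias(bias, logits, target, lr=0.5):
    """One online gradient step on cross-entropy, with respect to the bias only.

    The gradient of -log p[target] w.r.t. the bias is (p - onehot(target)),
    so we move the bias in the opposite direction.
    """
    probs = softmax([l + b for l, b in zip(logits, bias)])
    return [
        b + lr * ((1.0 if i == target else 0.0) - p)
        for i, (b, p) in enumerate(zip(bias, probs))
    ]

# Hypothetical action formats; the frozen model initially prefers index 0.
vocab = ["click(id)", "CLICK [id]", "press(id)"]
frozen_logits = [2.0, 0.5, 0.0]
bias = [0.0, 0.0, 0.0]

before = softmax(frozen_logits)
for _ in range(20):
    # Assumption: environment feedback reveals "CLICK [id]" as the accepted format.
    bias = update_bias(bias, frozen_logits, target=1)
after = softmax([l + b for l, b in zip(frozen_logits, bias)])

print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

Only the small bias vector changes here; the model's own logits stay fixed, which is one way to read "lightweight" in the tweet above.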

Arthur Chen @arthurchen189

5 months ago

𝟒/𝟗 [𝐌𝐞𝐭𝐡𝐨𝐝 𝟏: 𝐃𝐲𝐧𝐚𝐦𝐢𝐜𝐬] 1️⃣ Dynamics Grounding: The agent explores the environment before it sees any task. It discovers state-transition rules (e.g., "Clicking this button opens a modal") and adds them to its context.

https://t.co/wFNeEpjyYf

0

0

0

0

81
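Not part of the thread: a toy sketch of the dynamics-grounding idea, assuming a tiny hand-written environment and breadth-first exploration. `TOY_ENV`, `explore`, and `build_context` are hypothetical names for illustration, not the paper's code. It shows task-free exploration that records state-transition rules and then prepends them to the task prompt.

```python
from collections import deque

# Hypothetical toy environment: (state, action) -> next state.
TOY_ENV = {
    ("home", "click_settings"): "settings_modal",
    ("home", "click_search"): "search_page",
    ("settings_modal", "click_close"): "home",
    ("search_page", "click_back"): "home",
}

def available_actions(state):
    """Actions the toy environment exposes in a given state."""
    return sorted(a for (s, a) in TOY_ENV if s == state)

def explore(start, max_steps=20):
    """Breadth-first, task-free exploration that records observed transitions."""
    rules, seen, queue = [], {start}, deque([start])
    steps = 0
    while queue and steps < max_steps:
        state = queue.popleft()
        for action in available_actions(state):
            if steps >= max_steps:
                break
            nxt = TOY_ENV[(state, action)]
            rules.append(f"In '{state}', '{action}' leads to '{nxt}'.")
            steps += 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return rules

def build_context(rules, task):
    """Prepend the discovered rules to the task prompt."""
    lines = ["Known environment dynamics:"] + [f"- {r}" for r in rules]
    return "\n".join(lines + ["", f"Task: {task}"])

rules = explore("home")
prompt = build_context(rules, "Open the settings modal and close it.")
print(prompt)
```

In a real setting the transitions would come from acting in the live environment rather than from a lookup table, and the rules would likely be summarized by the model itself.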

Arthur Chen @arthurchen189

5 months ago

𝟑/𝟗 [𝐓𝐡𝐞 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧 𝐎𝐯𝐞𝐫𝐯𝐢𝐞𝐰] ⚙️ We introduce GTTA with two strategies:
1️⃣ 𝐃𝐲𝐧𝐚𝐦𝐢𝐜𝐬 𝐆𝐫𝐨𝐮𝐧𝐝𝐢𝐧𝐠: Building in-context World Models.
2️⃣ 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐫𝐢𝐜 𝐀𝐝𝐚𝐩𝐭𝐚𝐭𝐢𝐨𝐧: Rapid syntax learning.

0

0

0

0

87

Arthur Chen @arthurchen189

5 months ago

𝟐/𝟗 [𝐓𝐡𝐞 𝐏𝐫𝐨𝐛𝐥𝐞𝐦] 🤔 The Challenge: Why do agents break on new environments (e.g., websites)?
- 𝐔𝐧𝐤𝐧𝐨𝐰𝐧 𝐒𝐲𝐧𝐭𝐚𝐱: Unfamiliar UI elements or API formats.
- 𝐔𝐧𝐤𝐧𝐨𝐰𝐧 𝐃𝐲𝐧𝐚𝐦𝐢𝐜𝐬: Not knowing "what happens if I click X?"

0

1

0

0

119

Arthur Chen @arthurchen189

7 months ago

Paper: https://t.co/Z90rd4Q3CQ How SynQuE proxies work:

https://t.co/0tEvqBQYAP

0

1

0

0

76

Arthur Chen @arthurchen189

7 months ago

🔥 The quality of synthetic data matters more than its size. We introduce SynQuE, a framework that ranks synthetic datasets by their real-world usefulness without any labeled real data (e.g., benchmarks). Select better synthetic training data, train better models.

4

4

1

0

436

Arthur Chen @arthurchen189

7 months ago

(3/3) When to Use Which Proxy:
- Distribution-based proxies excel on single-hop tasks like Text2SQL.
- LENS shines on long-horizon reasoning tasks such as web navigation.

0

0

0

0

61

Arthur Chen @arthurchen189

7 months ago

(2/3) SynQuE proxy results: We benchmark SynQuE across text, image, and web-agent tasks. Using SynQuE proxies consistently boosts downstream performance, e.g., Text2SQL accuracy rises from 30.4 → 38.4 (+8.1%) simply by selecting better synthetic datasets.

0

0

0

0

62

Arthur Chen @arthurchen189

7 months ago

🔍SynQuE proxies (1/3) To estimate synthetic data quality, SynQuE introduces proxy metrics that adapt distribution- and diversity-based measures. For complex tasks, we propose LENS - an LLM-based proxy that creates rubrics to capture differences between synthetic and real data.

0

1

0

0

64
