📣 I am hiring postdoctoral fellows in agentic AI at the R2L Lab, @UWaterloo.
Lead your own agenda - systems that read, reason & act. Top-venue publishing, substantial compute, weekly PI 1:1s, and a real path to faculty/industry.
Rolling review. Apply 👉 https://t.co/KUYhrKAzsU
Grateful to be among this year's @AmazonScience Research Award recipients! Our project looks at hierarchical memory for scalable, long-horizon AI agents. Thanks to @amazon and to my students and collaborators.
Liner is partnering with @spoticlr at #ICLR2026 — supporting Best Paper and Travel Awards for LLM research.
And to celebrate, we're giving away:
✈️ Round-trip flights + hotel to #ICML2026 in Seoul
🎁 $300 Liner Credits
Follow @search_liner + repost to enter by 4/27.
Liner is built for research workflows. Find papers, verify sources, and write with citations in one place.
See you in 🇧🇷 and 🇰🇷!
@iclr_conf @icmlconf
My student @arthurchen189 is on the job market. His research focuses on sample-efficient test-time adaptation and data synthesis. He has two presentations at #ICLR2026. Please reach out to him directly! I’m also happy to provide an intro!
R2L Lab will be at #ICLR2026! Check out @arthurchen189's grounded adaptation of agents post-deployment and @TonyCheng990417's study of compositional generalization through RL! I'll also be there; please email me to meet.
https://t.co/hmc2kT2ZQ4
https://t.co/mdYQD9DhPp
Excited to share that "Grounded Test-Time Adaptation for LLM Agents" (GTTA) has been accepted to #ICLR2026! 🎉
LLM agents often fail in novel environments. We enable them to adapt at test time via self-initiated exploration -- no fine-tuning required. 🚀
𝟔/𝟗 [𝐈𝐦𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬] 💡 Why it matters for Applied AI:
• 𝐃𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭-𝐭𝐢𝐦𝐞 𝐚𝐝𝐚𝐩𝐭𝐚𝐭𝐢𝐨𝐧: Works on unseen enterprise systems.
• 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠-𝐅𝐫𝐞𝐞: No heavy fine-tuning or human annotations.
𝟓/𝟗 [𝐌𝐞𝐭𝐡𝐨𝐝 𝟐: 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐫𝐢𝐜] 2️⃣ Parametric Adaptation: To handle unexpected syntax, we use online vector updates. This lightweight method biases the model's output distribution to align with the environment's specific format instantly.
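A toy sketch of the idea in 5/9, assuming the adaptation vector is an additive bias on the output logits, updated with one gradient step per observed token. The three-token vocabulary, the `update_bias` helper, and the learning rate are illustrative assumptions, not details taken from GTTA itself.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_bias(bias, base_logits, observed_token, lr=0.5):
    """One online step: gradient ascent on log p(observed_token) w.r.t. the bias.

    The base model stays frozen; only the small bias vector changes.
    """
    probs = softmax([l + b for l, b in zip(base_logits, bias)])
    return [
        b + lr * ((1.0 if i == observed_token else 0.0) - p)
        for i, (b, p) in enumerate(zip(bias, probs))
    ]

# Hypothetical setup: the frozen model prefers token 0,
# but the environment's format keeps requiring token 2.
base_logits = [2.0, 0.5, 0.0]
bias = [0.0, 0.0, 0.0]
for _ in range(20):
    bias = update_bias(bias, base_logits, observed_token=2)

before = softmax(base_logits)
adapted = softmax([l + b for l, b in zip(base_logits, bias)])
```

After a handful of updates the adapted distribution favors the token the environment expects, while the base logits are never modified.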
𝟒/𝟗 [𝐌𝐞𝐭𝐡𝐨𝐝 𝟏: 𝐃𝐲𝐧𝐚𝐦𝐢𝐜𝐬] 1️⃣ Dynamics Grounding: The agent explores the environment before it sees any task. It discovers state-transition rules (e.g., "Clicking this button opens a modal") and adds them to its context.
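A minimal sketch of what 4/9 describes, under heavy simplification: the environment is a hypothetical lookup table (`TOY_ENV`), exploration is a plain breadth-first sweep, and the discovered rules are prepended to the prompt as text. The names `explore` and `build_prompt` are made up for illustration and are not the GTTA implementation.

```python
# Hypothetical toy environment: (state, action) -> next state.
TOY_ENV = {
    ("home", "click_login"): "login_modal",
    ("home", "click_cart"): "cart_page",
    ("login_modal", "click_close"): "home",
}

def explore(env, start, actions):
    """Try every action in each reachable state and record what happened."""
    rules, frontier, seen = [], [start], {start}
    while frontier:
        state = frontier.pop(0)
        for action in actions:
            nxt = env.get((state, action))
            if nxt is None:
                continue  # this action does nothing in this state
            rules.append(f"In '{state}', {action} leads to '{nxt}'.")
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return rules

def build_prompt(rules, task):
    """Put the discovered transition rules in context ahead of the task."""
    header = "Known environment rules:\n" + "\n".join(f"- {r}" for r in rules)
    return f"{header}\n\nTask: {task}"

# Exploration happens first; the task only arrives afterwards.
rules = explore(TOY_ENV, "home", ["click_login", "click_cart", "click_close"])
prompt = build_prompt(rules, "Open the cart")
```

The point of the design is ordering: the rules are collected with no task in hand, so the same context can be reused for any task that arrives later.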
𝟑/𝟗 [𝐓𝐡𝐞 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧 𝐎𝐯𝐞𝐫𝐯𝐢𝐞𝐰] ⚙️ We introduce GTTA with two strategies:
1️⃣ 𝐃𝐲𝐧𝐚𝐦𝐢𝐜𝐬 𝐆𝐫𝐨𝐮𝐧𝐝𝐢𝐧𝐠: Building in-context World Models.
2️⃣ 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐫𝐢𝐜 𝐀𝐝𝐚𝐩𝐭𝐚𝐭𝐢𝐨𝐧: Rapid syntax learning.
𝟐/𝟗 [𝐓𝐡𝐞 𝐏𝐫𝐨𝐛𝐥𝐞𝐦] 🤔 The Challenge: Why do agents break on new environments (e.g., websites)?
- 𝐔𝐧𝐤𝐧𝐨𝐰𝐧 𝐒𝐲𝐧𝐭𝐚𝐱: Unfamiliar UI elements or API formats.
- 𝐔𝐧𝐤𝐧𝐨𝐰𝐧 𝐃𝐲𝐧𝐚𝐦𝐢𝐜𝐬: Not knowing "what happens if I click X?"
🔥The quality of synthetic data matters more than its size.
We introduce SynQuE - a framework that ranks synthetic datasets by their real-world usefulness without any labeled real data (e.g., benchmarks).
Select better synthetic training data, train better models.
(3/3) When to Use Which Proxy:
- Distribution-based proxies excel on single-hop tasks like Text2SQL.
- LENS shines on long-horizon reasoning tasks such as web navigation.
🔍SynQuE proxies (1/3)
To estimate synthetic data quality, SynQuE introduces proxy metrics that adapt distribution- and diversity-based measures.
For complex tasks, we propose LENS - an LLM-based proxy that creates rubrics to capture differences between synthetic and real data.
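To make the distribution-based side concrete, here is a rough sketch of ranking synthetic datasets against a small pool of unlabeled real examples. The proxy used (cosine similarity between bag-of-words frequency vectors) is a stand-in chosen for simplicity, not one of SynQuE's actual metrics, and the dataset names and example texts are invented.

```python
import math
from collections import Counter

def bow(texts):
    """Normalized word-frequency vector for a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0

def rank_synthetic(datasets, real_unlabeled):
    """Order synthetic datasets by similarity to unlabeled real data (best first)."""
    real = bow(real_unlabeled)
    scores = {name: cosine(bow(texts), real) for name, texts in datasets.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Invented example data; no labels are needed on the real side.
real_queries = ["list all customers from canada", "count orders per customer"]
candidates = {
    "synth_a": ["list all orders from canada", "count customers per country"],
    "synth_b": ["the weather is nice today", "i like pizza"],
}
ranking, scores = rank_synthetic(candidates, real_queries)
```

A proxy this shallow only sees surface vocabulary, which is one way to read why the thread proposes an LLM-based rubric proxy (LENS) for long-horizon tasks.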