We all knew LLM agents struggle to explore, but we had to eyeball it ๐. We couldn't measure exploration errors. Until now. ๐บ๏ธ๐ค
We built a policy-agnostic metric to quantify exploration and exploitation errors in LLM agents.
Spoiler: Exploration error is what kills๐ agent performance in our setting ๐๐งต(1/8)
Lots of good news this week! ๐
1. My internship project from @AdobeResearch has been accepted to #SIGGRAPH2026!
("MAOAM: Unified Object & Material Selection with Vision-Language Models")
Special thanks to my wonderful mentor @michi_fischer who has made this project possible!
2. Paper accepted to #ICML2026!
("DocHop: Benchmarking Out-of-domain Multi-hop Reasoning in Information-Dense Documents")
3. Paper accepted (with minor revisions) at #DMLR!
("Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models")
In both papers, we generate carefully designed benchmarks to tackle compositional/multi-hop reasoning in VLMs. Proud to have contributed in these projects.
More detailed posts soon :) Stay tuned!
We all knew LLM agents struggle to explore, but we had to eyeball it ๐. We couldn't measure exploration errors. Until now. ๐บ๏ธ๐ค
We built a policy-agnostic metric to quantify exploration and exploitation errors in LLM agents.
Spoiler: Exploration error is what kills๐ agent performance in our setting ๐๐งต(1/8)
I will be at #ICLR2026 to present my work on data contamination in VLMs! (Fri, Apr 24, 2026 โข 8:30 AM โ 11:00 AM, Pavilion 3 P3-917)
I am currently interested in VLA/physical AI, agents and robustness/generalization.
Would love to chat and connect with anyone with similar interests :)
Me: memorize past exams ๐๐ฏ
Also me: fail on a slight tweak ๐คฆโโ๏ธ๐คฆโโ๏ธ
Turns out, we can use the same method to ๐ฑ๐ฒ๐๐ฒ๐ฐ๐ ๐ฐ๐ผ๐ป๐๐ฎ๐บ๐ถ๐ป๐ฎ๐๐ฒ๐ฑ ๐ฉ๐๐ ๐! ๐งต(1/10)
- Project Page: https://t.co/ue1GybD4fm
@neural_avb Thank you for your interest in our work! Feel free to let me know if you'd like to discuss anything about our work! https://t.co/5uVypzw1Nz :)
This was a joint co-first author work with @jungtaek_kim and @jongwonjeong123, with the guidance from @rdnowak, @Kangwook_Lee and @yong_jae_lee.
If this sounds interesting, please check out our paper ๐ https://t.co/UYor59qioM!
If you have any questions, feedback, or new ideas, I'd be more than happy to discuss!๐งต(8/8)
We all knew LLM agents struggle to explore, but we had to eyeball it ๐. We couldn't measure exploration errors. Until now. ๐บ๏ธ๐ค
We built a policy-agnostic metric to quantify exploration and exploitation errors in LLM agents.
Spoiler: Exploration error is what kills๐ agent performance in our setting ๐๐งต(1/8)
Can we improve exploration failures in LM agents? ๐ ๏ธ
๐บ๏ธ Exploration Prompts: Explicitly injecting exploration strategies increases success rate by 17%.
๐ Explicit Harness: Providing the agent with structured summaries of its past observations; success rate boost by 29.4%! ๐งต(7/8)
Excited to be back at @AdobeResearch this summer where I will be working with @Shramanpramani2 :)
Would love to connect with anyone who will be around!
๐ฅ Upgrade your frozen vision encoders with <10 lines of code!
Single-scale inference throws away vital details. Enter MuRF ๐: a simple, training-free plug-in for instant, massive gains in MLLMs, Seg & Depth. ๐คฏ 1/6
๐จNew work with @Meta@RealityLabs
We introduce EGAgent, an agentic reasoning framework for very long video understanding powered by entity scene graphs
Why? With long multimodal data streams, agents must search and reason across multiple modalities!
๐งต (1/n)
New paper out! ๐จ Introducing STTS: Unified Spatio-Temporal Token Scoring for Efficient Video VLMs. We tackle the massive token bottleneck in video models by jointly identifying the tokens that actually matter. The overall figure below breaks down the core problem! ๐งต๐
Hi ML Twitter!
My Summer 2026 internship unfortunately fell through last minute ๐ตโ๐ซ
If your team is looking for interns, Iโd love to connect - RTs appreciated ๐
My website: https://t.co/rNih6t6Emb