🔎 New Toolkit Released: VLM-Lens 🔎
https://t.co/v9nf1u8YQG
Over the past 10 months, our lab, together with collaborators @ziqiao_ma @SLED_AI @jzhou_jz, developed a simple, streamlined interpretability toolkit for VLMs that supports 16 state-of-the-art models!
Excited to attend #COLM2025 in Montréal this week! I’ll be presenting our paper "Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation", in Poster Session 4. Looking forward to meeting many of you there! ☺️
https://t.co/Wjq1oFgibG
Regrettably can’t attend #COLM2025 due to deadlines, but @JaneDing_AI and @SLED_AI will be presenting our work. :)
@JaneDing_AI is an exceptional undergraduate researcher and a great collaborator! Go meet her at COLM if you’re curious about her work on mechanistic interpretability, multimodality, & pragmatics!
+1 on this! Mixed-effects models are such an underrated protocol for behavioral analysis that AI researchers often overlook. Behavioral data are almost never independent: clustering, repeated measures, and hierarchical structures abound. Mixed-effects models account for these while properly partitioning variance, and this applies equally to analyzing AI model behaviors.
Another example is our work on analyzing the behaviors of VLMs in resolving eye gaze reference (https://t.co/zSSzikmbFs), led by @zory_zhang.
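To make the non-independence point concrete, here is a minimal sketch of the simplest case only: a balanced one-way random-intercept design, with variance components estimated by the ANOVA method of moments. It is not a full mixed-effects fit (for real analyses you would reach for a dedicated package), and the function name, the cluster labels, and the toy scores are all hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """Split variance into between-cluster and within-cluster parts.

    groups: dict mapping a cluster label (e.g. an item or a participant)
    to a list of scores; every cluster must have the same number of scores.
    Returns (var_between, var_within, icc).
    """
    sizes = {len(v) for v in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    k = sizes.pop()   # observations per cluster
    n = len(groups)   # number of clusters
    if k < 2 or n < 2:
        raise ValueError("need at least 2 clusters with 2 observations each")

    grand = mean(x for v in groups.values() for x in v)
    group_means = {g: mean(v) for g, v in groups.items()}

    # Mean squares from a one-way ANOVA table.
    ms_between = k * sum((m - grand) ** 2 for m in group_means.values()) / (n - 1)
    ms_within = sum(
        (x - group_means[g]) ** 2 for g, v in groups.items() for x in v
    ) / (n * (k - 1))

    # Method-of-moments estimates; clamp the between-cluster part at zero.
    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / k)
    total = var_between + var_within
    icc = var_between / total if total > 0 else 0.0
    return var_between, var_within, icc


# Hypothetical toy data: 3 clusters, 2 repeated measures each.
scores = {"item_1": [1, 3], "item_2": [5, 7], "item_3": [9, 11]}
var_between, var_within, icc = variance_components(scores)

# Kish design effect: how much clustering inflates variance relative to
# treating all 6 scores as independent observations.
per_cluster = 2
design_effect = 1 + (per_cluster - 1) * icc
effective_n = len(scores) * per_cluster / design_effect
print(var_between, var_within, icc, design_effect, effective_n)
```

When the intraclass correlation is high, as in this toy data, the effective sample size is much smaller than the raw count, which is why tests that assume independence overstate confidence on clustered behavioral data.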
Our study on pragmatic generation has been accepted to #COLM2025!
Missed the first COLM last year (no suitable ongoing project at the time😅). Heard it’s a great place to connect with LM folks, excited to join for round two finally.
Thrilled to finally share SimWorld, the result of over a year of work by the team.
Simulators have been foundational for embodied AI research (I’ve worked with AI2Thor, CARLA, Genesis…), and SimWorld pushes this further with photorealistic Unreal-based rendering and scalable procedural generation. Perfect for training and deploying spatial models, world models, and agentic intelligence.
Check out the teaser + docs on our site. If you missed today’s demo, come find us at CVPR #7, ExHall B tomorrow!
Detailed technical report + some exciting papers dropping soon, stay tuned!
P.S., We are building @GrowAiLikeChild, an open-source community uniting researchers from computer science, cognitive science, psychology, linguistics, philosophy, and beyond. Instead of putting growing up and scaling up into opposite camps, let's build and evaluate human-like AI by growing it like a child at scale :-)
More to come, stay tuned!
Vision-Language Models (VLMs) can describe the environment, but can they refer within it? Our findings reveal a critical gap: VLMs fall short of pragmatic optimality.
We identify 3 key failures of pragmatic competence in referring expression generation with VLMs: they (1) fail to uniquely identify the referent, (2) include excessive or irrelevant information, and (3) misalign with human pragmatic preferences.
We introduce RefOI, a new dataset of 1.5k objects, each with 3 written and 2 spoken human-produced referring expressions. We also release RefOI-TLHF, a large dataset of token-level human feedback for 10.6k referring expressions.
👀https://t.co/2XgxeuqtSf
📄https://t.co/98N8V4xhrz
Excited to co-lead this project with the amazing @JaneDing_AI, and huge thanks to the dream team @eva_xuejunzhang @carrot0817_ @jiahed0322 @6SihanXu @ngutinhyc @RoihnPeng @SLED_AI.