Last week I presented our work (done by me, @rishabkhincha, and others) at the @MICCAI2020 conference, MIL3ID workshop!
In our work, we examined the efficacy of domain-specific transfer learning for lesion classification, especially when training data is limited.
Humans don't just see in space—we build cognitive maps of what everything is FOR.
Can MLLMs do the same? We built SFI-Bench to find out. (CVPR 2026)
Most spatial benchmarks stop at perception: "How many chairs?" or "Is the table left of the couch?"
SFI-Bench asks harder questions:
→ "Find the maximum number of same-brand bottles on the cabinet" (conditional counting)
→ "Which objects share a functional relationship across these views?" (affordance reasoning)
→ "Diagnose this device failure using the scene + a web search" (causal troubleshooting)
We tested Gemini-3, GPT-5, Qwen3-VL-235B, LLaVA-Video-72B...
Results reveal a critical gap:
📷 Models are decent at local perception
📷 They fail at maintaining global spatial memory
📷 They struggle to link objects to their real-world function
📷 Web search helps — but only if the model can reason strongly enough to use it
SFI-Bench: 1,555 questions · 134 real-world scans · 6 cognitive tasks
project: https://t.co/dkss3hqn9N
Poster at Fri, Jun 5, 2026 • PM – PM PDT
ExHall A & F 453
#CVPR2026 #Agent #MultimodalLLM #SpatialIntelligence
Excited to unveil the LLM - Giraffe! 🦒
Worked with @siddartha_naidu & Arka Pal on this.
Extended context to 4K and 16K, tackling SOTA open-source LLM limitations.
Llama-1 based, Llama-2 soon. Eager for its impact on real-world AI! #AI #LLM 🚀
Repo: https://t.co/bImOOZJ0Wi
Dear Apple, I am not able to keep track of and get back to conversations across 10 apps. Needs some OS-level help to sort notifications into FYIs and to-dos that you can sort through, mark as “unread” and deal with when you’re able. Sad as the concept is.
We are excited to open applications for undergrads in India and Southeast Asia who are women or of other genders underrepresented in CS to spend a summer working with a faculty member on exciting problems, with comp stipends, supported by Google Research.
Apply now!
https://t.co/wdnNwmY62T
currently at THAT point in the semester where even being incredibly organized about deadlines + trying to have a daily to-do list which tracks unfinished work STILL culminates in UTTER CHAOS
New dates are out! We will be having the AI Symposium on the weekend of 2nd and 3rd October. Registrations are free! For more details:
Website: https://t.co/GYpguaJpFy
Registration: https://t.co/0YgfHd2dcK
Mailing List: https://t.co/xpLyTzrDDv
See you there!
"Does Pretraining for Summarization Require Knowledge Transfer?"
Short answer: most of the gains of T5 over a random init baseline can be realized absent any knowledge (by pretraining on procedurally generated babble)!
Long answer: see thread...
https://t.co/TMBtXIgkLr
If you have less than 3 hours to spare & want to learn (almost) everything about state-of-the-art explainable ML, this thread is for you! Below, I am sharing info about 4 of our recent tutorials on explainability presented at NeurIPS, AAAI, FAccT, and CHIL conferences. [1/n]