🚨 Appearing as a #NeurIPS2025 D&B spotlight (~3%)
Could VLMs guess your next prompt for a wearable AI agent?
We present WAGIBench, the 1st large-scale Goal Inference Benchmark for Wearable Agents w/ audiovisual, digital & longitudinal context!
Paper: https://t.co/zkaAWd4zU5
1/
One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
We found that if you simply delete them after pretraining and recalibrate for < 1% of the original budget, you unlock massive context windows.
Hi👋 I am at #neurips2025 to present our position paper on world models.🌍👇
🗓️ Dec 7 📍Upper Level Ballroom 20D/LAW workshop.
I’m excited to chat about VL reasoning & video generation, and I am actively looking for postdoc/full-time roles. DM me if you would like to chat.
We just released results for our newest VLA from Physical Intelligence: π*0.6. This one is trained with RL, which makes it quite a bit better: it often doubles throughput and enables real-world tasks like folding real laundry and making espresso drinks at the office.
New pre-print from our lab, by Lakshmi Govindarajan @lakshming92 with help from Sagarika Alavilli, introducing a new type of model for studying sensory uncertainty. https://t.co/TMKEDbmbCm
Here is a summary. (1/n)
🧠 vs 🤖 We benchmarked various evaluation functions by comparing them to human raters. We find that the LLM Judge parameterized with the scenario script aligns best with human raters (76.8% agreement), performing as well as any individual from a separate group.
8/
In 1992 Peter Ratcliffe received this rejection letter from Nature.
His findings were not "a sufficient advance in our understanding".
27 years later he won the Nobel Prize for the same discovery.
Don't lose faith in the things you believe in.
🌟To appear in the MechInterp Workshop @ #NeurIPS2025 🌟
Paper: https://t.co/fJS0eripxX
How do language models (LMs) form representations of new tasks during in-context learning? We study different types of task representations and find that they evolve in distinct ways.
🧵1/7
Unexpected events grab human attention & push AI models beyond their training data. 🤔
How well do VLMs reason about these critical, novel scenarios?
🦢 Introducing Black Swan #CVPR2025!
Co-lead: @adityachinchure
w/ @AlbertBoyangLi @VeredShwartz
🔗 https://t.co/WWIKkHsprL
This is a first... Reviewer #2 asking us to compare against the very paper we are submitting and that he is (supposed to be) reviewing!? This is all just a farce... #ICLR2025 🙃