Accepted to #CVPR2026
VLMs fall short on complex spatial reasoning.
They struggle with:
• Precise geometric perception
• Multi-step reasoning grounded in 3D
• Adapting perception dynamically to task and context
We propose a solution: visual tool-augmented spatial reasoning,
bridging perception and multi-step reasoning through diverse, error-aware, adaptive vision tool use.
And we go one step further:
enabling robot control by treating robots themselves as tools.
Our framework is powered by:
• Double Interactive RL (DIRL), a new training framework combining demonstrations + real exploration
• Real interaction with specialized computer vision models during RL
• Toolshed, a scalable, asynchronous system for multimodal execution of vision tools and robots-as-tools
Project: https://t.co/xNLUjiNG4j
</> Code: https://t.co/MT3wd3MeF6
Toolshed is released with frontier-model demos.
Full training & evaluation release coming soon.
Done during my internship @NVIDIA. Big thanks to the amazing collaborators!
#CVPR2026 #EmbodiedAI #ComputerVision #Robotics #ReinforcementLearning #MultimodalAI
@AndrewLampinen Wow! Understanding mem/gen through the implicit bias or 'abilities' of networks is truly exciting!
In our ICLR 2026 paper (https://t.co/ExUmEa9KPa), we prove that diffusion models also generalize when they learn structures from data and memorize when they store and match training samples.
We're excited to share that our paper, "Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective," has been accepted to AISTATS 2026! Congratulations, Soo Min and Alec!
Experiments indicate that our results hold for nonlinear Transformers (GPT-2) and nonlinear function classes, both of which go beyond our simplified theoretical setting.
Link to arXiv version: https://t.co/57L8TFzf57
We're excited to share that our paper, "Linearly Separable Features in Shallow Nonlinear Networks: Width Scales Polynomially with Intrinsic Data Dimension," has been accepted to AISTATS 2026! Congratulations, Alec!
Notably, our result shows that a network width polynomial in the intrinsic data dimension suffices, bridging the gap between previous theoretical guarantees and empirical observations.
Let's congratulate Dr. Li on this tremendous milestone! We are incredibly proud of his achievements and wish him all the very best as he begins this new chapter at The University of Hong Kong.
Dr. Li is the first PhD graduate of our lab and one of the most dedicated, charismatic, and resilient students we have had. Over the past five years, Xiao has been an integral part of our DeepThink Lab, contributing significantly to our work on representation learning.
'Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination', by Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu.
https://t.co/hEWqZG8h3e
#classification #deep #features
We're excited to share our DeepThink Lab's "year-end" work on the generalization of diffusion models by opening the black box of their neural network backbones.
Learned representations can indicate whether the model is learning underlying data structures (with balanced, informative representations) or memorizing training data (with spiky representations). We elaborate below.
Read the full paper on arXiv: https://t.co/V0b62aVVqC