Working on 3D for Amazon Just Walk Out (JWO). Applied Scientist @Amazon, Ph.D. @UBC, Intern @Amazon, Intern @NvidiaAI, Intern @Google. Opinions are my own.
1/5 Thrilled to open-source OSCAR, an action-conditioned world model for robotics, led by @wuzy2115, a visiting student in my group! It generalizes across different robot embodiments with precise action controllability. It was trained entirely on a single GH200 GPU and outperforms existing open-source baselines that have larger model capacity and need more compute.
Everything is public, including training data.
Paper: https://t.co/0KokZcPhP5
Project: https://t.co/g3SGfAHI76
Code: https://t.co/Hv81Yjy5kV
Robot data: https://t.co/p6g967kbXg
Human data: https://t.co/5f5pMnZGrc
Weights: https://t.co/HRAloQeh8t
#Robotics #WorldModels #AI #OpenSource
Hey, folks who just finished CVPR: 3D is not dead yet, at least at Amazon JWO (autonomous stores), where we still care about 3D reconstruction and camera stuff. Please drop me an email if you are looking for an FTE role, since we are looking for a 3D scientist (SfM, GFM) now!
The wait for a work permit in Canada is ~259 days (>8 months). I feel sorry for PhD students who need a work visa for a summer internship. I don’t think tech companies can wait 8 months.
BTW, JWO is recruiting interns (US or Canada?) in 3D/VLM/VDM. Shoot me your CV.
Spatial reconstruction is a long-context problem: real scenes come with hundreds of images. But O(N²) transformer-based models don’t scale efficiently.
Introducing ZipMap (CVPR ’26): Linear-Time, Stateful 3D Reconstruction via Test-Time Training (TTT).
ZipMap “zips” a large image collection into an implicit TTT scene state in a single linear-time operation. The state is then decoded into spatial outputs and can be queried efficiently for novel-view geometry and appearance (~100 FPS).
ZipMap is not only much faster (>20× faster than VGGT), but also matches or surpasses the accuracy of all SOTA models.
Hi, I will be at @NeurIPS25_San_Diego. Please reach out; I’m always happy to chat about 3D vision, simulation, generative modelling, etc., and their challenges in real-world applications!
@ducha_aiki Interesting, and similar to what I found with the more recent Pi3 and MapAnything. I’m now curious whether we could get rid of RANSAC if data and models were scaled up indefinitely.
@tedlasai Cool work! While these multi-spectral sensors are more capable than RGB sensors, do you know if they are scalable in terms of price, lifespan, energy consumption, etc.?
@yongyuanxi Agree! Geomatics is super interesting and practical. That said, a lot of my peers (yes, I also hold BSc/MSc degrees in this area) changed their focus to science problems such as climate change and biodiversity. Geomatics turns out to be the tech and tooling behind their research.
Thanks for checking out our live demos at #CVPR2025 today! Here are the others we’re hosting this weekend:
Saturday, June 14:
10:30-11am: Multi-Modal AI - Just Walk Out & Visual Reasoning
1:30-2pm: Audio/Video Generative Intelligence at Prime Video
2-2:30pm: Vulcan Stow/Pick
Sunday, June 15:
10:30-11am: Multi-Modal AI - Just Walk Out & Visual Reasoning
11-11:30am: Audio/Video Generative Intelligence at Prime Video
NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization
https://t.co/1rxX7UDc0q
https://t.co/iuA7HJU49E
#3DV2025
TL;DR: neural reconstruction with a simpler architecture (no linear systems to solve), and up to 3x speedup vs. voxel-based methods!