How can we run reconstruction models like π³ and Depth Anything 3 in real-time?
We present KV-Tracker, a training-free approach for real-time tracking of scenes and objects, achieving up to 30 FPS!
With @alzugarayign, @makezur, @XinKong_IC and @AjdDavison
KV-Tracker enables object-level reconstruction and tracking when provided with an object mask.
The KV-cache can be saved and used later without any special initialisation procedure.
Per-frame geometry from π³ is split into primitives via segmentation and tracked over time using dense 2D point tracks. With a compact per-primitive pose, geometry is densely aligned and the primitives are stitched into a complete reconstruction of the observed scene components.
Introducing 4D Primitive-Mâché (4DPM), a new method for replayable 4D reconstruction from monocular videos.
We split dynamic scenes into 3D primitives and recover their motion. 4DPM can infer object positions even after they leave view.
Joint work with @marwan_ptr and @AjdDavison
ACE-SLAM naturally handles loop closure without special treatment and robustly deals with dynamic objects, while remaining lightweight (small MLP) and computationally efficient, making this representation compelling for SLAM.
Excited to present ACE-SLAM, the first neural SLAM to use Scene Coordinate Regression as an implicit map representation
Efficient (real-time from live stream), compressive (neural maps <1MB) and robust to dynamic scenes
With @marwan_ptr and @AjdDavison
https://t.co/tMsD5hTkB3
🚀 Excited to share CausNVS: Autoregressive Multi-view Diffusion for Flexible 3D Novel View Synthesis!
Let’s reconstruct the 3D world generatively. CausNVS handles any number of input views, synthesizes novel views autoregressively, and enables interactive streaming and flexible N-to-M NVS.
Introducing MASt3R-SLAM, the first real-time monocular dense SLAM with MASt3R as a foundation.
Easy to use like DUSt3R/MASt3R, from an uncalibrated RGB video it recovers accurate, globally consistent poses & a dense map.
With @eric_dexheimer*, @AjdDavison (*Equal Contribution)
EscherNet will be presented tomorrow at #CVPR. But *now* you can drop a couple of images into our Hugging Face demo to try it out!
https://t.co/PaiXGJlgoF
Tired of single-image-to-3D? Check out EscherNet tomorrow @CVPR, which can take a flexible number of views for 3D generation!
THURSDAY, JUNE 20
ORAL: 9:00-10:30, SUMMIT BALLROOM (TOP FLOOR)
POSTER: 10:30-12:00, ARCH 4A-E, #69
Try our @Gradio online demo
https://t.co/PAaJdgRaZt
SuperPrimitives will be presented at #CVPR next week (Wednesday), along with a 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗱𝗲𝗺𝗼 on Friday!
Our new representation enables dense monocular 3D reconstruction in real-time. No poses required!
Project page: https://t.co/l360soWiAz
From RGB images we can estimate camera rotation, *without* knowledge of camera intrinsics. This also leads to some cool downstream applications: for example, it can complement an IMU. We call it U-ARE-ME!
@AjdDavison @DoC_Rhodes94 @BaeGwangbin