Super excited to introduce our new work Marigold 🌼, a universal affine-invariant depth estimator. Try out the demo and see for yourself how amazing it is! Project page: https://t.co/r5cjzu1J3B
Introducing Marigold 🌼 - a universal monocular depth estimator, delivering incredibly sharp predictions in the wild! Based on Stable Diffusion, it is trained with synthetic depth data only and excels in zero-shot adaptation to real-world imagery. Check it out:
🌐 Website: https://t.co/rBXhxQChn4
🤗 Hugging Face Space: https://t.co/CYvZZQtEYr
📄 Paper: https://t.co/eEw11yW6MY
👾 Code: https://t.co/0L49Znp2z1
The team: Bingxin Ke (@KBingxin), yours truly (@AntonObukhov1), Shengyu Huang (@ShengyHuang), Nando Metzger (@NandoMetzger), Rodrigo Caye Daudt (@rcdaudt), and Konrad Schindler.
#ComputerVision #PRS #ETHZurich
Ke et al., "CAPA: Depth Completion as Parameter-Efficient Test-Time Adaptation"
Fine-tune your foundation model at test time with sparse measurements. Makes a lot of sense if you have, e.g., LiDAR measurements at hand.
🚀 Exciting news! We’re introducing VGG-T³: a scalable model for offline feed-forward 3D reconstruction that finally tackles the "quadratic bottleneck."
Ever wanted to have VGGT reconstruct a 1,000-image scene in seconds instead of 10 minutes and use it for visual localization?
Introducing StereoSpace -- our new end-to-end method for turning photos into stereo images without explicit geometry or depth maps. This makes it especially robust with thin structures and transparencies. Try the demo below
Running out of multi-view data for 3D reconstruction and generation? 🤠
We show how a camera-conditioned video model can be turned into a generative 3D (and dynamic!) Gaussian Splatting model—trained entirely through self-distillation, no real-world data needed.
🚀 Code & models are out for commercial use: https://t.co/Anx3fv1LrP
Kudos to @sherwinbahmani for this amazing work! 🎉
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos!
Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas.
🔗 https://t.co/1mGDxwgYJt
Team: Bingxin Ke (@KBingxin), Kevin Qu, Tianfu Wang (@TianfuWang2), Nando Metzger (@NandoMetzger), Shengyu Huang (@ShengyHuang), Bo Li, Anton Obukhov (@AntonObukhov1), Konrad Schindler.
We thank @huggingface for their sustained support.
Original announcement of Marigold Depth (CVPR 2024)
https://t.co/271UtkiMlf
Introducing ⇆ Marigold-DC — our training-free zero-shot approach to monocular Depth Completion with guided diffusion! If you have ever wondered how else a long denoising diffusion schedule can be useful, we have an answer for you! Details 🧵
🔥 Rolling-Depth - A new state-of-the-art depth estimator for videos in the wild!
Accurately estimating depth from videos using AI is now possible. No flickering, no temporal inconsistency 💪
Introducing 🛹 RollingDepth 🛹 — a universal monocular depth estimator for arbitrarily long videos! Our paper, “Video Depth without Video Models,” delivers exactly that, setting new standards in temporal consistency. Check out more details in the thread 🧵
Check out our work on fine-tuning of image-conditional diffusion models for depth and normal estimation.
Widely used diffusion models can be improved with single-step inference and task-specific fine-tuning, allowing us to gain better accuracy while being 200x faster!⚡
🧵(1/6)
@ducha_aiki @Gonzalo_MartinG @kacodes @thecschmidt4 @dcdegeus @Pandoro_o This is indeed a very good finding: changing to another scheduler setting (literally just one setting in the config) makes the 1-step result very good. However, Marigold was using the Diffusers implementation that the community uses, so it's not "a bug in Marigold".
Unveiling BetterDepth — a plug-and-play diffusion-based refiner for zero-shot monocular depth estimation, compatible with many established depth prediction models.
📕 Paper: https://t.co/Bx4gpOLyyA
🧩 Other: TBA
Fantastic collaboration between ETH Zurich and Disney Research|Studios, by Xiang Zhang (https://t.co/d0p0GvvRZZ), @KBingxin, @chrysmun, @NandoMetzger, @AntonObukhov1, @MarkusGross63, Konrad Schindler, and Christopher Schroers.
Spice up your favorite SOTA monodepth network with a diffusion model! We introduce *BetterDepth*, a plug-and-play refiner for zero-shot monodepth estimation.
Paper: https://t.co/cyTjVZGGkR