To be honest, training on handmade 4D asset datasets is a dead end. Almost all 4D asset data is synthetic and diverse real data barely exists, so models trained on it struggle to reconstruct objects that deform, get occluded, and move freely about the scene.
Our new work, Lift4D, instead lifts 2D & 3D priors into 4D, reconstructing complete dynamic objects from a single in-the-wild video 🧵 (1/n)
🔗Webpage + Demos: https://t.co/XI5jUViTpC
"Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild"
TL;DR: combines temporally consistent single-view 3D priors, deformable Gaussian Splatting, and diffusion-guided optimization to reconstruct challenging dynamic scenes from monocular videos.
@REVOLVO_OCELOTS I'm aiming to release the code next month, so you can try it yourself. That said, in your example the object is barely visible in most frames and the cuts are very harsh (180 degrees). That is a very hard case, so our method would probably not work😢
Fascinating to see another example of monocular-video-to-4D multiview reconstruction; World Labs featured a similar paper a few weeks ago.
Early results certainly seem good enough to be useful in many contexts, and the barrier to entry is dramatically lower than capturing full multiview video.
@nickkarpov Right now the 3D prior can't use multi-view data; it only takes a single view. That said, there are methods for using multi-view data with image-to-3D models, such as MV-SAM3D.
@REVOLVO_OCELOTS It depends on how much the object changes, since the deformation model has limited capacity. If the perspective is relatively similar across cuts, it should be fine, but if the object changes completely, the deformation could break.
@JieWang_ZJUI Exactly. Even with an infinite budget, e.g. an army of digital artists, you would get much further designing and modeling high-quality static 3D assets than diverse 4D assets from internet footage. This was the data curation process for SAM3D, and it proved successful.
@JieWang_ZJUI Having to encode diverse motion for 3D assets makes 4D data curation nearly impossible; take a folding shirt, for example (we have a demo of one on our website!). In many cases, in-the-wild (ITW) lifting priors trained on clean + dirty data are preferable, because clean 4D data is too hard to get.
@JieWang_ZJUI Just saw your comment! In-the-wild 3D data is not great, but compared to 4D it's still far better in diversity and quantity. Synthetic 3D data is also much easier to create than 4D, because encoding diverse motion is incredibly difficult even for a single configuration.
Big fan of these results from @yehonation on modeling appearance, geometry, and deformation of objects from in-the-wild videos
Excited about potential applications in human-object manipulation!
Worried about the lack of real-world 4D training data?
No, no... Lift4D takes a different route — diving deeper into 2D & 3D priors for truly in-the-wild 4D reconstruction.
🤟 Complete 360° geometry, appearance & deformation from a single casual video.
🔗https://t.co/Getd1Awe6L
We present Lift4D, which enables dynamic 3D reconstruction of objects from monocular videos with large motions, occlusions, and deformations. See @yehonation's thread for details, and be sure to check out the gallery of interactive in-the-wild results on the project webpage :)