Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image
We directly regress neural parametric head models (NPHMs) from a single image: fast, stable, and significantly more expressive than classical 3DMMs such as FLAME.
Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control.
Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity.
Key to successful and generalized training of our ViT-based network are:
(1) large-scale registration of existing 3D head datasets, and
(2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals.
Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization.
https://t.co/89IXGnDl4O
https://t.co/7AZIcnD3Mq
Great work by @SGiebenhain, @TobiasKirschst1, @liamschoneveld, Davide Davoli, Zhe Chen
SHeaP inference code is out! For all your real-time head pose and expression tracking desires!
Check it out at:
HuggingFace Spaces: https://t.co/PhrPUc64WA
GitHub: https://t.co/qYLHXaYsPV
Our new paper - SHeaP - is out!
TLDR: self-supervised head tracking and geometry (FLAME) prediction, learned via a photometric loss with a 2D Gaussian splatting renderer.
See more:
https://t.co/UZFzynT7sG
https://t.co/uvJ8KgqMcX
GAF: Gaussian Avatar Reconstruction From Monocular Videos via Multi-view Diffusion
We reconstruct animatable Gaussian head avatars from monocular videos captured by commodity devices such as smartphones.
Key idea: distill reconstruction constraints from a multi-view head diffusion model to complete unobserved regions.
https://t.co/prz5HnGoWq
https://t.co/XkWBKScwb2
Great work by @jiapeng_tang, @davidedavoli, @TobiasKirschst1, @liamschoneveld
@dome_271 Classifier-free guidance always seemed like some weird hack to me. There must be a more mathematically elegant solution out there, waiting to be found.
@camo2572 @LabAgainstWar @AlboMP @RichardMarlesMP @SenatorWong I don't think it would be that hard to come up with a better plan than spending $360b for offensive nuclear subs we are not even contractually guaranteed to receive?
I feel like literally any plan is better than that one.
@MartinGTobias Most of these measures make sense to me? As someone working in tech, I think it's well worth spending a little money to encourage more women into the field.
@finbarrtimbers Perhaps limiting the representation space via the limited size of the codebook forces the network to better compress what's really important in the images.
@techchildrights @laion_ai Coming from a human rights organization, I am sure you appreciate the importance of transparency. Without @laion_ai's ongoing AI transparency efforts, we would know very little about the data going into these models.
@Saboo_Shubham_ @laion_ai This definitely wouldn't work as well on papers that haven't had 1000s of blog posts, Reddit threads, etc. written about them, though.
@dome_271 I actually had this problem a long time ago when trying to use ConvNets to generate audio. Looking at the audio generative modeling literature may help, as high-frequency details are perhaps even more important in that domain.