We present Intrinsic Image Fusion in Poster Session 4 (106) on Saturday at #CVPR 🥳
Come by to learn more about room-scale material estimation!
PS: Code is released.
📢 Intrinsic Image Fusion for Multi-View 3D Material Reconstruction 📢
We combine generative material priors with inverse path tracing: 1) define a parametric texture space 2) fuse monocular predictions across views into consistent textures 3) optimize low-dimensional parameters for physically-grounded reconstructions.
The results are relightable PBR textures for 3D scenes: check out the result on a real-world 3D scan from the ScanNet++ dataset!
https://t.co/NmvDPNAZ6M
🎥 https://t.co/sExkAH9BkT
Great work by @Peter4AI, @LukasHollein!
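To make step 2 of the pipeline above (fusing monocular predictions across views) a bit more concrete, here is a minimal sketch of one simple fusion rule: a confidence-weighted average per texel. This is an assumption for illustration only; the post does not say how the fusion is actually done, and `fuse_views`, `texel_id`, and the weights are hypothetical names.

```python
from collections import defaultdict


def fuse_views(observations):
    """Fuse per-view predictions into one value per texel.

    observations: iterable of (texel_id, value, weight) tuples, where value
    is a tuple of floats (e.g. an RGB albedo) and weight is a non-negative
    confidence. All names here are hypothetical, not from the paper.
    """
    sums = {}
    weights = defaultdict(float)
    for texel_id, value, weight in observations:
        if weight <= 0.0:
            continue  # skip views that do not see this texel reliably
        acc = sums.setdefault(texel_id, [0.0] * len(value))
        for i, v in enumerate(value):
            acc[i] += weight * v
        weights[texel_id] += weight
    # Normalize the weighted sums; texels without a valid view are omitted.
    return {t: tuple(s / weights[t] for s in acc) for t, acc in sums.items()}


obs = [
    (0, (1.0, 0.0, 0.0), 1.0),  # texel 0 seen from view A
    (0, (0.0, 0.0, 0.0), 1.0),  # texel 0 seen from view B
    (1, (0.5, 0.5, 0.5), 2.0),  # texel 1, confident view
    (1, (9.0, 9.0, 9.0), 0.0),  # texel 1, zero-confidence view (ignored)
]
fused = fuse_views(obs)
```

A robust statistic such as a weighted median could replace the mean if individual views produce outliers.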
📢 3D world models from video diffusion suffer from inconsistent frames -> blurry output.
Our fix: instead of naïve 3D reconstruction, we non-rigidly align each frame into a globally-consistent 3DGS representation.
-> sharp visuals on top of any VDM!
https://t.co/laBngKn1wl
📢📢📢 Data release: high-res, multi-view, OLAT face recordings 📢📢📢
We captured individuals in our custom light stage with 16 high-end, global shutter cameras (72 fps) and 40 LED modules, totaling 2.8M precisely calibrated frames.
We use the data for BecomingLit (#NeurIPS2025): intrinsically decomposed Gaussian avatars, enabling photorealistic and real-time relighting via hybrid neural shading.
Code & Data: https://t.co/kvu20C4vfR
Great work by @jnthnschmdt, @SGiebenhain
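For a rough sense of scale, the figures in the data release above can be turned into footage per camera. This back-of-the-envelope sketch assumes the 2.8M frames are split evenly across the 16 cameras and that every camera records at the full 72 fps; neither assumption is stated in the post.

```python
total_frames = 2_800_000  # "2.8M" from the post, treated as exact here
num_cameras = 16
fps = 72

# Assumption: frames are distributed evenly over all cameras.
frames_per_camera = total_frames / num_cameras
seconds_per_camera = frames_per_camera / fps
minutes_per_camera = seconds_per_camera / 60

print(frames_per_camera, round(minutes_per_camera, 1))
```

Under these assumptions, each camera contributes 175,000 frames, or roughly 40 minutes of footage.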
📢 Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image 📢
We directly regress neural parametric head models (NPHMs) from a single image: fast, stable, and significantly more expressive than classical 3DMMs such as FLAME.
Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control.
Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity.
Key to successful and generalized training of our ViT-based network are:
(1) large-scale registration of existing 3D head datasets, and
(2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals.
Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization.
https://t.co/89IXGnDl4O
🎥 https://t.co/7AZIcnD3Mq
Great work by @SGiebenhain, @TobiasKirschst1, @liamschoneveld, Davide Davoli, Zhe Chen
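The post above mentions supervision and inference-time optimization against (pseudo ground-truth) surface normals. A common way to compare normals is a cosine-based loss; the sketch below shows that generic formulation. It is an assumption for illustration, not necessarily the loss used in Pix2NPHM, and `normal_loss` is a hypothetical name.

```python
import math


def normal_loss(pred, target):
    """Mean (1 - cosine similarity) between two lists of 3D normals.

    Assumes both lists have the same length and contain no zero vectors.
    """
    total = 0.0
    for p, t in zip(pred, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / norm
    return total / len(pred)


aligned = normal_loss([(0.0, 0.0, 1.0)], [(0.0, 0.0, 1.0)])      # identical normals
orthogonal = normal_loss([(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)])   # 90 degrees apart
opposite = normal_loss([(0.0, 0.0, 2.0)], [(0.0, 0.0, -1.0)])    # flipped, unnormalized
```

The loss is 0 for identical directions, 1 for orthogonal ones, and 2 for opposite ones, independent of vector length.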
📢📢📢 MeshRipple: Structured Autoregressive Generation of Artist-Meshes
High-fidelity, topologically complete 3D assets that expand naturally like a ripple on a surface!
Existing AR models often rely on sliding-window inference over truncated segments. However, this truncation breaks long-range geometric dependencies, causing holes and fragmentation.
Instead, MeshRipple uses frontier-aware BFS and sparse-attention global memory to ensure coherent growth with an unbounded receptive field.
-> Highly detailed mesh generations
-> Artist-like meshing quality
-> Works on room-scale environments
https://t.co/FPmo9QBTac
🎥 https://t.co/oV1zBua5iC
Great work by Junkai Lin, Hang Long, Huipeng Guo, Jielei Zhang, Jiayi Yang, Tianle Guo, Yang Yang, Jianwen Li, Wenxiao Zhang, Wei Yang
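To illustrate the "ripple" idea of growing a mesh outward from a frontier, here is a plain breadth-first ordering of triangle faces over shared edges. This is only a sketch of generic BFS on a face-adjacency graph; the frontier-aware BFS and sparse-attention memory in MeshRipple are more involved, and `bfs_face_order` is a hypothetical helper.

```python
from collections import deque


def bfs_face_order(faces, start=0):
    """Return face indices in breadth-first order over shared edges.

    faces: list of (a, b, c) vertex-index triples. Faces that are not
    connected to the start face are never reached.
    """
    # Map each undirected edge to the faces that contain it.
    edge_to_faces = {}
    for f, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(frozenset((u, v)), []).append(f)

    order, seen = [], {start}
    frontier = deque([start])
    while frontier:
        f = frontier.popleft()
        order.append(f)
        a, b, c = faces[f]
        # Push every unseen neighbor that shares an edge with face f.
        for u, v in ((a, b), (b, c), (c, a)):
            for g in edge_to_faces[frozenset((u, v))]:
                if g not in seen:
                    seen.add(g)
                    frontier.append(g)
    return order


# A strip of three connected triangles plus one isolated triangle.
faces = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (5, 6, 7)]
order = bfs_face_order(faces)
```

Here the strip is visited outward from face 0, and the isolated triangle is left out because it shares no edge with the rest.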
@karanjagtiani04 @MattNiessner For the occluded regions we don't have any predictions, so now they don't receive gradients, but one option would be to backprop through multiple bounces.
📢 Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
We reconstruct clean and sharp relightable textures using inverse path tracing and monocular priors.
Check out our project page for more results:
https://t.co/Wmje9lkr9P
Releasing Echo today is incredibly exciting for me because it is a critical step for generative AI, enabling the creation of virtual worlds.
Echo is our first world model at SpAItial AI. It turns text or images into explorable 3D environments: spaces you can move through, inspect, and build on. Seeing this work in real time still feels a bit surreal.
My fascination with this goes back a long way: video games, virtual environments, and the idea of capturing the real world in 3D. As a researcher, I spent years working on 3D reconstruction, neural rendering, and scene understanding, all driven by the same question: how do we teach machines to understand the world?
One thing became clear over time: the biggest bottleneck isn't compute or rendering, it's 3D worlds themselves. High-quality, consistent environments are expensive to create by hand and don't scale to the experiences we want to build. In particular, I believe that the ability to generate virtual worlds is ultimately key towards understanding the real world.
That's why we founded SpAItial AI. We're building spatial world models that combine geometric understanding with creative generation: models that can generate, edit, and eventually reason about 3D environments.
Echo is just the beginning. For me, this feels like the moment when decades of research finally meet the imagination that got many of us into graphics, games, and 3D understanding in the first place.
https://t.co/L0RxOUqFMa
Radiance Meshes for Volumetric Reconstruction
We made a thing!
A radiance field composited out of triangle meshes that accurately renders the underlying volumetric field, with no popping AND faster than approximate methods like Gaussian Splatting.
📢 IntrinsiX @NeurIPSConf
If you are excited about PBR materials, drop by on Wednesday from 4:30pm to 7:30pm (poster 4308), or feel free to DM me.
PS: We have just released the training code, so you can also train your own models now!
https://t.co/se8WvQYPCb
📢 IntrinsiX: High-Quality PBR Generation using Image Priors 📢
From text input, we generate renderable PBR maps! Next to editable image generation, our predictions can be distilled into room-scale scenes using SDS for large-scale PBR texture generation.
We first train separate LoRA modules for the intrinsic properties of albedo, rough/metal, normal. Then, we introduce cross-intrinsic attention using a rerendering loss with importance-weighted light sampling to enable coherent PBR generation.
Our method outperforms text -> image -> PBR methods both in generalization and quality, since directly generating PBR maps does not suffer from the inherent ambiguity of intrinsic image decomposition. In addition, our design choice facilitates SDS-based PBR texture distillation.
https://t.co/fxu5zjAsyJ
🎥 https://t.co/A12W65ijSl
Great work by @Peter4AI, @LukasHollein
Congrats to @yawarnihal for winning the @MdsiTum best paper award for his amazing MeshGPT work!
MeshGPT autoregressively generates compact, artist-style triangle meshes by tokenizing faces into a learned discrete vocabulary (VQ-style codebook) and training a decoder-only transformer to predict those face tokens. Discrete tokenization + attention lets GPT-style models learn long-range geometric & topological patterns and produce coherent, high-fidelity 3D assets.
MeshGPT's use cases go far beyond traditional content creation applications in computer graphics. For instance, the method was developed in collaboration with @Audi to help rapid prototyping of car designs, where explicit and precise mesh design is essential.
In the research community, there have already been many follow-ups such as MeshAnything, MeshXL, Meshtron, and many more - finally, we can use AI to generate high-fidelity 3D content :)
Project: https://t.co/N8MLltTuKK
Video: https://t.co/lzJeGhTaEA
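The "VQ-style codebook" idea in the MeshGPT post can be illustrated with a toy nearest-neighbor quantizer: each feature vector is replaced by the index of its closest codebook entry, and those indices are the discrete tokens a transformer would then predict. The codebook below is hand-picked for illustration; MeshGPT learns its codebook, and `quantize` is a hypothetical helper, not the paper's implementation.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""

    def sq_dist(x, y):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((a - b) ** 2 for a, b in zip(x, y))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


# Toy 2D codebook with three entries (assumed values, not learned).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
features = [(0.1, -0.1), (0.9, 1.2), (0.1, 1.9)]
tokens = quantize(features, codebook)
```

Each continuous feature ends up as a small integer, which is what makes GPT-style next-token prediction applicable to geometry.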
📢 ProcGen3D: Learning Neural Procedural Graphs for Image-to-3D Reconstruction
@xinyi092298 learns neural procedural graphs to generate high-fidelity 3D - MCTS-guided sampling maintains consistency with the input image, even from real images!
Check it out: https://t.co/RLGd2iXCwf
📢📢 PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing 📢📢
PercHead reconstructs realistic 3D heads from a single image and enables disentangled 3D editing via geometric controls and style inputs from images or text.
At its core is a generalized 3D head decoder trained with perceptual supervision from DINOv2 and SAM 2.1. We find that our new perceptual loss formulation improves reconstruction fidelity compared to commonly-used methods such as LPIPS.
Our trained reconstruction model is able to generate 3D-consistent heads from a single input image. Even with challenging side-view inputs, the model robustly infers missing regions for a coherent, high-fidelity output.
In addition, our architecture seamlessly adapts to downstream tasks: by swapping the encoder, we can transform the model into a disentangled 3D editing pipeline. In this scenario, we can control geometry through - potentially hand-drawn - segmentation maps, and condition style via image or text prompt. We also provide an interactive GUI to enable the exploration of our editing pipeline.
https://t.co/5yZcJzsoXz
📽️ https://t.co/ntb5kRi3rY
Great work by @antonio_oroz and @TobiasKirschst1
📢 New in ScanNet++: High-Res 360° Panos!
@chandan__yes & @liuyuehcheng have added pano captures for 956 ScanNet++ scenes, fully aligned with the 3D meshes, DSLR, and iPhone data - multiple panos per scene
Check it out:
Docs https://t.co/gu3bVQZEZy
Code https://t.co/BlHgQuaaSk
📢 Code for IntrinsiX: High-Quality PBR Generation using Image Priors 📢
We have just released the inference code and the pre-trained weights! Our model generates intrinsic properties (albedo, roughness, metallic, normal) directly from text.
https://t.co/se8WvQZnrJ
Fantastic retreat this weekend by our research groups!
Internal reviews, idea brainstorming, paper reading, and much more! Of course also many social activities - the highlight being our kayaking trip - lots of fun :)
So You Want to Be an Academic? A couple of years into your PhD, but wondering: "Am I doing this right?" Most of the advice is aimed at graduating students. But there's far less for junior folks who are still finding their academic path.
My candid takes: https://t.co/25JdxHAON0
Our paper, IntrinsiX, just got accepted to #NeurIPS2025!
Our model generates renderable PBR maps directly from text and can also be used for room-scale scene texturing using SDS.
https://t.co/se8WvQYPCb
Let's meet in San Diego!