🚀🚀 Introducing Pixal3D (SIGGRAPH’26) — a new pixel-aligned image-to-3D generation paradigm for high-fidelity 3D asset creation.
Today’s image-to-3D methods have become pretty good at producing plausible 3D assets. But plausibility is not enough: fidelity is a hidden bottleneck.
❓A generated model may look “about right,” yet still fail to truly align with the input pixels. Can we make 3D generation as faithful as reconstruction, while still allowing it to complete the unseen?
Pixal3D is our answer.
💡We believe the core bottleneck behind fidelity is 2D–3D correspondence. Most 3D-native generators synthesize shapes in canonical space and inject image cues through cross-attention, forcing the model to implicitly search for which pixels correspond to which 3D regions.
🍀Pixal3D takes a different route. Instead of generating in canonical space, Pixal3D generates directly in pixel-aligned camera space — what you see is what you get. The generated 3D asset is aligned with the input view from the start.
☕️Meanwhile, Pixal3D introduces a back-projection-based image conditioning scheme that explicitly back-projects multi-scale pixel features into 3D voxels, resolving the 2D–3D association problem. The input image is no longer just a prompt; it becomes a geometric anchor.
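To make the back-projection idea concrete, here is a minimal sketch of how pixel features can be attached to camera-space voxels. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, voxel centers already expressed in the input view's camera space (which is what pixel-aligned generation gives you), and a feature pyramid stored as nested lists. The function names, the nearest-neighbour sampling, and the concatenation across scales are illustrative assumptions, not Pixal3D's actual implementation, which presumably uses learned features and smoother interpolation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # fmap[row][col] -> feature vector


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def sample_nearest(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Nearest-neighbour lookup; returns a zero vector when (u, v) falls outside the map."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    col, row = math.floor(u + 0.5), math.floor(v + 0.5)
    if 0 <= row < h and 0 <= col < w:
        return list(fmap[row][col])
    return [0.0] * c


def back_project(voxels: Sequence[Vec3], pyramid: Sequence[FeatureMap],
                 fx: float, fy: float, cx: float, cy: float,
                 full_width: int) -> List[List[float]]:
    """Give every camera-space voxel the concatenated features of the pixel it projects to."""
    out = []
    for p in voxels:
        feats: List[float] = []
        for fmap in pyramid:
            c = len(fmap[0][0])
            if p[2] <= 0:  # behind the camera: nothing to sample at this level
                feats.extend([0.0] * c)
                continue
            u, v = project(p, fx, fy, cx, cy)
            s = len(fmap[0]) / full_width  # resolution ratio of this pyramid level
            # Simplification: pixel coordinates are rescaled without a half-pixel offset.
            feats.extend(sample_nearest(fmap, u * s, v * s))
        out.append(feats)
    return out
```

With a 4×4 full-resolution map and a 2×2 coarse map, a voxel on the optical axis picks up one feature from each level, while a voxel behind the camera or outside the image gets zeros. Because the voxels live in the input view's camera space, no pose estimation or cross-attention search is needed to decide which pixel a voxel should read from.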
🚩Pixal3D shows that pixel-aligned 3D generation is not only feasible and scalable, but also significantly improves fidelity, pushing 3D-native generation closer to reconstruction-level faithfulness. It also naturally extends to multi-view and scene-level 3D generation.
✅Faithful to the input view. ✅Generative for the unseen.
Closer to reconstruction-level fidelity, with the creativity of 3D generation. Pixal3D also represents an effort towards the unification of 3D generation and reconstruction.
📢Paper, code, and demo are fully released — try it out and let us know your feedback!
🌐Project page: https://t.co/Y1oKzZZrkZ
🤗Huggingface Demo:
https://t.co/4QoDdHMOsk
💻Code:
https://t.co/xwkNNQTMha
📄Paper:
https://t.co/UgiNH00PEY
🚀 Introducing CoMoVi! From a start image & text prompt, it simultaneously generates realistic human videos and corresponding 3D motion sequences.
✨ No reference videos needed to extract skeletons anymore!
🧠 By co-generating motion and video, CoMoVi directly inherits the massive generalization power of video generation models, making it adaptable to diverse text prompts!
🌍 This co-generation approach also makes CoMoVi look like a human-centric World Action Model (WAM), simulating not just the visual world, but the physical state of human actions within it.
arxiv: https://t.co/bpVyAirgls
HF page: https://t.co/W0kdOwwyKF
Project page: https://t.co/371GDgJKVo
Code: https://t.co/qSJKb5vaqG
🎥 Demo Video for MotionCrafter (CVPR 2026)
How much do video diffusion models know about the 4D world?
Watch the demo to find the answer👇
https://t.co/hnt51lrxTh
#CVPR2026 #ComputerVision #3DVision
What a crazy week in AI! 🚀
LTX 2.3
GPT 5.4
FireRed Edit 1.1
Kiwi Edit
HY WU
Qwen 3.5 small
Cuda Agent
CubeComposer
Helios
Spatial T2I
Spectrum
Utonia
& more!
Watch the full recap:
https://t.co/iH01KoagIH
"Track4World: Feedforward World‑centric Dense 3D Tracking of All Pixels"
TL;DR: feed‑forward model that predicts pixel‑level 2D and 3D dense flows for holistic world‑centric 3D tracking from monocular video, outperforming prior flow and tracking baselines.
Excited to share Track4World, feedforward 3D tracking of all pixels in the world-centric coordinate system. Code has been released; you're welcome to try it!
Homepage: https://t.co/OIiaEl8KJP
Code: https://t.co/VgLAkCLPCZ
Paper: https://t.co/bCPEVFaQUW
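Why "world-centric" matters can be shown with a little coordinate bookkeeping. Flow measured in camera coordinates mixes the camera's own motion with the motion of the scene; expressing both endpoints of a track in world coordinates removes the ego-motion, so static points get zero flow. The sketch below assumes known camera-to-world poses `(R, t)` and per-pixel 3D points that are already put in correspondence across two frames; these inputs and the helper names are hypothetical, and Track4World itself predicts dense 2D and 3D flows feed-forward rather than taking them as given.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pose = Tuple[Mat3, Vec3]  # camera-to-world rotation R and translation t


def to_world(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply a camera-to-world pose: x_world = R @ x_cam + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def world_flow(p_src: Sequence[Vec3], pose_src: Pose,
               p_dst: Sequence[Vec3], pose_dst: Pose) -> List[Vec3]:
    """World-centric 3D flow of tracked points: displacement measured in world coordinates."""
    flows = []
    for a, b in zip(p_src, p_dst):
        wa = to_world(a, *pose_src)
        wb = to_world(b, *pose_dst)
        flows.append(tuple(wb[i] - wa[i] for i in range(3)))
    return flows
```

For example, if the camera translates by +1 along x between frames, a static point at world position (0, 0, 5) moves from (0, 0, 5) to (-1, 0, 5) in camera coordinates, which looks like motion, yet its world-centric flow is (0, 0, 0). A point that really moves up by one unit gets a world flow of (0, 1, 0).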
How much do video diffusion models know about the 4D world? By introducing a 4D VAE, we jointly estimate geometry and motion from videos using a large-scale pretrained VDM.
- paper: https://t.co/pOqK39MO10
- page: https://t.co/nfDqzlvDV5
- code: https://t.co/QMgcfht363
Track4World: feedforward world-centric dense 3D tracking.
- Tracks every pixel in 3D.
- Processes 16-frame sequences in 3.4 s with 14 GB of VRAM.
- Uses Depth Anything v3 as its backbone.
https://t.co/5PTAemjBwj
Track4World: what if you could track every single pixel's 3D movement in a video, accurately and quickly? This new model turns a regular video into a detailed 3D scene, figuring out the precise 3D path of everything moving in the frame, fast. It's like rebuilding the entire world from a single clip! 🤯 Code and demo are available.
CoMoVi is a co-generative framework that couples two video diffusion models (VDMs) to generate 3D human motions and videos synchronously within a single diffusion denoising loop.
The generation of 3D human motions and 2D human videos is intrinsically coupled: 3D motions provide a structural prior for plausibility and consistency in videos, while pre-trained video models offer strong generalization for motions. This mutual dependence calls for coupling the two generation processes, which is the idea CoMoVi is built on.
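The phrase "synchronously within a single denoising loop" can be illustrated with a toy sketch. Both branches advance one step at a time, and at each step every branch is conditioned on the other branch's state from the same step. The `Denoiser` signature, the timestep schedule, and `make_toy_denoiser` are hypothetical stand-ins: the posts do not say how CoMoVi's two VDMs exchange information, and the toy update (nudging each state toward the other) only demonstrates the loop structure, not real denoising.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical signature: (own noisy state, other branch's state, timestep) -> updated state.
Denoiser = Callable[[Vector, Vector, float], Vector]


def co_generate(video: Vector, motion: Vector, video_model: Denoiser,
                motion_model: Denoiser, num_steps: int) -> Tuple[Vector, Vector]:
    """Run both branches inside one shared denoising loop."""
    for k in range(num_steps):
        t = 1.0 - k / num_steps  # timestep runs from 1 (pure noise) toward 0
        # Both updates read the states from the same step, so neither branch runs ahead.
        new_video = video_model(video, motion, t)
        new_motion = motion_model(motion, video, t)
        video, motion = new_video, new_motion
    return video, motion


def make_toy_denoiser(weight: float) -> Denoiser:
    """Hypothetical toy: move each entry a fraction `weight` toward the other branch."""
    def step(own: Vector, other: Vector, t: float) -> Vector:
        return [o + weight * (x - o) for o, x in zip(own, other)]
    return step
```

Starting from `video=[0.0]` and `motion=[4.0]` with `make_toy_denoiser(0.25)` for both branches, three steps give `[1.75]` and `[2.25]`: each branch's trajectory depends on the other at every step, which is what distinguishes co-generation from running a motion model first and a video model afterwards.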
Paper Title: CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos
Project: https://t.co/Q5fwWcRjkP
Link: https://t.co/VGgztlYHPA
Excited to share our recent work, UniSH, which unifies dynamic 3D scene reconstruction and SMPL estimation within a single framework. (Top left: the input video.)
Code has been released! https://t.co/T2CrYpTZxn
Project page: https://t.co/4OZq3QW9Th
Paper: https://t.co/Pnebo5LuDY
🚀🚀We’re building a new Applied Research Team in Tencent IEG for Game AI, with a research culture similar to ARC Lab.
This newly formed team focuses on research-driven Game AI, operating at the intersection of fundamental research and large-scale game environments. Our goal is to develop principled models that can understand, simulate, and act within complex virtual worlds—while remaining grounded enough to eventually shape real games.
Our research directions include (but are not limited to):
🎮 Interactive & Dynamic World Modeling — learning, simulating, and reasoning about evolving game worlds
🤖 NPC World-to-Action Modeling — connecting world understanding to decision and action, with strong ties to Embodied AI and agent behavior
🌍 Game Scene Generation — generative modeling of diverse, controllable, and scalable game scenes
We are looking for researchers with the following minimum qualifications:
✨ A recent Ph.D. in related fields
✨ 5+ top conference or journal papers
✨ 1000+ GitHub stars
🌟 Evidence of a “make it work” mindset
We are also open to strong graduate students for intern positions. Feel free to DM me or contact: [email protected].
Happy to share our new work, MVInverse, a feedforward framework for multiview PBR material estimation at ~10 fps. Multiview ViTs (like VGGT and Pi3) can also do material estimation!
Paper: https://t.co/qmFPcctRTt
Homepage: https://t.co/wPvhSbWmJQ
Code: https://t.co/EiHGX0t7uG
People are underestimating @Apple in AI.
I just ran Apple’s new SHARP model locally and watched my photos turn into 3D Gaussian splats in seconds, then stepped inside them on Vision Pro.
This feels like the beginning of something special. You really have to try it.