Xiao-Xiao Long @xxlong0 - Twitter Profile

2 months ago

Video2Humanoid. Still trouble with bad retargeted humanoid motions? Humanoids now are Easy to track any video using our new Neural Motion Retargeting (NMR) Method. Optimization-based motion retargeting methods like IK and GMR are solving a non-convex problem frame by frame, which makes them sensitive to initialization, hard to tune, and prone to poor local minima. The result is familiar: joint discontinuities, self-collisions, and unstable foot contact. Our solution is simple but effective : instead of optimizing each frame independently, learn the mapping between human motion and robot motion as a distribution via networks. 📊Results on Unitree G1 - 0 joint jumps - 54% fewer self-collision frames - 61% fewer joint limit violations - Faster convergence for downstream control policy training NMR turns motion retargeting from a fragile optimization problem into a learned, scalable pipeline for more stable humanoid motion. please visit https://t.co/mtyhgnT56Q for more visualization results. opensource: https://t.co/H01K7mFMQT

6

263

53

197

24K

xxlong0 retweeted

Xun Huang

@xxunhuang

10 months ago

Tencent presents Yan: Foundational Interactive Video Generation. It has been only two months since our release of Self-Forcing, and there are already two world foundation models built on top of it. Chinese teams are building at the speed of light! https://t.co/HkmxPiiDvb

7

697

115

342

55K

xxlong0 retweeted

Hao-Xiang (Howard) Guo @guohaoxiang4

10 months ago

📣Matrix-3D: Omnidirectional Explorable 3D World Generation📣 Matrix-3D is a 3D world model which generates large-scale explorable 3D scenes from a single image or text prompt, with SOTA performance. - Project: https://t.co/Be94fgBtEs - Code: https://t.co/fdNNyS6jX6

3

260

52

173

20K

Xiao-Xiao Long @xxlong0

11 months ago

Excited to share our PDT: Point Distribution Transformation with Diffusion Models (SIGGRAPH2025)!

Cheng Lin @_cheng_lin

11 months ago

🔎Excited to share our PDT: Point Distribution Transformation with Diffusion Models (SIGGRAPH2025)! 💡While autoregressive models are now widely adopted for predicting structures, we find diffusion models can also reveal high-level structures by transforming point distributions!

6

413

60

238

34K

0

4

0

681

xxlong0 retweeted

AK

@_akhaliq

11 months ago

Epona Autoregressive Diffusion World Model for Autonomous Driving

3

155

19

78

19K

Xiao-Xiao Long @xxlong0

12 months ago

@ChuanxiaZ @Oxford_VGG @NTUsg Congratulations

1

0

188

Xiao-Xiao Long @xxlong0

over 1 year ago

Special submission experience. A co-authored ICLR submitted paper was rejected with all positve final reviews. Reviewers agreed main concernings were solved and raised scores after rebuttal, but AC still rejected it.

xxlong0's tweet photo. Special submission experience.
A co-authored ICLR submitted paper was rejected with all positve final reviews. Reviewers agreed main concernings were solved and raised scores after rebuttal, but AC still rejected it.

3

7

2

0

2K

Xiao-Xiao Long @xxlong0

over 1 year ago

Excited to share our work DrivingWorld, a GPT-style Autoregressive (AR) video world model for autonomous driving. Rather than simply employ the next-token-prediction strategy, we leverage next-frame-prediction to model temporally coherent frame-wise information and then use next-token-prediction to interpolate the content of each frame. As a result, our model achieves accurate future prediction (up to 40 seconds) based on the inputted vehicle's locations (left subfig of the video) and a few conditioned images (non-red contourred frames at the beginning). Codes and more demos can be found at https://t.co/cnvMVoXi2M

3

125

18

52

10K

Xiao-Xiao Long @xxlong0

over 1 year ago

Cool Work. Use 3D tracking points to control video generation.

AK

@_akhaliq

over 1 year ago

Diffusion as Shader 3D-aware Video Diffusion for Versatile Video Generation Control

5

291

51

143

41K

0

60

5

29

12K

Xiao-Xiao Long @xxlong0

over 1 year ago

Cool work!

MrNeRF

@janusch_patas

over 1 year ago

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian Contributions: • We propose a novel scene representation for accurately modeling complex near-field and high-frequency reflections in real-world environments. • We developed a real-time ray-tracing renderer for 2DGS, enabling joint optimization of our representation for accurate scene reconstruction while achieving real-time rendering speeds. • Extensive experiments show that EnvGS significantly outperforms previous methods. To the best of our knowledge, EnvGS is the first method to achieve real-time photorealistic specular reflections synthesis in real-world scenes.

2

332

47

204

45K

0

18

3

2

2K

Xiao-Xiao Long @xxlong0

over 1 year ago

Very smart idea! Congrats to @Lyxun2000 . Rather than heavy 3D triplane transformer arch, a pure 2D diffusion-based model for generative gaussian splatting. High-quality and efficient!

MrNeRF

@janusch_patas

over 1 year ago

LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images Contributions: • We train a network to map images to a novel Relative Coordinate Map (RCM), which embeds pixel-wise correspondences across different input views to a main view and can be converted to its point cloud representation. • We demonstrate that RCMs can be easily obtained by fine-tuning a pre-trained 2D network to capitalize on existing 2D foundation models. • We showcase the superior quality of our flexible method, enabling rapid 3D generation from mobile phone image captures within seconds.

2

108

17

73

9K

0

14

0

1

1K

xxlong0 retweeted

MrNeRF

@janusch_patas

over 1 year ago

LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images Contributions: • We train a network to map images to a novel Relative Coordinate Map (RCM), which embeds pixel-wise correspondences across different input views to a main view and can be converted to its point cloud representation. • We demonstrate that RCMs can be easily obtained by fine-tuning a pre-trained 2D network to capitalize on existing 2D foundation models. • We showcase the superior quality of our flexible method, enabling rapid 3D generation from mobile phone image captures within seconds.

2

108

17

73

9K

xxlong0 retweeted

Zhengming Yu @Proof_Yu

over 1 year ago

Surf-D will be posted today. Please check out with our handsome boy @frankzydou if you're at ECCV 2024 @eccvconf!! 👗 Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models 📅 Wed, Oct 2 | 16:30 - 18:30 | Poster Session 4 | Poster 285

0

22

5

2

3K

xxlong0 retweeted

AK

@_akhaliq

over 1 year ago

Lotus Diffusion-based Visual Foundation Model for High-quality Dense Prediction Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising solution to enhance zero-shot generalization in dense prediction tasks. However, existing methods often uncritically use the original diffusion formulation, which may not be optimal due to the fundamental differences between dense prediction and image generation. In this paper, we provide a systemic analysis of the diffusion formulation for the dense prediction, focusing on both quality and efficiency. And we find that the original parameterization type for image generation, which learns to predict noise, is harmful for dense prediction; the multi-step noising/denoising diffusion process is also unnecessary and challenging to optimize. Based on these insights, we introduce Lotus, a diffusion-based visual foundation model with a simple yet effective adaptation protocol for dense prediction. Specifically, Lotus is trained to directly predict annotations instead of noise, thereby avoiding harmful variance. We also reformulate the diffusion process into a single-step procedure, simplifying optimization and significantly boosting inference speed. Additionally, we introduce a novel tuning strategy called detail preserver, which achieves more accurate and fine-grained predictions. Without scaling up the training data or model capacity, Lotus achieves SoTA performance in zero-shot depth and normal estimation across various datasets. It also significantly enhances efficiency, being hundreds of times faster than most existing diffusion-based methods.

_akhaliq's tweet photo. Lotus

Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising solution to enhance zero-shot generalization in dense prediction tasks. However, existing methods often uncritically use the original diffusion formulation, which may not be optimal due to the fundamental differences between dense prediction and image generation. In this paper, we provide a systemic analysis of the diffusion formulation for the dense prediction, focusing on both quality and efficiency. And we find that the original parameterization type for image generation, which learns to predict noise, is harmful for dense prediction; the multi-step noising/denoising diffusion process is also unnecessary and challenging to optimize. Based on these insights, we introduce Lotus, a diffusion-based visual foundation model with a simple yet effective adaptation protocol for dense prediction. Specifically, Lotus is trained to directly predict annotations instead of noise, thereby avoiding harmful variance. We also reformulate the diffusion process into a single-step procedure, simplifying optimization and significantly boosting inference speed. Additionally, we introduce a novel tuning strategy called detail preserver, which achieves more accurate and fine-grained predictions. Without scaling up the training data or model capacity, Lotus achieves SoTA performance in zero-shot depth and normal estimation across various datasets. It also significantly enhances efficiency, being hundreds of times faster than most existing diffusion-based methods.

8

443

65

280

101K

xxlong0 retweeted

Zhiyang (Frank) Dou

@frankzydou

over 1 year ago · Milan

If you're at ECCV 2024 @eccvconf, please check out our recent works on shape and motion generation. 3D Shape Generation with Arbitrary Topologies 👗 Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models 📅 Wed, Oct 2 | 16:30 - 18:30 | Poster Session 4 | Poster 285 paper: https://t.co/ZxQeBtFerM Efficient Human Motion Generation 🏃🏻‍♀️ EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation 📅 Wed, Oct 2 | 10:30 - 12:30 | Poster Session 3 | Poster 227 paper: https://t.co/u1s71rlXRb Controllable Human Motion Synthesis 🤸🏼‍♂️ TLControl: Trajectory and Language Control for Human Motion Synthesis 📅 Tue, Oct 1 | 16:30 - 18:30 | Poster Session 2 | Poster 273 paper: https://t.co/HYE5gZt53h Avatar Generation 💃 Disentangled Clothed Avatar Generation from Text Descriptions 📅 Fri, Oct 4 | 10:30 - 12:30 | Poster Session 7 | Poster 301 paper: https://t.co/lVzu4ZCfHZ - Please feel free to ask any questions. I am excited to explore and discuss all the interesting ideas during the poster session. I will be attending the Doctoral Consortium on Oct 1 (12:00 - 14:00). Looking forward to it! 🗺️Dynamic Realms: 4D Content Analysis, Recovery and Generation with Geometric📐, Topological🔗and Physical Priors⚛️ #ECCV2024 #Milan #Milano #ECCV #Vision #Graphics #GENAI #generativeai

0

85

15

50

8K

Xiao-Xiao Long @xxlong0

over 1 year ago

Congrats to Linhan for his first NeurIPS paper. DC-Gaussian could reconstruct high-quality 3D Gaussain Splatting via shitty cameras shooting through windshield (car dash cameras)😀. Codes have been released. See project page here: https://t.co/F6DMTOAg6q

Xiao-Xiao Long @xxlong0

about 2 years ago

Thanks for the tweet. We improve 3D Gaussain splatting for dash cam videos that contain serious reflections and obstructions. With Dash cam videos, it's possible to leverage massive driving data. Visit our page at https://t.co/F6DMTOzIgS Codes will be soon.

2

129

18

54

44K

1

93

19

30

11K

Xiao-Xiao Long @xxlong0

almost 2 years ago

Our ECCV paper GeoWizard could produce high-quality depth and normal, yielding good 3D shapes. Check out our page: https://t.co/GVhl8VPSSB

1

154

25

70

21K

Xiao-Xiao Long @xxlong0

almost 2 years ago

Very excited to share that GeoWizard is just accepted to ECCV 2024. Very appreaciate the efforts from every author, especially for Xiao Fu !

Xiao-Xiao Long @xxlong0

about 2 years ago

An image + a detailed normal map generated by GeoWizard --> a relightable image. Geowizard is a generative geometry estimation model that could jointly produce high-quality depth and normal with intricate details. Try our huggingface demo here: https://t.co/cNVburSygp #huggingface #gradio #geowizard

0

275

48

214

30K

1

96

12

29

10K

xxlong0 retweeted

Chi Zhang @dr_chizhang

almost 2 years ago

🔥Exciting update to the HuggingFace demo of Metric3D, the most powerful monocular metric depth predictor😱! New feature: directly view reconstructed 3D scenes alongside depth maps on the web page. 😎Check out these enhance at https://t.co/WB9xXvkIqr

dr_chizhang's tweet photo. 🔥Exciting update to the HuggingFace demo of Metric3D, the most powerful monocular metric depth predictor😱! New feature: directly view reconstructed 3D scenes alongside depth maps on the web page. 😎Check out these enhance at https://t.co/WB9xXvkIqr https://t.co/6hppzZafIO

3

94

18

56

7K

xxlong0 retweeted

Mr. For Example

@MrForExample

almost 2 years ago

#ComfyUI #Comfy3D Comfy3D Update: - Integrated Unique3D all stages workflow - Support Era3D + Unique3D workflow - All workflow breakdown to its core functions Project GitHub page: https://t.co/oGS7BkdeYE

3

181

28

133

17K

Xiao-Xiao Long

@xxlong0

Last Seen Users on Sotwe

Trends for you

Most Popular Users