Phuc Nguyen Duc Anh @phucnda - Twitter Profile

Pinned Tweet

about 1 month ago

We are excited to release the code for our paper OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness, accepted to #CVPR2026. Source Code: https://t.co/na6fLrmzE8 From a dashcam video, OpenVO estimates how the camera of the vehicle moves in metric scale.

3

381

47

323

21K

phucnda retweeted

Oier Mees @oier_mees

11 days ago

A few years ago, learning robot learning meant stitching together dozens of papers and courses, with no clear path from the basics to what state-of-the-art systems actually do.
This was one of the motivations behind creating @ETH's course "Robot Learning: From Fundamentals to Foundation Models", to provide a structured path from first principles all the way to modern foundation models for robotics.
I strongly believe that education should be accessible to everyone, so I have made all lecture recordings publicly available on YouTube.
Creating this course was one of the most challenging projects I have taken on. It was my first time designing and teaching an entire curriculum from scratch, while simultaneously working full-time in industry. On top of that, the course proved to be more popular than expected and we had to scale it to almost 300 students, which was only possible thanks to an amazing team of TAs. Looking back, it was an absolute privilege to teach this class and an incredibly rewarding experience.
If you are getting into robot learning, this is the starting point I wish I had.
📚 Main lectures: https://t.co/r1PpQASaJg
🎤 Guest lectures: https://t.co/nh5Rm2P2Lz
🌐 Course website: https://t.co/DoQUYy3MjB

16

468

63

456

37K

phucnda retweeted

Jianyuan

@jianyuan_wang

about 1 month ago

Introducing VGGT-Ω: scaling feed-forward reconstruction across static and dynamic scenes, and studying whether the learned geometric representations transfer beyond reconstruction.

14

816

141

244

778K

Phuc Nguyen Duc Anh @phucnda

about 1 month ago

We propose a novel temporal-dynamics-informed, camera-aware and geometry-aware visual odometry system. Our method takes consecutive dashcam frames as input and extracts both temporal and geometric representations for robust egomotion estimation. https://t.co/n6ohBEcazj

0

9

0

6

620

phucnda retweeted

Wang Zhao

@WangZhao_0849

about 1 month ago

🚀🚀 Introducing Pixal3D (SIGGRAPH’26): a new pixel-aligned image-to-3D generation paradigm for high-fidelity 3D asset creation. Today’s image-to-3D has become pretty good at producing plausible 3D assets. But plausibility is not enough. Fidelity is a hidden bottleneck.
❓A generated model may look “about right,” yet still fail to truly align with the input pixels. Can we make 3D generation as faithful as reconstruction, while still allowing it to complete the unseen? Pixal3D is our answer.
💡We believe the core bottleneck behind fidelity is 2D-3D correspondence. Most 3D-native generators synthesize shapes in canonical space and inject image cues through cross-attention, forcing the model to implicitly search for which pixels correspond to which 3D regions.
🍀Pixal3D takes a different route. Instead of generating in canonical space, Pixal3D generates directly in pixel-aligned camera space: what you see is what you get. The generated 3D asset is aligned with the input view from the start.
☕️Meanwhile, Pixal3D introduces a back-projection-based image conditioning scheme that explicitly back-projects multi-scale pixel features into 3D voxels, thus resolving the 2D-3D association problem. The input image is no longer just a prompt; it becomes a geometric anchor.
🚩Pixal3D shows that pixel-aligned 3D generation is not only feasible and scalable, but also significantly improves fidelity, pushing 3D-native generation closer to reconstruction-level faithfulness. It also naturally extends to multi-view and scene-level 3D generation.
✅Faithful to the input view. ✅Generative for the unseen. Closer to reconstruction-level fidelity, with the creativity of 3D generation. Pixal3D also represents an effort towards the unification of 3D generation and reconstruction.
📢Paper, code, and demo are fully released. Try it out and let us know your feedback!
🌐Project page: https://t.co/Y1oKzZZrkZ
🤗Huggingface Demo: https://t.co/4QoDdHMOsk
💻Code: https://t.co/xwkNNQTMha
📄Paper: https://t.co/UgiNH00PEY

28

1K

150

1K

201K

phucnda retweeted

Ryohei Sasaki@engineer

@rsasaki0109

about 2 months ago

NAS3R
[CVPR 2026] From None to All: Self-Supervised 3D Reconstruction via Novel View Synthesis
NAS3R is a self-supervised feed-forward framework that jointly learns explicit 3D geometry and camera parameters with no ground-truth annotations and no pretrained priors. https://t.co/aZjlqKFOhQ

2

133

26

109

9K

phucnda retweeted

Wildminder

@wildmindai

about 2 months ago

RecGen turns sparse photos into complete, simulation-ready 3D scenes. You feed it one RGB-D snapshot of a cluttered environment and get 3D shapes, textures, and precise poses for every object.
- Reconstructs even heavily occluded or symmetric objects.
- Built on TRELLIS-image-large.
https://t.co/67UclGQ402

2

110

20

103

9K

phucnda retweeted

Chris Choy, Ph.D. @realChrisChoy

about 2 months ago

Watch your 3D segmentation model learn live! I just added a viser visualizer to the WarpConvNet ScanNet example: three side-by-side panels (input RGB / ground truth / prediction) refresh every few seconds during training. https://t.co/HtsmNKFqIY

1

88

8

68

5K

phucnda retweeted

Sadao Tokuyama

@tokufxug

about 2 months ago

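"MASt3R-SLAM" achieves real-time dense SLAM at 15 FPS with a monocular camera. It leverages MASt3R's 3D reconstruction priors to maintain geometric consistency even on unseen videos. Given calibration, it reaches SOTA results, and it also supports dynamic camera models, making it a practical foundation for monocular SLAM in robotics.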

6

324

31

258

18K

phucnda retweeted

Siyuan Huang

@siyuanhuang95

about 2 months ago

Scaling 3D scene data is a long-standing challenge in scene understanding, spatial reasoning, and robotics. Since scanning, reconstruction, and labeling are so labor-intensive, data scarcity has remained a major bottleneck. 🛑
To solve this, we propose SceneVerse++: Lifting Unlabeled Internet-level Data for 3D Scene Understanding (CVPR 2026). By reconstructing internet videos and annotating 3D scenes automatically, we’ve created a massive real-world dataset for end-to-end understanding. 🌐📐
SceneVerse++ makes it easy to scale "in-the-wild" 3D scenes toward more capable spatial reasoning systems. This significantly promotes progress in 3D VQA, visual navigation, and broader tasks in Embodied AI and Robotics. 🤖🦾
We are fully open-sourced! Check out the paper, code, and data here:
🌐 Project: https://t.co/d5qXI7G6WL
📄 Paper: https://t.co/46OIUZBSDO
📊 Dataset: https://t.co/2td6Yla3ON
💻 Code: https://t.co/u9QyQWHy0Z

1

163

26

148

13K

phucnda retweeted

DailyPapers

@HuggingPapers

about 2 months ago

Meta just released Sapiens2 on Hugging Face. High-resolution vision transformers pretrained on 1 billion human images, for human-centric perception: pose, segmentation, normals, and pointmaps.

2

249

31

204

26K

phucnda retweeted

Kangxue Yin @kangxue_yin

2 months ago

🚀We just released Asset Harvester, an image-to-3D model and end-to-end pipeline that extracts real object assets from autonomous driving videos! 🌐 Website: https://t.co/vXnFVW1ui8 💻 Code: https://t.co/3q3vcRvojy [1/5] #AssetHarvester #AVSimulation #WorldModel #AutonomousDriving

30

792

130

583

107K

phucnda retweeted

Ryohei Sasaki@engineer

@rsasaki0109

2 months ago

PyCuSFM: Cuda Accelerated Structure from Motion
This repository provides the official Python implementation of cuSFM, a novel CUDA-accelerated Structure-from-Motion framework for reconstructing 3D environmental models from images. Key features include:
- CUDA-accelerated feature extraction, matching, and graph optimization for superior speed and scalability
- Precise and robust camera pose estimation
- Accurate and consistent 3D environment reconstruction with COLMAP-compatible outputs
- Support for any number and type of camera inputs
- Reliable extrinsic calibration for multi-camera setups
- Localization mode for integrating new data into a pre-built map

2

182

22

141

9K

phucnda retweeted

Emmy Liu @_emliu

2 months ago

wrote a guide on getting compute grants as a student, something I wish I did more at the beginning of my PhD. It's honestly one of the highest ROI things you can do as a student (we've gotten 100k+ gpu hrs for roughly 2 weeks of work writing). https://t.co/U15nwau88a

16

1K

200

2K

238K

phucnda retweeted

Tencent Hy

@TencentHunyuan

2 months ago

We’re open-sourcing HY-World 2.0, a multimodal world model that generates, reconstructs, and simulates interactive *3D worlds* from text, images, and videos. Outputs can be integrated into game engines and embodied simulation pipelines.
Key highlights:
🔹 One-click world generation: Turn text or images into interactive 3D worlds automatically.
🔹 Pipeline-ready 3D outputs: Editable 3D worlds for Unity and Unreal Engine, with standard 3D exports including mesh, 3DGS, and point clouds.
🔹 Unified world model system: One model family for world generation and reconstruction across synthetic and real-world scenes.
🔹 Interactive character mode: Explore generated 3D worlds in real time with physics-aware movement and collision support.
✨ Apply for access: https://t.co/swscD5KGu2
🔗 GitHub: https://t.co/XpUKjBtK5n
🤗 Hugging Face: https://t.co/tv8hOPYABj
📄 Technical Report: https://t.co/s6WGMyw0L7

72

2K

384

2K

369K

phucnda retweeted

MrNeRF

@janusch_patas

2 months ago

Geometric Context Transformer for Streaming 3D Reconstruction
Contributions:
• We introduce LingBot-Map, a streaming 3D foundation model built around Geometric Context Attention (GCA), which maintains three complementary context types (anchor, pose-reference window, and trajectory memory) for efficient and consistent long-sequence streaming inference.
• We propose an efficient training recipe based on progressive training and context parallelism with a relative loss formulation for stable long-sequence optimization.
• We demonstrate that LingBot-Map achieves state-of-the-art performance on multiple benchmarks (Oxford Spires, Tanks and Temples, ETH3D, and 7-Scenes), significantly outperforming existing streaming approaches in reconstruction quality and inference speed.

6

443

57

313

25K

phucnda retweeted

Zhenjun Zhao @zhenjun_zhao

2 months ago

VGGT-SLAM++
Avilasha Mandal, Rajesh Kumar, @sudarshan_s_h, Chetan Arora
tl;dr: DEM->submap; DINOv2 embeddings->retrieval; covisibility graph synthesis
https://t.co/kg5MI4MaY1

0

55

11

32

3K

phucnda retweeted

Zhenjun Zhao @zhenjun_zhao

2 months ago

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction
Tao Xie, Peishan Yang, @krahets, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, @pengsida, @gingertata, @XiaoweiZhou5
tl;dr: neural global context with lightweight sub-networks and context aggregation
https://t.co/Dfp0dJ1m1d

0

52

10

33

3K

phucnda retweeted

Ai2 @allen_ai

3 months ago

Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵

9

282

62

164

86K

phucnda retweeted

Daniel DeTone @ddetone

2 months ago

Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: https://t.co/5IZ0tPlqvr Here we show Boxer in action on an egocentric sequence captured from smart glasses:

22

1K

166

950

80K
