MarcoMando @Appanacca - Twitter Profile

@KadirYilmaz_CV

about 2 months ago

Could this be the ViT moment for 3D scene understanding? 🚀

We revisit the good old Transformer architecture and apply it to 3D scene understanding with minimal modifications. #Volt ⚡️

Project page: https://t.co/1HYyTrtdak
Arxiv link: https://t.co/imcuvM6j0X

(1/5) https://t.co/B04D3IHWKT

3 · 456 · 68 · 453 · 96K

Appanacca retweeted

Fangjinhua Wang @FangjinhuaWang

3 months ago

Excited to share our new survey on 3D Reconstruction, accepted to IEEE T-PAMI! We cover everything from depth estimation and NeRF to 3DGS and 3D foundation models. If you're interested in 3D reconstruction, you won't want to miss this. https://t.co/g5UEitbb1N

#3D #ComputerVision https://t.co/qql6CFeqB7

1 · 186 · 35 · 154 · 10K

Appanacca retweeted

Nicolas Sereyjol-Garros @sg_nicolas

5 months ago

📢 New preprint! We introduce R3DPA - a LiDAR scene generator that: • Transfers RGB-pretrained generative priors to 3D • Aligns with self-supervised LiDAR features • Enables object inpainting & scene mixing at inference • Sets SOTA on KITTI-360

2 · 178 · 36 · 91 · 12K

Appanacca retweeted

Sebastian Koch @sebastiank0ch

6 months ago

📣 UNITE: Unified Semantic Transformer for 3D Scene Understanding 📣 Given only a couple of images of a scene, UNITE jointly recovers: ✅ 3D Scene Geometry ✅ Semantic Segmentation ✅ Instance Segmentation ✅ Object Articulations 🚀 Project page: https://t.co/P7oln9sWjm

5 · 226 · 32 · 161 · 21K


Appanacca retweeted

Ryohei Sasaki@engineer

@rsasaki0109

6 months ago

[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
https://t.co/heVdwCAs0s
Existing LiDAR generative models are limited to producing unlabeled LiDAR scenes, lacking any semantic annotations. Performing post-hoc labeling on these generated scenes requires additional pretrained segmentation models, which introduces extra computational overhead. Moreover, such after-the-fact annotation yields suboptimal segmentation quality.

To address this issue, we make the following contributions:

・We propose a novel state-of-the-art semantic-aware range-view LiDAR diffusion model, Spiral, which jointly produces depth and reflectance images along with semantic labels.
・We introduce unified evaluation metrics that comprehensively evaluate the geometric, physical, and semantic quality of generated labeled LiDAR scenes.
・We demonstrate the effectiveness of the generated LiDAR scenes for training segmentation models, highlighting Spiral's potential for generative data augmentation.

2 · 87 · 11 · 55 · 6K

Appanacca retweeted

Dmytro Mishkin 🇺🇦 @ducha_aiki

7 months ago

RoMa v2: Harder Better Faster Denser Feature Matching
@Parskatt et 11 al.

tl;dr: in title.
Predict covariance per-pixel, more datasets, use DINOv3, adjust architecture.

https://t.co/jLia5dKmFv https://t.co/Ltda4jXZEa

4 · 131 · 19 · 75 · 7K

Appanacca retweeted

Dmytro Mishkin 🇺🇦 @ducha_aiki

7 months ago

New thread - adding DepthAnything 3 - both best and Apache 2 models.
1) Graffiti
The best one is almost flat, smaller Apache2 model is very non-flat.
2) IMC-Church. Apache 2 pretty bad, big - not terrible, but not great either
1/ https://t.co/bGlaY249tn

4 · 104 · 10 · 81 · 16K

Appanacca retweeted

ℏεsam

@Hesamation

8 months ago

fantastic simple visualization of the self attention formula. this was one of the hardest things for me to deeply understand about LLMs. the formula seems easy. you can even memorize it fast. but to really get an intuition for what Q, K, and V represent and how they interact, that's hard.
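
For readers who want the formula in front of them: a minimal sketch, assuming the tweet means standard scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The function names and toy vectors are illustrative only and do not come from the tweet or the visualization it praises.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q and K are lists of vectors of length d_k; V is a list of vectors
    (one per key). Returns (outputs, attention_weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Each query is compared with every key (scaled dot product).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # The output for this query is a weighted average of the values.
        out = [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        outputs.append(out)
    return outputs, weights

# Toy example with made-up numbers: 3 tokens, d_k = 2, d_v = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, attn = self_attention(Q, K, V)
```

One way to read it: each row of `attn` says how much one token looks at every other token, and each row of `out` is the corresponding blend of the value vectors.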

25 · 3K · 542 · 3K · 175K

Appanacca retweeted

Chris Offner

@chrisoffner3d

8 months ago

MapAnything's evil sibling also supports flexible inputs but triples down on redundant outputs, estimating point maps, depths, and 3D Gaussians.
https://t.co/xGiuurUdXP https://t.co/chn1W9mXCM

2 · 162 · 18 · 114 · 14K

Appanacca retweeted

Chris Offner

@chrisoffner3d

8 months ago

Is the terror reign of redundant scene representations ending? Where VGGT, CUT3R, and other recent models relied on godless redundant outputs (depth+points+pose) without guaranteeing internal prediction consistency, MapAnything and DepthAnything 3 are now heroically pushing back. https://t.co/saPjkUgGp6

4 · 70 · 11 · 54 · 21K

Appanacca retweeted

Taha Yassine 🍉

@taha_yssne

8 months ago

Gentlemen I need your full attention.

Python is introducing lazy imports.

I repeat.

Python is introducing lazy imports.

inb4 the flood of `treewide: adopt lazy imports` +123,244 PRs https://t.co/1NzCAQVfmV
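
The tweet does not say which mechanism it means, presumably a language-level lazy import feature planned for a future Python release. As a rough illustration of the idea only, here is a sketch that uses the existing `importlib.util.LazyLoader` from the standard library instead of any new syntax. The helper name `lazy_import` and the choice of `colorsys` as the example module are assumptions made for this sketch.

```python
import importlib.util
import sys

def lazy_import(name):
    """Return module `name`, deferring its execution until first attribute access.

    Hypothetical helper based on the LazyLoader recipe in the importlib docs.
    """
    if name in sys.modules:
        # Already imported (eagerly or lazily): reuse it.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # This only prepares the lazy module; the real load is triggered
    # the first time an attribute of the module is accessed.
    loader.exec_module(module)
    return module

colorsys = lazy_import("colorsys")            # nothing executed yet (if not already imported)
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)      # first attribute access triggers the load
```

The appeal is startup time: a module that is never touched on a given code path is never executed.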

27 · 1K · 51 · 267 · 100K

Appanacca retweeted

Brayden Zhang @brayden__zhang

9 months ago

I implemented DeepMind's AlphaEarth foundation model from scratch and open-sourced it. Take a look at the codebase here: https://t.co/CRYMkCduqF

10 · 630 · 79 · 392 · 61K

Appanacca retweeted

Towaki Takikawa / 瀧川永遠希

@yongyuanxi

9 months ago

Detecting 3D lines from gaussian splats

8 · 438 · 48 · 239 · 25K

Appanacca retweeted

Antoine Guédon @antoine_guedon

9 months ago

1/n🚀Gaussians > Differentiable function > Mesh? Check out our new work: MILo: Mesh-In-the-Loop Gaussian Splatting! 🎉Accepted to SIGGRAPH Asia 2025 (TOG) MILo is a novel differentiable framework that extracts meshes directly from Gaussian parameters during training. 🧵👇

7 · 338 · 58 · 182 · 25K

Appanacca retweeted

Mark Litwintschik @marklit82

10 months ago

GlobalBuildingAtlas' 2.75B Building Footprints are now on S3. 1.1 TB of uncompressed GeoJSON down to 210 GB of Parquet.

Download any city's buildings in seconds: https://t.co/TMxxEnDSPI https://t.co/QporCASter

10 · 311 · 57 · 303 · 27K

Appanacca retweeted

Spatial AI Network @spatialainet

10 months ago

We have uploaded the slides from the study session: a talk by Kai Katsumata (勝又海) of CyberAgent on "Design of Derived Primitives in 3D Gaussian Splatting". https://t.co/Oe99bjbRGs

0 · 87 · 17 · 42 · 6K

Appanacca retweeted

Federico Baldassarre @BaldassarreFe

10 months ago

Say hello to DINOv3 🦖🦖🦖

A major release that raises the bar of self-supervised vision foundation models.
With stunning high-resolution dense features, it’s a game-changer for vision tasks!

We scaled model size and training data, but here's what makes it special 👇 https://t.co/VBkRuAIOCi

40 · 2K · 261 · 896 · 224K

Appanacca retweeted

Chris Offner

@chrisoffner3d

10 months ago

SigLIP (VLMs) and DINO are two competing paradigms for image encoders. My intuition is that joint vision-language modeling works great for semantic problems but may be too coarse for geometry problems like SfM or SLAM. Most animals navigate 3D space perfectly without language.

19 · 571 · 44 · 338 · 73K

Appanacca retweeted

Ryohei Sasaki@engineer

@rsasaki0109

11 months ago

VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences https://t.co/h4JukIjqRD

1 · 173 · 20 · 95 · 10K

Appanacca retweeted

Zhenjun Zhao @zhenjun_zhao

12 months ago

MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction

@antoine_guedon, Diego Gomez, Nissim Maruani, Bingchen Gong, @GDrettakis, Maks Ovsjanikov

tl;dr: generate a mesh at every training iteration from a set of points entangled with the Gaussians, Gaussian Pivots

https://t.co/9b9usojzZp

1 · 95 · 13 · 71 · 5K
