Daniel DeTone @ddetone - Twitter Profile

Daniel DeTone @ddetone

16 days ago

@weikaih04 @allen_ai @uwcse @AIatMeta Very cool! Is this conditioned on the semi-dense depth point cloud or on RGB only?

1

0

72

Daniel DeTone @ddetone

about 1 month ago

The biggest impact of the Segment Anything line of work was not the actual image segmentation, but rather the flood of paper titles with the name “Any” in them. Cmon folks, let’s just call this generalization and move on!!

0

6

0

2

531

Daniel DeTone @ddetone

about 1 month ago

@MattNiessner I agree that metric 3D is critical! It’s a compressed, minimal representation. Playing devil’s advocate: humans also operate on projections of the 3D world, and we are able to operate pretty well

1

9

0

662

Daniel DeTone @ddetone

about 1 month ago

New blog post about Boxer is live on the Project Aria website

Project Aria @Meta

@meta_aria

about 1 month ago

How do you decompose a 2D image into accurate 3D object detections? You use 🥊 Boxer. A new model from Reality Labs Research enables robust 3D object detection by "lifting" 2D proposals from off-the-shelf detectors like OWL-ViT and SAM into metric 3D space. No more "flat" AI: this is about spatial intelligence for the next generation of wearables. Blog 🔗 https://t.co/WdOgFPzBBI Website with links to download: https://t.co/HUdive9EYx 👉 @ddetone

0

55

9

30

5K

0

14

0

5

2K

Daniel DeTone @ddetone

about 2 months ago

@neural_avb I was there in Barcelona! Epic

0

2

0

1K

Daniel DeTone @ddetone

about 2 months ago

@Capsbrr Ah sorry, I thought you meant running on Quest camera footage, not running the ML model on Quest hardware. I don’t think this model can run in realtime on Quest, though it could probably be distilled significantly with further effort and might then work

1

2

0

22

Daniel DeTone @ddetone

about 2 months ago

Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: https://t.co/5IZ0tPlqvr Here we show Boxer in action on an egocentric sequence captured from smart glasses:

22

1K

167

949

79K

Daniel DeTone @ddetone

about 2 months ago

@Capsbrr yes

1

0

58

Daniel DeTone @ddetone

about 2 months ago

@weikaih04 that would be great! we didn't train on much outdoor data, so I would expect a big boost outdoors from training on the WildDet3D dataset

0

1

0

171

Daniel DeTone @ddetone

about 2 months ago

Cool showcase from @_satyam_ai running Boxer on RGB video using COLMAP for poses + pointcloud and GeoCalib for gravity estimation

Satyam Kumar

@_satyam_ai

about 2 months ago

I implemented it and the ~8 degree gravity correction from GeoCalib made a real difference. Look at the monitor - on the left (pose heuristic) the box is tilted and doesn't match the screen edges, on the right (GeoCalib) it wraps the monitor much more tightly. The shelf boxes at the top are also cleaner, less overshoot. Yeah, the improvement is clear.

1

6

0

2K

0

19

1

8

2K

Daniel DeTone @ddetone

about 2 months ago

@_satyam_ai @pesarlin GeoCalib looking solid 💪

1

0

51

Daniel DeTone @ddetone

about 2 months ago

@_satyam_ai Amazing!

1

0

110

Daniel DeTone @ddetone

about 2 months ago

@_satyam_ai the gravity estimate looks a little bit off. another idea could be to run this per frame and take the global 3D average: https://t.co/KwRqnfa1F8

2

3

0

138
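The "run per frame and take the global 3D average" idea in the reply above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes a per-frame gravity direction in the camera frame (for example from a single-image estimator such as GeoCalib) and a camera-to-world rotation per frame (for example from COLMAP poses). The function names `rotate`, `normalize`, and `average_gravity_world` are hypothetical.

```python
import math

def rotate(R, v):
    """Apply a 3x3 rotation matrix (row-major nested lists) to a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / n for c in v]

def average_gravity_world(rotations_world_from_cam, gravities_cam):
    """Rotate each per-frame gravity estimate into the shared world frame,
    sum the unit vectors, and renormalize to get one global direction."""
    total = [0.0, 0.0, 0.0]
    for R, g in zip(rotations_world_from_cam, gravities_cam):
        g_world = rotate(R, normalize(g))
        total = [t + c for t, c in zip(total, g_world)]
    return normalize(total)

# Toy input (assumed values): two frames whose estimates are tilted in
# opposite directions once expressed in the world frame.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half_turn_about_z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
g_avg = average_gravity_world(
    [identity, half_turn_about_z],
    [[0.1, 0.0, -1.0], [0.1, 0.0, -1.0]],
)
```

Averaging unit vectors and renormalizing is a simple choice that works when the per-frame estimates are clustered; a robust alternative (for example a median or outlier rejection) may be preferable if some frames are badly off.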

Daniel DeTone @ddetone

about 2 months ago

@yesitsarmin yes, the main limitation is the 2D detector here, but there are tons of better models (SAM3, VLMs) if you have the compute. for very cluttered scenes it doesn't work as well

0

71

Daniel DeTone @ddetone

about 2 months ago

@ElioenaiSiqCst Yes, I didn't show any examples of that but we trained on a massive internal-only Quest3 dataset

0

120

Daniel DeTone @ddetone

about 2 months ago

@BlueAquilae great question! I would not expect it to work well here, we would need to re-train it with a full 9 DoF representation. but feel free to try it out anyway, I'd be curious

0

1

0

99

Daniel DeTone @ddetone

about 2 months ago

@CleverBetTips The National?

0

1

0

96

Daniel DeTone @ddetone

about 2 months ago

@haodongli00 One limitation I found using both of those models is the runtime. For detecting 1000+ text prompts with SAM3 it takes 20+ sec per image. SAM3D also takes ~15 sec per object, so running on large datasets can be expensive. OWLv2 runs at ~30ms and Boxer takes ~20ms

0

1

0

1

128
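To make the runtime comparison in the reply above concrete, here is a rough back-of-the-envelope sketch. The per-image and per-object timings are the approximate figures quoted in the tweet; the dataset size, the objects-per-image count, and the assumption that the stages simply run one after another are hypothetical choices for illustration only.

```python
# Approximate timings quoted in the tweet, in seconds.
SAM3_PER_IMAGE_S = 20.0     # "20+ sec per image" with 1000+ text prompts
SAM3D_PER_OBJECT_S = 15.0   # "~15 sec per object"
OWLV2_PER_IMAGE_S = 0.030   # "~30ms"
BOXER_PER_IMAGE_S = 0.020   # "~20ms"

def sam_pipeline_seconds(num_images, objects_per_image):
    """Total time if SAM3 runs once per image and SAM3D once per object."""
    per_image = SAM3_PER_IMAGE_S + objects_per_image * SAM3D_PER_OBJECT_S
    return num_images * per_image

def boxer_pipeline_seconds(num_images):
    """Total time if OWLv2 and Boxer each run once per image."""
    return num_images * (OWLV2_PER_IMAGE_S + BOXER_PER_IMAGE_S)

# Hypothetical workload: 10,000 images with 10 objects each.
num_images = 10_000
objects_per_image = 10
slow_s = sam_pipeline_seconds(num_images, objects_per_image)
fast_s = boxer_pipeline_seconds(num_images)
print(f"SAM3 + SAM3D: {slow_s / 3600:.1f} h, OWLv2 + Boxer: {fast_s / 60:.1f} min")
```

Under these assumed numbers the gap is several orders of magnitude, which is the point of the tweet: on large datasets the per-object cost dominates.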

Daniel DeTone @ddetone

about 2 months ago

@nickkarpov Feel free to file a GitHub issue if you have any problems! Will do my best to answer them quickly

0

3

0

639

Daniel DeTone @ddetone

about 2 months ago

For more details, check out the arxiv paper here: https://t.co/3CTK5TVDYc

0

7

2

4

1K

Daniel DeTone @ddetone

about 2 months ago

BoxerNet runs FAST 🔥🔥, taking roughly 20ms on a 4090 with bfloat16 for ALL prompts in an image (e.g. 30 boxes in parallel)

2

13

0

1K
