NVIDIA just released LocateAnything
It decodes bounding boxes as atomic units in a single step, replacing slow token-by-token coordinate generation. Parallel Box Decoding achieves 2.5× faster inference while improving localization accuracy across detection, OCR, and GUI tasks.