NVIDIA just made AI detect objects 10x faster by deleting one step.
It's called LocateAnything, and it removes the biggest bottleneck no one else was fixing in vision-language models.
Normally a model builds each bounding box one coordinate token at a time. 100 objects means thousands of tokens before an answer. NVIDIA scrapped that: their Parallel Box Decoding predicts the whole box in a single forward pass, as one atomic unit.
→ 12.7 boxes/sec on one H100
→ 10x faster than Qwen3-VL
→ +3.8% F1 on LVIS, accuracy up, not down
→ 3B params, runs on one consumer GPU
Treating the box as one unit keeps its coordinates tied together, which is why accuracy climbed instead of falling.
One model handles detection, GUI grounding, OCR, and document understanding, ready for computer-use agents, robotics, and document pipelines.
100% open source: weights, code, demo, and paper are all live.
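To put rough numbers on the token bottleneck described above, here is a back-of-the-envelope sketch. It is not NVIDIA's code: the function names and the tokens-per-coordinate and per-box overhead values are assumptions chosen for illustration, and real tokenizers will differ.

```python
# Rough decode-step arithmetic, NOT the LocateAnything implementation.
# Assumed for illustration: 4 coordinates per box, a few tokens per
# coordinate, and a small per-box overhead (label, separators).

def sequential_decode_steps(num_boxes: int,
                            tokens_per_coord: int = 3,
                            overhead_tokens_per_box: int = 2) -> int:
    """Steps when every coordinate token is generated one at a time."""
    coords_per_box = 4  # x1, y1, x2, y2
    return num_boxes * (coords_per_box * tokens_per_coord + overhead_tokens_per_box)


def parallel_box_decode_steps(num_boxes: int,
                              overhead_tokens_per_box: int = 2) -> int:
    """Steps when each whole box is emitted in one step as a single unit."""
    return num_boxes * (1 + overhead_tokens_per_box)


seq = sequential_decode_steps(100)
par = parallel_box_decode_steps(100)
print(f"sequential: {seq} steps, parallel boxes: {par} steps")
```

Under these assumed values, 100 objects cost 1,400 decode steps token by token versus 300 with one step per box. The actual speedup depends on the model and tokenizer.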
A Japanese dev open-sourced a drop-in replacement for NumPy that runs on your GPU.
It's called CuPy. Change one line, “import numpy as np” → “import cupy as cp”, and the same code runs up to 100x faster on CUDA.
→ Works with existing NumPy/SciPy code
→ No rewrite. No new syntax.
→ Also supports AMD ROCm
100% Open Source.
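A minimal sketch of the import swap in practice. The helper names (`normalize_rows`, `to_host`) are made up for this example, and the fallback to NumPy is an assumption added so the snippet also runs on a machine without a GPU. CuPy covers a large subset of NumPy rather than all of it, so check that the functions you rely on are supported.

```python
import numpy as np

# Use CuPy when it is installed and a GPU is usable; otherwise fall back
# to NumPy. The fallback is an addition for this sketch.
try:
    import cupy as cp
    cp.zeros(1)  # forces a device allocation; raises if no usable GPU
    xp = cp
except Exception:
    xp = np


def normalize_rows(a):
    """Scale each row to unit L2 norm; same code for NumPy and CuPy arrays."""
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms


def to_host(a):
    """Return a NumPy array, copying from the GPU if needed."""
    return a if xp is np else xp.asnumpy(a)


data = xp.asarray([[3.0, 4.0], [6.0, 8.0]])
result = to_host(normalize_rows(data))
print(result)
```

Copies between host and GPU are not free, so the gains tend to show up on large arrays that stay on the device across many operations.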
Don't use Ollama if you want to run local AI with good performance. It doesn't squeeze the most out of your GPU.
Use vLLM instead:
✓ Better efficiency
✓ Higher throughput when serving models
✓ In my tests, up to 2x faster
→ https://t.co/eYXA5IBlFf