Top Tweets for #vLM
Day 2 of VLX, VLX-Seek: improving VLM fine-grained perception via region reference instead of coordinate generation!
VLMs are good at understanding what is in an image, but still struggle to pinpoint where it is.
Coordinate generation is fragile: long numeric outputs, formatting errors, missed objects, and hallucinated boxes.
VLX-Seek takes a different path: region reference instead of coordinate generation.
It retrieves candidate regions, turns them into language-addressable region tokens, and lets the model select <region_i> instead of generating [x1, y1, x2, y2].
From detection and referring expressions to counting, OCR, and embodied interaction, VLX-Seek pushes VLMs from “seeing the image” toward “grounding objects in space.”
GitHub: https://t.co/JUl9jPttwg
Hugging Face: https://t.co/rrig30Dn8e
#VLX #VLXModel #VLXSeries #StreamingMultimodal #OnDeviceAI #PhysicalAI #EmbodiedAI #EdgeAI #VisionLanguageModel #AI
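A minimal sketch of the region-reference idea described in the tweet above, not the VLX-Seek implementation: the model's text output contains `<region_i>` tokens, and a small resolver maps each token back to a precomputed candidate box, so the model never has to emit raw coordinates. The `resolve_regions` helper, the candidate boxes, and the sample model output are all hypothetical and exist only for illustration.

```python
import re

# Matches region tokens such as <region_0>, <region_12>.
REGION_TOKEN = re.compile(r"<region_(\d+)>")


def resolve_regions(model_output, candidates):
    """Map <region_i> tokens in the model output to candidate boxes.

    `candidates` is a list of (x1, y1, x2, y2) boxes, assumed to come
    from some upstream region proposer (hypothetical here).
    """
    boxes = []
    for match in REGION_TOKEN.finditer(model_output):
        idx = int(match.group(1))
        if idx < len(candidates):  # skip references to regions that do not exist
            boxes.append(candidates[idx])
    return boxes


# Hypothetical candidate regions and model output.
candidates = [(10, 20, 110, 220), (150, 40, 300, 260), (5, 5, 50, 50)]
selected = resolve_regions("The red mug is <region_1>.", candidates)
print(selected)  # [(150, 40, 300, 260)]
```

Under this assumption, a malformed answer can only pick a wrong or missing index; it cannot produce a badly formatted or out-of-image box, which is the fragility the tweet attributes to coordinate generation.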
We’re excited to release VLX, starting with VLX-Flow: a streaming vision-language model designed for real-time video understanding!
Long context isn't the cure for live video streams. Reprocessing history blows up VRAM, while sparse sampling drops crucial causal details.
The fix? Incremental memory updates.
This is VLX-Flow: video stream -> memory state -> interaction.
It continuously tracks "what just happened," so Q&A and alerts can trigger anytime.
GitHub: https://t.co/DD7W3ZLXfD
Hugging Face: https://t.co/N6mz654JEQ
#VLX #VLXModel #VLXSeries #StreamingMultimodal #OnDeviceAI #PhysicalAI #EmbodiedAI #EdgeAI #VisionLanguageModel #AI
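To illustrate the "video stream -> memory state -> interaction" loop from the tweet above, here is a toy sketch of an incremental memory update. It is not VLX-Flow's method: it swaps in a plain exponential moving average over per-frame feature vectors, chosen only because it keeps a fixed-size state no matter how long the stream runs. The `StreamingMemory` class, its `decay` parameter, and the feature values are assumptions made for this example.

```python
class StreamingMemory:
    """Fixed-size memory updated once per frame (toy stand-in, EMA-based)."""

    def __init__(self, dim, decay=0.9):
        self.state = [0.0] * dim  # memory size never grows with stream length
        self.decay = decay
        self.frames_seen = 0

    def update(self, frame_features):
        # Blend the new frame into the existing state instead of storing history.
        self.state = [
            self.decay * s + (1.0 - self.decay) * f
            for s, f in zip(self.state, frame_features)
        ]
        self.frames_seen += 1

    def query(self):
        # A question or alert can read the current state at any time.
        return list(self.state)


memory = StreamingMemory(dim=2, decay=0.5)
for features in ([1.0, 0.0], [1.0, 2.0]):  # hypothetical per-frame features
    memory.update(features)
print(memory.query())  # [0.75, 1.0]
```

The point of the design is that each update costs the same and past frames are never reprocessed, in contrast to re-encoding a growing context window on every query.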
https://t.co/XYiRLbPcF8
#MaterialHandling #Webinar #Storage #StorageSystems #Automation #VLM #VerticalLiftModule #Presentation
PixelRAG: a vision-based RAG framework that retrieves documents as screenshots rather than text
(by 9bow)
https://t.co/PysmJiHFy2
#rag #vlm #multimodalrag #documentsearch #visualretrieval #pixelrag
We have heard the beating of the AI drum for years.
But the pitch has been the same since 2016, and most of us have learned to tune it out.
It is worth accepting that #AI has made a significant change. #VLM #radiology https://t.co/aJqONE1TJe
Domain Listed For Sale
https://t.co/bBjhc7Hja1
A powerful AI name for vision-language models, action recognition, video intelligence, robotics, autonomous systems, and multimodal AI platforms.
#ActionVLM #VLM #AI #Vision #Language #Multimodal #Video #Robotics #Automation #Agents #Perception #Recognition #Inference #Model #Data #SaaS #Autonomous #Analytics #Startup #DomainForSale

IROS'26 1/3 [ABot-Explorer]
✅ Online Unified Exploration & Memory
✅ VLM-based Semantic Affordances
✅ SOTA Efficiency & Coverage
Mimicking human logic for Embodied AI. 🚀
📝https://t.co/lQeDuDSixq
💻https://t.co/lHQHNNDsm3
#IROS2026 #EmbodiedAI #VLM
![Alibaba_AMAP's tweet photo.](https://pbs.twimg.com/media/HLdrQMybcAAAoVk.jpg)
Testing LightOn-OCR-2-1B, a VLM-powered OCR model for intelligent document understanding.
Instead of extracting plain text, it generates structured information from documents like passports, IDs, invoices, and forms. 🤖📄
#OCR #DocumentAI #VLM

NVIDIA's new SpatialClaw lets AI agents write code, not just call tools, to reason about 3D space. Result: +11.2 points across 20 benchmarks, no retraining needed.
https://t.co/h9T946wEFi #VLM #SpatialReasoning #AgenticAI #NVIDIA #OpenSource #MachineLearning #ComputerVision
IPT is accepted to @eccvconf 2026!
Many projects use text CoT for spatial reasoning, but humans imagine the scene.
IPT teaches VLMs to reason through imaginative thoughts.
#ECCV2026 #ComputerVision #VLM #SpatialReasoning #Robotics #WorldModels
What if VLMs could imagine before answering?
IPT supervises visual intermediate states for spatial reasoning:
1. Path tracing → side view
2. Perspective taking → new viewpoint
3. Multiview counting → top-down map
Paper: https://t.co/57KvrXgPFv

Changes I noticed at CVPR2026, the international conference on computer vision
https://t.co/qB1klfITtY
#Acroquest #テックブログ #CVPR2026 #コンピュータービジョン #ComputerVision #AI #機械学習 #FoundationModel #VLM
Introducing Orion 2: the most capable visual agent, now with code-mode ✨
Rather than calling tools one by one, Orion 2 generates a visual AI program and executes it end-to-end, meaning fewer round-trips and lower latency.
When orchestration is code, every workflow is composable, inspectable, and deterministic.
![OmAI_lab's tweet photo: Day 2 of VLX, VLX-Seek.](https://pbs.twimg.com/media/HL3ZaGCbgAEtu-5.jpg)