#vLM - Twitter Hashtag

about 6 hours ago

(2/3 VLX series) VLX-Seek: a light 3B VLM for visual grounding that beats Gemini on benchmarks and in the wild. P.S. We have deployed a beta version of VLX-Seek to more than 100 customers, so it's mission-proven. API site coming out soon! #VLX #VLM #AI

OmAI Lab

@OmAI_lab

about 15 hours ago

Day 2 of VLX, VLX-Seek: improving VLM fine-grained perception via region reference instead of coordinate generation! VLMs are good at understanding what is in an image, but still struggle to pinpoint where it is. Coordinate generation is fragile: long numeric outputs, formatting errors, missed objects, and hallucinated boxes. VLX-Seek takes a different path: region reference instead of coordinate generation. It retrieves candidate regions, turns them into language-addressable region tokens, and lets the model select <region_i> instead of generating [x1, y1, x2, y2]. From detection and referring expressions to counting, OCR, and embodied interaction, VLX-Seek pushes VLMs from “seeing the image” toward “grounding objects in space.” GitHub: https://t.co/JUl9jPttwg Hugging Face: https://t.co/rrig30Dn8e #VLX #VLXModel #VLXSeries #StreamingMultimodal #OnDeviceAI #PhysicalAI #EmbodiedAI #EdgeAI #VisionLanguageModel #AI
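The region-reference idea in the post above can be illustrated with a small sketch. This is not VLX-Seek's code: the function names, the prompt wording, and the stubbed model answer are all assumptions made for illustration, and a real system would attach visual features to each region token rather than just listing token names. The point it shows is that the model only has to emit `<region_i>` tokens, so malformed or hallucinated coordinates cannot occur, and an out-of-range index is simply dropped.

```python
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) from a candidate-region retriever


def build_region_prompt(question: str, boxes: List[Box]) -> str:
    """List candidates as <region_i> tokens; raw coordinates never enter the prompt.

    Hypothetical prompt format, not taken from the VLX-Seek release.
    """
    tokens = " ".join(f"<region_{i}>" for i in range(len(boxes)))
    return f"Candidates: {tokens}\nQuestion: {question}\nAnswer with region tokens only."


def parse_region_answer(answer: str, boxes: List[Box]) -> List[Box]:
    """Map selected <region_i> tokens back to boxes, skipping invalid indices."""
    picked: List[Box] = []
    for match in re.finditer(r"<region_(\d+)>", answer):
        index = int(match.group(1))
        if index < len(boxes) and boxes[index] not in picked:
            picked.append(boxes[index])
    return picked


# Made-up candidate boxes and a stubbed model answer (no real model is called).
candidates = [(10, 20, 110, 220), (300, 40, 380, 160), (5, 5, 50, 50)]
prompt = build_region_prompt("Which regions contain a mug?", candidates)
fake_answer = "<region_1> <region_7>"  # <region_7> does not exist and is ignored
selected = parse_region_answer(fake_answer, candidates)
```

Because the answer space is a closed set of tokens, the worst case is a wrong selection rather than an unparseable box.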

Tiancheng Zhao (Tony)

@tianchezhao

1 day ago

(1/3 VLX series) Excited to share our latest streaming VLM functions for real-time video understanding! API site coming out in a few days! #VLM #AI

OmAI Lab

@OmAI_lab

1 day ago

We’re excited to release VLX, starting with VLX-Flow: a streaming vision-language model designed for real-time video understanding! Long context isn't the cure for live video streams. Reprocessing history blows up VRAM, while sparse sampling drops crucial causal details. The fix? Incremental memory updates. This is VLX-Flow: video stream -> memory state -> interaction. It continuously tracks "what just happened," so Q&A and alerts can trigger anytime. GitHub: https://t.co/DD7W3ZLXfD Hugging Face: https://t.co/N6mz654JEQ #VLX #VLXModel #VLXSeries #StreamingMultimodal #OnDeviceAI #PhysicalAI #EmbodiedAI #EdgeAI #VisionLanguageModel #AI
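The "video stream -> memory state -> interaction" loop described above can be sketched in a few lines. This is a toy stand-in, not VLX-Flow's mechanism: the `StreamMemory` class, the exponential-moving-average update, and the per-frame captions are all assumptions chosen to show the general pattern of a fixed-size state that is updated once per frame instead of reprocessing the whole history.

```python
from collections import deque
from typing import Deque, List


class StreamMemory:
    """Fixed-size memory: its footprint does not grow with stream length."""

    def __init__(self, dim: int, decay: float = 0.9, max_events: int = 4):
        self.state: List[float] = [0.0] * dim        # running summary of the stream
        self.decay = decay                            # how quickly old frames fade
        self.events: Deque[str] = deque(maxlen=max_events)  # "what just happened"
        self.frames_seen = 0

    def update(self, feature: List[float], caption: str) -> None:
        """Fold one new frame into the state; earlier frames are never revisited."""
        self.state = [
            self.decay * old + (1.0 - self.decay) * new
            for old, new in zip(self.state, feature)
        ]
        self.events.append(caption)
        self.frames_seen += 1

    def recent(self) -> List[str]:
        """Answerable at any time, between any two frames."""
        return list(self.events)


# Made-up frame features and captions standing in for a vision encoder's output.
mem = StreamMemory(dim=2, decay=0.5, max_events=2)
stream = [
    ([1.0, 0.0], "door opens"),
    ([1.0, 0.0], "person enters"),
    ([0.0, 1.0], "person sits"),
]
for feature, caption in stream:
    mem.update(feature, caption)
```

Each update costs the same regardless of how long the stream has been running, which is the property the post contrasts with long-context reprocessing.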

REB Storage Systems @REB_Storage

3 days ago

https://t.co/XYiRLbPcF8 #MaterialHandling #Webinar #Storage #StorageSystems #Automation #VLM #VerticalLiftModule #Presentation

Jisoo Kim @jisooslog

3 days ago

#robotics #DexterousManipulation #VLM #3DGrounding #ZeroshotManipulation

파이토치 한국 사용자 모임 @PyTorchKR

5 days ago

PixelRAG: a vision-based RAG framework that searches documents as screenshots rather than as text (by 9bow) https://t.co/PysmJiHFy2 #rag #vlm #multimodalrag #documentsearch #visualretrieval #pixelrag

Ty Vachon, M.D. @helpfulrads

5 days ago

We have heard the beating of the AI drum for years. But the pitch has been the same since 2016, and most of us have learned to tune it out. It is worth it to accept that #AI has made a significant change. #VLM #radiology https://t.co/aJqONE1TJe

Jacob Asir @j_viston

5 days ago

Here’s how it works: 1) Set Preferences: select your allergies and dietary needs. 2) Snap a Photo: scan the ingredients list on any package. 3) Get Clear Results: receive a bilingual explanation with highlighted evidence. #JapanAI #App #VLM #Gemini活用

𝐀𝐥𝐡𝐚𝐦𝐝𝐮𝐥𝐢𝐥𝐥𝐚𝐡

@DomainFQ

5 days ago

Domain Listed For Sale https://t.co/bBjhc7Hja1 A powerful AI name for vision-language models, action recognition, video intelligence, robotics, autonomous systems, and multimodal AI platforms. #ActionVLM #VLM #AI #Vision #Language #Multimodal #Video #Robotics #Automation #Agents #Perception #Recognition #Inference #Model #Data #SaaS #Autonomous #Analytics #Startup #DomainForSale

AmapAI @Alibaba_AMAP

6 days ago

IROS'26 1/3 [ABot-Explorer] ✅ Online Unified Exploration & Memory ✅ VLM-based Semantic Affordances ✅ SOTA Efficiency & Coverage Mimicking human logic for Embodied AI. 🚀 📝https://t.co/lQeDuDSixq 💻https://t.co/lHQHNNDsm3 #IROS2026 #EmbodiedAI #VLM

JoelNadarAI @joelnadarai

6 days ago

Testing LightOn-OCR-2-1B, a VLM-powered OCR model for intelligent document understanding. Instead of extracting plain text, it generates structured information from documents like passports, IDs, invoices, and forms. 🤖📄 #OCR #DocumentAI #VLM

C Thomas (Tom) Smith @ctsmithiii

6 days ago

NVIDIA's new SpatialClaw lets AI agents write code, not just call tools, to reason about 3D space. Result: +11.2 points across 20 benchmarks, no retraining needed. https://t.co/h9T946wEFi #VLM #SpatialReasoning #AgenticAI #NVIDIA #OpenSource #MachineLearning #ComputerVision

Owais @muffBozo

7 days ago

Releasing gUrrT v2!! a local, open source Conversational Video Intelligence for Q&A over all of your video lectures. No GPU required. No subscriptions. Your video never leaves your machine. #OpenSource #LocalAI #VLM #LLM #VideoAI #Python https://t.co/dvuNOwXqyL

SooSoo @soosoomoo

7 days ago

Tried out a VLM. Let's do it again tomorrow. #VLM #VLA

Weikai Huang

@weikaih04

10 days ago

IPT is accepted to @eccvconf 2026! Many projects use text CoT for spatial reasoning, but humans imagine the scene. IPT teaches VLMs to reason through imaginative thoughts. #ECCV2026 #ComputerVision #VLM #SpatialReasoning #Robotics #WorldModels

Weikai Huang

@weikaih04

16 days ago

What if VLMs could imagine before answering?

IPT supervises visual intermediate states for spatial reasoning:

1. Path tracing → side view
2. Perspective taking → new viewpoint
3. Multiview counting → top-down map

Paper: https://t.co/57KvrXgPFv

アクロクエスト技術ブログ（Taste of Tech Topics） @Acroquest_blog

11 days ago

Changes I noticed at CVPR2026, the international computer vision conference https://t.co/qB1klfITtY #Acroquest #テックブログ #CVPR2026 #コンピュータービジョン #ComputerVision #AI #機械学習 #FoundationModel #VLM

週間ゆめの｜ゆめの結人 @WWMTTB

13 days ago

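Once they come to support the real world, we may for the first time be able to focus simply on "feeling with the heart," without having to explain the world. See the quoted post for details. #週刊ゆめの #VLM #概念芸術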

週間ゆめの｜ゆめの結人 @WWMTTB

15 days ago

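The Doll That Ate the Apple: The Tin Soldiers and the Land of Dreams, 4Days / Day1. When machines perceive the world through their own "eyes," we are freed from the role of translating reality. The tin soldiers quietly awaken. Read it on note. #週刊ゆめの #フィジカルAI #VLM @WWMTTB #スキしてみて https://t.co/jDUt8XUvjw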

Shahrear Bin Amin @shahrear_amin

17 days ago

Shipped Orion 2 - our most capable visual agent, now with code execution. Instead of calling vision tools one at a time, it writes a program and runs it end-to-end: faster, cheaper, more reliable. #ai #vlm #llm

VLM Run

@vlmrun

17 days ago

Introducing Orion 2: the most capable visual agent, now with code-mode ✨ Rather than calling tools one by one, Orion 2 generates a visual AI program and executes it end-to-end, meaning fewer round-trips and lower latency. When orchestration is code, every workflow is composable, inspectable, and deterministic.
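The contrast the post draws between calling tools one by one and running a generated program can be made concrete with a toy sketch. Nothing here reflects Orion 2's actual interface: `detect`, `read_text`, the `PROGRAM` string, and the round-trip counting are all hypothetical stand-ins. The sketch only shows why composing tools in one program reduces planner round-trips.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for vision tools; real tools would run models.
def detect(image: str) -> List[str]:
    return [f"{image}#box{i}" for i in range(3)]


def read_text(region: str) -> str:
    return region.upper()


TOOLS: Dict[str, Callable] = {"detect": detect, "read_text": read_text}


def tool_by_tool(image: str) -> Tuple[List[str], int]:
    """Each tool call counts as one round-trip back to the planner."""
    round_trips = 0
    regions = TOOLS["detect"](image)
    round_trips += 1
    texts = []
    for region in regions:
        texts.append(TOOLS["read_text"](region))
        round_trips += 1
    return texts, round_trips


# In code-mode the planner would emit a program like this in a single step.
PROGRAM = """
regions = detect(image)
result = [read_text(r) for r in regions]
"""


def code_mode(image: str) -> Tuple[List[str], int]:
    """Run the whole program in one go: a single round-trip.

    exec() is used only for illustration; generated code should be sandboxed.
    """
    scope = {"image": image, **TOOLS}
    exec(PROGRAM, scope)
    return scope["result"], 1


texts_a, trips_a = tool_by_tool("img")
texts_b, trips_b = code_mode("img")
```

Both paths produce the same output here, but the program is also a plain artifact that can be read, diffed, and re-run, which is one way to read the post's "composable, inspectable" claim.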

（公財）未来工学研究所

@ifeng_official1

19 days ago

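These findings reportedly suggest that current #VLM systems (described here as "virtual human learning modules") markedly lack predictive social intelligence. In other words, they cannot interpret human facial expressions and use that information to predict outcomes, a skill that is important for smooth human-robot interaction.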

（公財）未来工学研究所

@ifeng_official1

19 days ago

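Hello, this is the official 未来工学研究所 account. Humans have the ability to read other people's reactions, but a recent study from Cornell University in the US has shown that #AI-equipped robots, even after learning to read human facial expressions and predict people's needs, are poor at reading signals that are unique to humans. So what is the problem? https://t.co/JvOCAr4PpP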

Monocat@日刊工業新聞社 @monocat_cc_note

20 days ago

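7/10 (Fri) 13:00-17:00 [Online × recorded stream] Drawing worldwide attention for its high classification accuracy! AI-based image inspection: fundamentals and latest trends in structural and logical anomaly detection https://t.co/HJalhPZEqM #異常検知 #中京大学 #画像検査 #設計 #開発 #VLM #LLM #データ #工場 #AI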
