Top Tweets for #VisionLanguage
Ever wanted an AI that can see and talk? Meet Qwen3VL 8B SFT, a fine tuned vision language model that understands images and text together. It's like giving your apps eyes and a brain, all in one compact package. #AI #VisionLanguage

🤖 https://t.co/UNrWKsJA1B – Vision Language Models, served as API.
Multimodal AI at scale. The definitive infrastructure domain.
For startups building the next wave of computer vision.
#APIVLM #MultimodalAI #VisionLanguage #AIInfrastructure #Domains
Representation geometry shapes VLM task performance untuk CT enterography. First study memilih representational choices untuk automated IBD analysis. https://t.co/C3rH5wA9ZR #AI #Medical #VisionLanguage #CTScan
Delighted to share that our work "System-Mediated Attention Imbalances Make Vision-Language Models Say Yes" has been accepted to #ACL2026 Findings! 🇺🇸
📜 Paper: https://t.co/bt0hREfceU
⌨️ Code: https://t.co/KKIYMcLLdS (coming soon)
#ACL2026 #VisionLanguage #VLM #MultimodalAI
If you're interested in multimodal reasoning, math LLMs, or vision-language models, do check out our paper! #MultimodalAI #LLMs #MathReasoning #VisionLanguage #AIForScience #VisualReasoning #SymbolicReasoning #GeometricReasoning
Multimodal Large Language Models (MLLMs) like MMLU and SwinBert are pushing the boundaries of language and vision integration.
But, as Loc3R-VLM demonstrates, they still struggle with spatial understanding & viewpoint-aware reasoning. #VisionLanguage
🔥 VLMs struggle with multi-round visual reasoning, failing to iteratively refine understanding across visual contexts.
🌊 RegionReasoner enables iterative visual understanding via region-grounded multi-round reasoning.
#AI #VisionLanguage #Reasoning
https://t.co/5M6Q8CmzyM
@alibaba_cloud 🍀🌿
Excited to see #Qwen3_5Flash live! ⚡️ Pushing the boundaries of #AI with lightning-fast #VisionLanguage models. Can’t wait to explore the future of #CloudComputing and #Innovation! 🌐💡 #AlibabaCloud #Efficiency #LLM #ArtificialIntelligence
📣New publication in #Inventions!
📑Image Captioning Using Enhanced Cross-Modal Attention with Multi-Scale Aggregation for Social Hotspot and Public Opinion Monitoring
👤Jiang, S. et al.
🔗https://t.co/2S9nR7Utj3
#DeepLearning #VisionLanguage #ImageCaptioning #MultimodalAI

🔥 Multimodal models memorize visuals but fail to describe them in text. This "modal aphasia" challenges unified AI.
🌊 We reveal this dissociation: models recall images but can't articulate their content.
@josh_swanson_
#AI #Multimodal #VisionLanguage
https://t.co/n9QiP50Li7
🚨 New model alert! 🚨
We've got Qwen3-VL-8B-Instruct & Qwen3-VL-8B-Thinking added to LocalAI! 🎉 These are 8B parameter vision-language models.
Try it out: `local-ai run qwen3-vl-8b-instruct` or `local-ai run qwen3-vl-8b-thinking` 🚀 #LocalAI #Qwen3 #VisionLanguage
@fepegar_ @MSFTResearch 医療AI分野では2D画像の研究が先行していたが、CTスキャンのような3Dボリュームデータを扱えるモデルは限られていた。
COLIPRIは胸部CT特化で、より実用的な診断支援への道を開く。
#AI #医療AI #VisionLanguage #MicrosoftResearch #HuggingFace
👀 🏋️♂️ Train smarter, not just larger. VisGym’s scalable visual tasks reveal where VLMs still struggle and how to push them further. Try it out! #MachineLearning #VisionLanguage #VisGym
🎮 We release VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents (w/ @junyi42 @aomaru_21490)
🌐 With 17 environments across multiple domains, we show systematically the brittleness of VLMs in visual interaction, and what training leads to.
🧵[1/8]
👀 AI can finally find YOUR dog in a crowded park! Researchers fine‑tuned vision‑language models with video‑tracking data, boosting personalized object localization by up to 21%. #AI #VisionLanguage #ComputerVision #ML
🔎 AI can finally recognize _your_ pet, not just any dog! MIT researchers use video‑tracking frames + pseudo‑names to teach models context‑based localization, lifting accuracy 21% 🚀 #AI #ComputerVision #MachineLearning #VisionLanguage
Joint work with @phu_pmh, Min Bai, Nikolaos Pappas, Zheng Qi, and Sandesh Swamy.
Read more below!
https://t.co/7Z81ltlka9
#AI #ML #VLM #MultimodalAI #VisionLanguage #Interpretability #AIAlignment #ReasoningModels #Evaluation #Hallucinations #Multimodal #MachineLearning
📢 Call for Papers | Vision-and-Language Intelligence: From image understanding to multimodal reasoning.
🗓️ Deadline: 31 Mar 2026
👥 Guest Editors: @QiWu_AIML Dr. Feras Dayoub, Jason Xue, Arpit Garg
🔗 https://t.co/M1PtxRs8gE
#VisionLanguage #MultimodalAI #ComputerVision

Serving Qwen2-VL 7B with vLLM V1 on VisionArena benchmarks. At high QPS the V1 engine significantly outperforms V0. If you're still on the old architecture for multimodal workloads you're leaving perf on the table. #vLLM #VisionLanguage #Benchmark
Excited to share our work at #NeurIPS2025! DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding 📌 Poster Presentation
📷 Learn more: https://t.co/p2GsUBtQx3 #AI #VisionLanguage #FireSafety #NeurIPS

Excited to share our work at #NeurIPS2025!
DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
📌 Poster Presentation
📂 Learn more: https://t.co/d2sgFUpErD
#AI #VisionLanguage #FireSafety #NeurIPS

Last Seen Hashtags on Sotwe
Trends for you
Most Popular Users

Elon Musk 
@elonmusk
240.1M followers

Barack Obama 
@barackobama
119.3M followers

Donald J. Trump 
@realdonaldtrump
111.6M followers

Cristiano Ronaldo 
@cristiano
108.8M followers

Narendra Modi 
@narendramodi
106.9M followers

Rihanna 
@rihanna
97.2M followers

NASA 
@nasa
92.1M followers

Justin Bieber 
@justinbieber
90.5M followers

KATY PERRY 
@katyperry
86.7M followers

Taylor Swift 
@taylorswift13
80.5M followers

Lady Gaga 
@ladygaga
72.1M followers

Kim Kardashian 
@kimkardashian
69.3M followers

YouTube 
@youtube
68.6M followers

Virat Kohli 
@imvkohli
68.4M followers

Bill Gates 
@billgates
63.4M followers

The Ellen Show
@theellenshow
62.5M followers

CNN 
@cnn
61.9M followers

Neymar Jr 
@neymarjr
60.9M followers

X 
@x
60.9M followers

CNN Breaking News 
@cnnbrk
59.9M followers


















