Last booth day for FocoosAI at CVPR! Our CEO and CTO will be happy to introduce you to our platform and code!
See you there!
P.S. We have amazing t-shirts! Come by before they run out!
🚨CVPR 2025 Highlight Paper Alert 🚨
➡️Paper Title: SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
🌟A few pointers from the paper
🎯Referring Video Object Segmentation (RVOS) relies on natural language expressions to segment an object in a video clip.
🎯Existing methods either restrict reasoning to independent short clips, losing global context, or process the entire video offline, which prevents their use in streaming settings.
🎯In this work, the authors aim to overcome these limitations and design an RVOS method that operates effectively in streaming-like scenarios while retaining contextual information from past frames.
🎯They build upon the Segment-Anything 2 (SAM2) model, which provides robust segmentation and tracking capabilities and is naturally suited for streaming processing.
🎯They made SAM2 wiser, by empowering it with natural language understanding and explicit temporal modeling at the feature extraction stage, without fine-tuning its weights, and without outsourcing modality interaction to external models.
🎯To this end, they introduced a novel adapter module that injects temporal information and multi-modal cues in the feature extraction process.
🎯They further revealed the phenomenon of tracking bias in SAM2 and proposed a learnable module to adjust its tracking focus when the current frame features suggest a new object more aligned with the caption.
🎯Their proposed method, “SAMWISE”, achieves state-of-the-art results across various benchmarks while adding a negligible overhead of fewer than 5M parameters.
🏢Organization: Politecnico di Torino [@PoliTOnews], @FocoosAI
🧙Paper Authors: Claudia Cuttano, @gabTrivv, Gabriele Rosi, @masone_carlo, Giuseppe Averta
📝 Read the Full Paper here: https://t.co/boAuqTURIr
🗂️ Project Page: https://t.co/U1OmhvjtDp
🧑💻 Code: https://t.co/HWs5683DeM
🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊
🎵 Music by Adi Iswanto from @pixabay
Find this Valuable 💎 ?
♻️QT and teach your network something new
Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#CVPR2025 #highlight
We have open-sourced our core training and model code.
We’re on a path where computer vision models will take hours, not months, to reach production, and this is just the first step.
If you’re interested, drop us a star ⭐ and write to me 🔥
https://t.co/xjqKyPeB3G
We’ll be in Nashville in less than a week for #CVPR2025!
@gabrosi3 will show you whether it is better to show (visual prompting) or tell (open vocabulary) to achieve better segmentation performance!
Should you SHOW 🖼️ or TELL 📝 a model what to segment? 🤔
Our new #benchmark compares visual vs textual prompts for semantic segmentation across 14 datasets spanning 7 domains!
Check out our findings ⬇️
How do you reduce your costs when using LLMs? Let us hear your thoughts!
To read the full article or subscribe to our (new) Substack: https://t.co/V6R87JDsM9
The Hidden Cost of LLMs: What Happens Under the Hood? A Thread 🧵
You’ve probably used a genAI assistant today, certainly one powered by LLMs. But have you ever wondered what it takes to answer you? 🤔
It's not just computation: it’s energy and money. Let’s break it down.
Do you really need the biggest model, long prompts, and verbose outputs? 🤷♂️
Smart choices = lower costs, higher efficiency, and reduced environmental impact.
Let’s do more with less. 💡🔋 #AI #LLMs #Sustainability
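
To make the question about model size, prompt length, and output verbosity a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes the common pay-per-token billing scheme; the model names, prices, and token counts are hypothetical placeholders for illustration, not real vendor rates or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-token prices (assumption, not real vendor rates)."""
    name: str
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float


def request_cost(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request under per-token billing."""
    input_cost = input_tokens / 1000 * pricing.usd_per_1k_input_tokens
    output_cost = output_tokens / 1000 * pricing.usd_per_1k_output_tokens
    return input_cost + output_cost


# Two made-up models: a big one and a small one.
LARGE = ModelPricing("large-model", 0.010, 0.030)
SMALL = ModelPricing("small-model", 0.001, 0.002)

# Big model, long prompt, verbose answer.
verbose = request_cost(LARGE, input_tokens=2000, output_tokens=1000)
# Small model, short prompt, concise answer.
lean = request_cost(SMALL, input_tokens=500, output_tokens=200)

print(f"verbose request: ${verbose:.4f}")
print(f"lean request:    ${lean:.4f}")
print(f"ratio:           {verbose / lean:.1f}x")
```

Swap in the prices and token counts of your own setup; the point is only that model choice, prompt length, and output length each multiply into the final bill.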