Today we’re excited to unveil a new generation of Segment Anything Models:
1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts.
🔗 Learn more about SAM 3: https://t.co/CjMnf7fspz
2️⃣ SAM 3D brings the model collection into the 3rd dimension to enable precise reconstruction of 3D objects and people from a single 2D image.
🔗 Learn more about SAM 3D: https://t.co/yXcvts8Ogc
These models offer innovative capabilities and unique tools for developers and researchers to create, experiment and uplevel media workflows.
My team at Meta is looking for summer research interns! We develop cutting-edge perception models like SAM 3, SAM 3D and Perception Encoder. Application link:
https://t.co/yTEwBRK7Kh
(the video is SAM 3 with prompt "fish")
We have LM Arena for chatbots, but what about one for computer vision models? It now exists! You can blind compare and rate models side by side on vision tasks. #SAM3 is currently the top scoring and fastest model for object detection!
https://t.co/E89h6YetCI
🧵Announcing Segment Anything 3! SAM 3 extends SAM 2 with open vocabulary text and exemplar prompts, enabling it to detect, segment, and track all instances of a target category in images/videos. We're releasing code, a checkpoint, an eval benchmark, & demo playground. SAM 3 will be coming soon to features in Edits, Vibes, & FB Marketplace! Deep dive below 👇
I am humbled to be re-featured as Women in Computer Vision for the BEST of CVPR section of the Computer Vision News July Magazine. It was great chatting with Ralph Anzarouth. I hope my unconventional career path can encourage more female researchers. https://t.co/vzLUNMRmQg
Knowledge vs Large Models?
Welcome to our #CVPR23 tutorial "Knowledge-Driven Vision-Language Encoding" with
@Xudong_Lin_AI@jayleicn@mohitban47@cvondrick@Shih-Fu Chang @elgreco_winter
Jun 19: 9:00-12:30
Loc: East 8
Website:https://t.co/cxHORLzTDh
Zoom:https://t.co/FbuuxRtZgg
What makes modern Video-Language (VidL) perform well? Check out our #CVPR2023 paper "VindLU: A Recipe for Effective Video-and-Language Pretraining" where we demystify the most critical factors in the VidL model design.
https://t.co/fNZ0RtdEgu
@fncheng2333@jayleicn@mohitban47
What is the value of knowledge in the era of large-scale pretraining? Welcome to our #AAAI23 tutorial "Knowledge-Driven Vision-Language Pretraining" with @Xudong_Lin_AI@jayleicn@mohitban47@Shih-Fu Chang @elgreco_winter
Feb 8: 2-6pm
Loc: Room 201
Zoom: https://t.co/IBuba5YEkd
🎉🎉BIG congrats to @ZinengTang for the amazing achievement of being selected as Winner (out of 4 in North America) of the 2023 CRA Outstanding Undergraduate Researcher Award! #ProudAdvisor🙂
🚨 Zineng is applying for a PhD this year 👉 https://t.co/R4URBXSZmo
@CRAtweets @unccs
Self-attention for VL tasks (esp. video+text) is too expensive!
Check out our #WACV2023 paper “Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention”
https://t.co/in7X3bPKVV
https://t.co/mo24Iu8dSn
@ZinengTang* @jmin__cho* @jayleicn@mohitban47
🧵
🎉Our LST paper was accepted to #NeurIPS2022🎉
Ladder Side-tuning achieves both memory & parameter efficiency in NLP + VL tasks.
Talk video: https://t.co/GmsHSd0sRB
Camera-ready version: https://t.co/VIK83Fw7lC
We will be in New Orleans, happy to chat!
@jmin__cho@mohitban47
Can GPT-3 understand videos? Glad to share our new work VidIL on prompting LLMs to understand videos using image descriptors (frame caption + visual token). We show strong few-shot video-to-text generation ability WITHOUT the need to train on ANY videos: https://t.co/96zmEqGe1n
🥳🥳 Check out our #ECCV2022 oral paper. We propose ECLIPSE 🌒 that integrates audio🔊🎵 into popular CLIP to have 2.92x faster and 2.34x memory-efficient for long-range video retrieval.
https://t.co/2YpUIpVWlm
https://t.co/B7mWFxBcm3
w. @jayleicn@mohitban47@gberta227
🧵👇