Wei Lin @ CVPR 2025 @WeiLinCV - Twitter Profile

Wei Lin @ CVPR 2025 @WeiLinCV

4 months ago

📄 https://t.co/7EyhoC8WJJ 🤗 https://t.co/YflnlvkEUE 💻 https://t.co/hjy9KDTBBX

Wei Lin @ CVPR 2025 @WeiLinCV

4 months ago

Excited to share that our paper “PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies” has been accepted to ICLR 2026 🎉🥳 This work means a lot to me, as it's my first time serving as last author while supervising a Master's student. Huge congrats to Lukas Selch!

Wei Lin @ CVPR 2025 @WeiLinCV

4 months ago

🙏✨Many thanks to all co-authors for the great collaboration: @LukasSelch @yufanghou @jmie_mirza @SivanDoveh James Glass @RogerioFeris #ICLR2026 #iclr

Wei Lin @ CVPR 2025 @WeiLinCV

4 months ago

🔍📄 PRISMM-Bench is the first benchmark built from real reviewer-flagged multimodal inconsistencies in scientific papers, targeting a key challenge for multimodal scientific reasoning. I’m especially happy that the paper was well received, with an initial rating of 6-6-6-6 🙌🔥

Wei Lin @ CVPR 2025 @WeiLinCV

9 months ago

🚀 🚀 We are introducing VisualOverload🎨🖼️, a VQA benchmark designed to test fundamental vision skills in visually dense scenes. 2,720 Q&A pairs across 6 tasks, 150 high-res artworks, and private ground truth. Even top VLMs hit only ~20% on the hardest tasks. Try it yourself🤖👉

Paul Gavrikov @ CVPR 🇺🇸

@PaulGavrikov

9 months ago

Is basic image understanding solved in today’s SOTA VLMs? Not quite. We present VisualOverload, a VQA benchmark testing simple vision skills (like counting & OCR) in dense scenes. Even the best model (o3) only scores 19.8% on our hardest split.

Wei Lin @ CVPR 2025 @WeiLinCV

11 months ago · Bhutan

🚨 New @ICCVConference 2025 paper! Can GPT-4o actually localize an object from just a few examples? Turns out not really. In our @ICCVConference paper, we propose a simple fix: teach it from video tracking data. Results? Better few-shot localization, stronger context grounding.

Sivan Doveh @SivanDoveh

11 months ago

IPLOC accepted to ICCV25 ☺️ Thanks to all the people that were part of it 🩷 The idea for this paper came by a lake during a visit to Graz for a talk. It has traveled with me through too many countries and too many wars, and it’s now a complete piece of work.

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

Check out our new work pLSTM, which brings the power of linear RNNs to arbitrary DAGs and multi-dimensional data, enabling parallel computation and long-range modeling. It outperforms Transformers on extrapolation tasks and handles images, graphs, and grids with remarkable efficiency.

Korbinian Poeppel @KorbiPoeppel

12 months ago

Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: https://t.co/nU7626uHWK

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago

@CVPR Thanks to the amazing co-authors: @CvGfmei @YimingWang107 @Poiex @TeV_FBK

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

Check out our poster and talk with Guofeng at #355 in ExHall D. PerLA is our new 3D language assistant that helps LLMs better understand the physical world! PerLA fuses local details and global context from point clouds using cross-attention + GNNs, and achieves SOTA on 3D bench

WeiLinCV retweeted

Roei Herzig ✈️ CVPR

@roeiherzig

12 months ago

🚨 Our panel kicks off at 11:30 AM in Room 207 A–D (Level 2)! Don't miss an amazing discussion with: Ludwig Schmidt, Andrew Owens, Arsha Nagrani, and Ani Kembhavi 🔥

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

Our MMFM Panel Discussion "What is Next in Multimodal Foundation Models?" will take place at 11:30 AM in Room 207 A-D. Moderator: Roei Herzig (UC Berkeley). Panelists: Ludwig Schmidt, Andrew Owens, Arsha Nagrani, Ani Kembhavi. @MMFMWorkshop @CVPR

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

@CVPR @MMFMWorkshop

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

Our next invited talk "Multimodal Learning from the Bottom Up" by Andrew Owens will start at 10:30AM. Do not miss it 😁

#3 MMFM Workshop @MMFMWorkshop

12 months ago

🤩🤩🤩

WeiLinCV retweeted

#3 MMFM Workshop @MMFMWorkshop

12 months ago

Our first speaker 🥁🥁🥁 @lschmidt3

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

WeiLinCV's tweet photo. https://t.co/brJPieIns9

Wei Lin @ CVPR 2025 @WeiLinCV

12 months ago · Nashville

Ludwig Schmidt is giving the first talk, "LAION-5B & DataComp: In search of the next generation of multimodal datasets", at our MMFM workshop in Room 207 A-D!