Matthew Walmer @MatthewWalmer - Twitter Profile

5 days ago

We’re looking forward to presenting UPLiFT at #CVPR2026! Efficiently extract pixel-dense features from pretrained backbones like DINOv3. We’ll be at the final poster session on Sunday (6/7) from 3:30-5:30pm at Poster 474, so please come by! Website: https://t.co/Wng5ZfzjGA

MatthewWalmer retweeted

Soumik Mukhopadhyay @ CVPR26 @soumikkanad

3 months ago

Diffusion models be like: “this image is 97% noise… better process all 256×256 pixels anyway” If very noisy diffusion states contain no more useful information than a tiny downsampled image, Then why run expensive full-res computation on them? 🧵

Matthew Walmer @MatthewWalmer

4 months ago

Excited to announce that UPLiFT has been accepted to #CVPR2026! You can also try out UPLiFT right now to extract pixel-dense DINOv3 features with our pretrained models linked below! Code: https://t.co/slgetOBNPO Paper: https://t.co/9IqMewyZeG Website: https://t.co/MJ78gJpXAJ

Matthew Walmer @MatthewWalmer

5 months ago

@Minseok96_kr @_sakshams_ @AnirudAgg @abhi2610 Hi Minseok, UPLiFT operates on the VAE's latent features similarly to the DINO features. We do sample the features first, essentially using the features as they would be fed to the VAE decoder later.

Matthew Walmer @MatthewWalmer

5 months ago

We’re excited to announce UPLiFT, our lightweight, pixel-dense feature upsampler. UPLiFT boosts feature density, preserves semantics, and has better efficiency scaling than recent SOTA methods. See all links in the thread below. Coauthors: @_sakshams_ @AnirudAgg @abhi2610 🧵[1/6]

Matthew Walmer @MatthewWalmer

5 months ago

@_sakshams_ @AnirudAgg @abhi2610 In addition, UPLiFT + SD1.5 VAE achieves comparable visual quality to the state-of-the-art method FM-Boost (CFM), while using less training data, fewer parameters, and fewer inference-time iterations. 🧵[6/6]

Matthew Walmer @MatthewWalmer

5 months ago

@_sakshams_ @AnirudAgg @abhi2610 We demonstrate the versatility and effectiveness of UPLiFT for both predictive and generative tasks, including semantic segmentation, depth estimation, image super-resolution, and efficient T2I generation. 🧵[5/6]

Matthew Walmer @MatthewWalmer

5 months ago

@_sakshams_ @AnirudAgg @abhi2610 Through this approach, our method maintains linear time scaling with respect to the number of visual tokens. Meanwhile, cross-attention-based upsamplers have quadratic scaling. This allows UPLiFT to scale and make denser features for larger images. 🧵[4/6]
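
The scaling claim in this post can be illustrated with a back-of-envelope count. The sketch below is not UPLiFT code and not a benchmark. It assumes that a global attention layer compares every token with every other token, while a local operator compares each token with a fixed-size neighborhood. The function names and the neighborhood size of 9 are hypothetical choices made only for illustration.

```python
def global_attention_pairs(num_tokens: int) -> int:
    """Pairwise comparisons when every token attends to every token."""
    return num_tokens * num_tokens


def local_attention_pairs(num_tokens: int, neighborhood: int = 9) -> int:
    """Pairwise comparisons when each token attends to a fixed-size neighborhood.

    neighborhood=9 (a 3x3 window) is an illustrative assumption,
    not a value taken from the paper.
    """
    return num_tokens * neighborhood


# Token grids of increasing size (side x side tokens per image).
for side in (16, 32, 64):
    tokens = side * side
    print(side, tokens, global_attention_pairs(tokens), local_attention_pairs(tokens))
```

Under these assumptions, doubling the token count doubles the local count but quadruples the global one, which is the gap the post describes.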

Matthew Walmer @MatthewWalmer

5 months ago

@_sakshams_ @AnirudAgg @abhi2610 UPLiFT uses iterative feature growing, which avoids the high computational costs of recent cross-attention-based methods. We also present a new Local Attender feature-pooling module, which reformulates local attention using operations based on relative directional offsets. 🧵[3/6]

Matthew Walmer @MatthewWalmer

5 months ago

Today we are also releasing our UPLiFT code and 3 pretrained models for DINOv2-S/14, DINOv3-S+/16, and SD1.5 VAE. We also include torch hub support and training code. Paper: https://t.co/9IqMewyZeG Code: https://t.co/aAqKP7LKCM Website: https://t.co/MJ78gJpXAJ 🧵[2/6]

MatthewWalmer retweeted

Pulkit @pulkitkumar95

10 months ago

🎉 Excited to share our paper "Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition" has been accepted to #ICCV2025! Equally co-led with @ShuaiyiH — we advance few-shot action recognition via smart point tracking. 🔗 https://t.co/449JI1WiL4 🧵👇

MatthewWalmer retweeted

Saksham Suri ✈️ CVPR @_sakshams_

over 1 year ago

We are happy to release our LiFT code and pretrained models! 📢 Code: https://t.co/vtZSkw1SNs Project Page: https://t.co/dk21eeDHpi Here are some super spooky super resolved feature visualizations to make the season scarier 🎃 Coauthors: @MatthewWalmer @kamalgupta09 @abhi2610

MatthewWalmer retweeted

Saksham Suri ✈️ CVPR @_sakshams_

over 1 year ago

We introduce LiFT, an easy to train, lightweight, and efficient feature upsampler to get dense ViT features without the need to retrain the ViT. Visit our poster @eccvconf #eccv2024 in Milan on Oct 1st (Tuesday), 16:30 (local), Poster: 79. Project Page: https://t.co/dk21eeEfeQ

Matthew Walmer @MatthewWalmer

almost 3 years ago

Just a reminder we’ll be presenting this evening at the Tuesday 4:30pm poster session at #CVPR2023. Hope to see you there!

Matthew Walmer @MatthewWalmer

almost 3 years ago

We’re looking forward to presenting our work “Teaching Matters: Investigating the Role of Supervision in Vision Transformers” next week at #CVPR2023! We’ll be in the Tues-PM poster session at board 321. Links and some key results below. @_sakshams_ @kamalgupta09 @abhi2610 [1/5]

Matthew Walmer @MatthewWalmer

almost 3 years ago

@_sakshams_ @kamalgupta09 @abhi2610 The best layer for a downstream task varies depending on both the task and the pretraining. For example, on keypoint correspondence, most of the ViTs have their best performance with layers 7 or 8 (of 12). We present comparisons for both locally and globally focused tasks. [5/5]

Matthew Walmer @MatthewWalmer

almost 3 years ago

@_sakshams_ @kamalgupta09 @abhi2610 Even though MAE has no CLS objective, we find evidence that it learns to embed semantic information in the CLS token even before fine-tuning. Through CKA analysis, we find some similarity between MAE, DINO, and MoCo CLS token representations. [4/5]
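
For readers unfamiliar with CKA (centered kernel alignment), here is a minimal sketch of the linear variant in plain Python. It is not the authors' analysis code, and the function names and toy data are hypothetical. It scores the similarity of two feature sets extracted for the same inputs, with one row per input.

```python
import math


def center_columns(rows):
    """Subtract the per-column mean so each feature dimension has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def gram(rows):
    """Linear-kernel Gram matrix: dot product between every pair of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def frobenius_inner(m1, m2):
    """Sum of elementwise products of two equally sized matrices."""
    return sum(a * b for row1, row2 in zip(m1, m2) for a, b in zip(row1, row2))


def linear_cka(x, y):
    """Linear CKA between feature sets x and y (same number of rows)."""
    gram_x = gram(center_columns(x))
    gram_y = gram(center_columns(y))
    denom = math.sqrt(frobenius_inner(gram_x, gram_x) * frobenius_inner(gram_y, gram_y))
    return frobenius_inner(gram_x, gram_y) / denom


# Toy data (hypothetical): four inputs, two feature dimensions.
feats_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [4.0, 1.0]]
# feats_b is feats_a rotated by 90 degrees and scaled by 3.
feats_b = [[-3.0 * b, 3.0 * a] for a, b in feats_a]
# feats_c is an unrelated one-dimensional feature.
feats_c = [[1.0], [-1.0], [1.0], [-1.0]]

cka_rotated = linear_cka(feats_a, feats_b)
cka_unrelated = linear_cka(feats_a, feats_c)
```

Linear CKA is invariant to rotation and uniform scaling of the features, so cka_rotated equals 1 up to floating-point error, while cka_unrelated is much lower.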

Matthew Walmer @MatthewWalmer

almost 3 years ago

@_sakshams_ @kamalgupta09 @abhi2610 Did you know that ViTs learn to use offset local attention heads? These heads attend locally, but to a position that is one off in one direction. The existence of these heads may actually demonstrate a strength of CNNs over ViTs. [3/5]

Matthew Walmer @MatthewWalmer

almost 3 years ago

@_sakshams_ @kamalgupta09 @abhi2610 We compared ViTs from 6 different supervision methods and identified key similarities and differences between them. We examine: attention, features, and downstream performance. Paper: https://t.co/M9ju6fquDy Website: https://t.co/NrlUcuP3tR Code: https://t.co/TJ756gnP95 [2/5]

MatthewWalmer retweeted

Saksham Suri ✈️ CVPR @_sakshams_

over 3 years ago

Excited to share our work "Teaching Matters: Investigating the Role of Supervision in Vision Transformers" which has been accepted to #CVPR2023! Work done with: @MatthewWalmer, @kamalgupta09 and @abhi2610 Website: https://t.co/EXdRiIxig6 Code: https://t.co/tptWMo8BFN
