Vladimir Vlejd Macko @vlejd - Twitter Profile

Pinned Tweet

7 months ago

Unstructured weight #sparsity made practical. 50% unstructured weight sparsity was considered too low for real GPU speed up without specific hardware support (like @cerebras). With @bozavlado we built MACKO-SpMV - a new matrix format + SpMV kernel to change that. 🧵

4

27

6

7

6K

vlejd retweeted

Vlado Boza

@bozavlado

about 1 month ago

Now, accepted to ICML as a spotlight paper. I am super proud of my student @vlejd.

1

51

4

7

4K

Vladimir Vlejd Macko

@vlejd

about 1 month ago

@0xJayHK @cerebras @bozavlado tldr: optimize memory while keeping the compression compatible with the GPU programming model. The first step, ofc, is to figure out what the GPU programming model is :D

0

122

Vladimir Vlejd Macko

@vlejd

6 months ago

@mmaaz_98 Nice work! If you want to add support for low (20-90%) sparsity, we have an implementation at https://t.co/WLHL7X8dt8

0

3

0

1

52

Vladimir Vlejd Macko

@vlejd

6 months ago

@mariyaivasileva I worked at a company that sometimes had spare GPUs.

0

121

vlejd retweeted

James Bradbury @jekbradbury

7 months ago

opus 4.5 is really good at GPU programming, but somehow it’s even better at GPU programming jokes (h/t @Si_Boehm)

20

536

46

86

84K

Vladimir Vlejd Macko

@vlejd

7 months ago

🛠️ Next step: server GPUs. If you know how to implement a minimal CUDA matvec on H100 that hits ≥95% of cuBLAS 👉 My DMs are open.

0

1

0

134

Vladimir Vlejd Macko

@vlejd

7 months ago

And yes, it translates to real LLM inference speedups.

1

2

0

144

Vladimir Vlejd Macko

@vlejd

7 months ago

It is funny how little correct information there is about how to properly benchmark a CUDA kernel. Most papers are wrong, eval libraries are hard to inspect, and even this could have a problem, because it may include the kernel launch, depending on the clear_cache implementation.

tender

@tenderizzation

7 months ago

btw, I think BackendBench just uses triton's do_bench function, which uses a very similar timing mechanism to the one exploited here and wouldn't be robust to the same side-stream shenanigans

2

76

2

18

15K

0

113

Vladimir Vlejd Macko

@vlejd

7 months ago

@miru_why @niklassheth @ronusedh @intology My second personal favorite is not cleaning the cache between invocations and testing only on matrices that fit in the cache. You can get some truly unbelievable flops :D

1

9

0

1

2K

Vladimir Vlejd Macko

@vlejd

7 months ago

@miru_why @niklassheth @ronusedh @intology https://t.co/9mX22jfGW5

0

8

0

1

3K

Vladimir Vlejd Macko

@vlejd

7 months ago

@miru_why @niklassheth @ronusedh @intology Hahaha. I spent months debugging this. Had to fix the official torch documentation that contained the same problem in its examples. Unfortunately, it is a pretty common pattern.

0

31

1

2

3K

vlejd retweeted

Julian @julianboolean_

10 months ago

holy shit they found a power series solution to ALL polynomial equations!! (bypassing Galois which says you can’t solve them in radicals)

32

1K

94

1K

168K

Vladimir Vlejd Macko

@vlejd

11 months ago

@ollama local app is coming up! Awesome. Local models are the future.

0

2

0

130

Vladimir Vlejd Macko

@vlejd

11 months ago

Happy birthday @ollama !

0

6

0

258

Vladimir Vlejd Macko

@vlejd

11 months ago

I do model compression and optimization. It is essential to have access to different GPUs and that would be impossible without @vast_ai . Happy to finally meet you guys at #ICML2025 . And thanks a lot for the Nintendo Switch!

0

7

3

2

1K
