Sasha Krassovsky @bztree - Twitter Profile

@benhylak @pronounced_kyle I just finished the part where he puts number theory into a formal system, around page 230. It’s getting tough to keep going but I’m on a mission

Sasha Krassovsky

@bztree

about 1 month ago

@cmuratori "Loading" and "updating the screen" don't seem all that related anyway? Like you can have a game giving you a load screen and updating the loading spinner at 60 FPS if it's really loading GBs from your hard drive.

Sasha Krassovsky

@bztree

about 1 month ago

@awesomekling EU nutrition labels drive me nuts. The per 100g system is obviously inferior. Like suppose I want to have a protein bar. I care how much is in the protein bar, not in a glob of 2.37 protein bars

Sasha Krassovsky

@bztree

about 2 months ago

@filpizlo @zuhaitz_dev My mind was blown reading the C++ FQA for the first time. C++ really is just reinventing every C feature in its own way.

Sasha Krassovsky

@bztree

about 2 months ago

OK last post for the night: I tried all the fancy stuff they recommended in their GEMM doc: Z-curve, static extents, accumulation group synchronization. None of it seemed to improve performance; I seem to be stuck at 40 TFLOPS in bf16 across a variety of shapes.
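For readers who want to check a throughput figure like this one, a GEMM of shape M x N x K performs roughly 2·M·N·K floating-point operations (one multiply and one add per inner-product term). The sketch below is only that arithmetic; the function name `gemm_tflops` and the 4096-cubed shape are assumptions for illustration, since the thread does not state which shapes were measured.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput of an m x n x k GEMM in TFLOPS.

    Counts 2 * m * n * k operations: one multiply plus one add
    for every term of every output element's dot product.
    """
    return 2 * m * n * k / seconds / 1e12


# Hypothetical shape (not stated in the thread), paired with the
# 3.36 ms kernel time quoted elsewhere in the thread.
print(f"{gemm_tflops(4096, 4096, 4096, 3.36e-3):.1f} TFLOPS")
```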

Sasha Krassovsky

@bztree

about 2 months ago

I got my M5 MacBook over the weekend and had some time to mess around with Metal 4 and the Neural Accelerators! Wanted to document some of my first impressions below:

Sasha Krassovsky

@bztree

about 2 months ago

@__simt__ @anemll @ekryski @mweinbach How do you use the fp19? When I marked my Metal kernel's inputs as `float`, the profiler seemed to tell me it wasn't using the Neural Accelerator, just the normal fp32 ALUs.

Sasha Krassovsky

@bztree

about 2 months ago

@ekryski Code here - you can run `uv run test_kernels.py`. Make sure you set `MTL_CAPTURE_ENABLED=1`! https://t.co/aTAgCCRgzb

Sasha Krassovsky

@bztree

about 2 months ago

@mweinbach I believe that’s for allocating Tensors on the host, nothing to do with the neural accelerator

Sasha Krassovsky

@bztree

about 2 months ago

@mweinbach Yes! It’s a very good document. I haven’t implemented the Z-order traversal they recommend yet, but plan to.
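As background on what a Z-order traversal means for a tiled kernel: instead of visiting output tiles row by row, the linear tile index is split into its even and odd bits to get the column and row, so consecutive tiles stay close together in both dimensions. The following is a minimal sketch of that general idea, not the implementation from the document being discussed; the helper names `morton_decode` and `z_order_tiles` are hypothetical, and the grid is assumed to be square with a power-of-two side.

```python
def morton_decode(index: int) -> tuple[int, int]:
    """Split the interleaved bits of `index` into (row, col).

    Even bit positions of `index` form the column, odd bit
    positions form the row.
    """
    row = col = 0
    bit = 0
    while index >> (2 * bit):
        col |= ((index >> (2 * bit)) & 1) << bit
        row |= ((index >> (2 * bit + 1)) & 1) << bit
        bit += 1
    return row, col


def z_order_tiles(tiles_per_side: int) -> list[tuple[int, int]]:
    """Visit order for a square tile grid (side must be a power of two)."""
    return [morton_decode(i) for i in range(tiles_per_side * tiles_per_side)]


# The first four tiles form a 2x2 block, then the walk moves to the next block.
print(z_order_tiles(4)[:8])
```

In a GPU kernel the same decode would typically be applied to the threadgroup index so that neighbouring threadgroups touch overlapping rows of A and columns of B.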

Sasha Krassovsky

@bztree

about 2 months ago

@__alpoge__ @gallabytes 🪵🪵🪵

Sasha Krassovsky

@bztree

about 2 months ago

Overall had a fun time! To close off with some criticisms:
- it took me a long time to figure out how to enable Metal 4. I wish this were better-documented
- MPP seems a little boiler-platey. I wish there were a slightly more convenient syntax for this stuff, but not a dealbreaker.
Hope this was interesting!

Sasha Krassovsky

@bztree

about 2 months ago

I was also expecting a much more dramatic speedup from the Neural Accelerator. It seemed that with my original tile size of 32x32, I was only getting 244 GB/s of memory bandwidth. Bumping it up to 64x64 gave me 740 GB/s, dropping the time to 3.36ms! https://t.co/UG7p3U7L3y
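One common reason larger tiles help a GEMM is that each output tile re-reads a full strip of A and B, so doubling the tile side roughly halves the total memory traffic and doubles the work done per byte. The sketch below is a rough back-of-the-envelope model, not a measurement and not the kernel from the thread: it assumes no cache reuse between tiles, 2-byte (bf16) elements, and a hypothetical 4096-cubed problem, and the function names are made up for illustration.

```python
def gemm_traffic_bytes(m: int, n: int, k: int, tile: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes moved by a tiled GEMM, assuming no reuse across tiles."""
    num_tiles = (m // tile) * (n // tile)
    loads = 2 * tile * k   # a tile x k strip of A plus a k x tile strip of B
    stores = tile * tile   # the finished output tile
    return num_tiles * (loads + stores) * bytes_per_elem


def flops_per_byte(m: int, n: int, k: int, tile: int,
                   bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity: 2*m*n*k operations over the modelled traffic."""
    return 2 * m * n * k / gemm_traffic_bytes(m, n, k, tile, bytes_per_elem)


# Hypothetical square problem; the thread does not state the shape.
M = N = K = 4096
for tile in (32, 64):
    gib = gemm_traffic_bytes(M, N, K, tile) / 2**30
    print(f"tile {tile}x{tile}: {gib:.1f} GiB moved, "
          f"{flops_per_byte(M, N, K, tile):.1f} FLOPs/byte")
```

Under these assumptions the 64x64 tiling moves about half the bytes of the 32x32 one for the same arithmetic, which is the usual motivation for trying a bigger tile first.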
