@benhylak @pronounced_kyle I just finished the part where he puts number theory into a formal system, around page 230. It's getting tough to keep going but I'm on a mission
@cmuratori "Loading" and "updating the screen" don't seem all that related anyway? Like you can have a game giving you a load screen and updating the loading spinner at 60 FPS if it's really loading GBs from your hard drive.
@awesomekling EU nutrition labels drive me nuts. The per 100g system is obviously inferior. Like suppose I want to have a protein bar. I care how much is in the protein bar, not in a glob of 2.37 protein bars
OK last post for the night: I tried all the fancy stuff they recommended in their GEMM doc: Z-curve, static extents, accumulation group synchronization. None of it seemed to improve performance - I seem to be stuck at 40 TFLOPs in bf16 across a variety of shapes.
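For anyone wondering what the Z-curve part means: a rough sketch, assuming it refers to visiting output tiles in Morton (bit-interleaved) order so that consecutive threadgroups touch nearby tiles. The names `morton_decode` and `z_curve_tiles` are made up for illustration, and this is plain Python rather than anything from the actual GEMM doc or a Metal kernel.

```python
def morton_decode(index: int) -> tuple[int, int]:
    """De-interleave the bits of index: even bits form x, odd bits form y."""
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit   # even bit goes to x
        index >>= 1
        y |= (index & 1) << bit   # odd bit goes to y
        index >>= 1
        bit += 1
    return x, y


def z_curve_tiles(tiles_per_side: int) -> list[tuple[int, int]]:
    """Tile coordinates in Z-curve order for a square grid.

    Assumes tiles_per_side is a power of two, so that the first
    tiles_per_side**2 Morton indices cover the grid exactly once.
    """
    return [morton_decode(i) for i in range(tiles_per_side * tiles_per_side)]


if __name__ == "__main__":
    for linear_id, (tx, ty) in enumerate(z_curve_tiles(4)):
        print(linear_id, tx, ty)
```

In a kernel, the linear threadgroup ID would play the role of `index`, and `(tx, ty)` would select which output tile that threadgroup computes.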
I got my M5 MacBook over the weekend and had some time to mess around with Metal 4 and the Neural Accelerators!
Wanted to document some of my first impressions below:
@__simt__ @anemll @ekryski @mweinbach How do you use the fp19? When I had my Metal kernel mark the inputs as `float`, the profiler seemed to tell me it wasn't using the neural accelerator, but the normal fp32 ALUs?
Overall had a fun time! To close off with some criticisms:
- it took me a long time to figure out how to enable Metal 4. I wish this were better documented
- MPP seems a little boiler-platey. I wish there were a slightly more convenient syntax for this stuff, but not a dealbreaker.
Hope this was interesting!
I was also expecting a much more dramatic speedup from the Neural Accelerator. It seemed that with my original tile size of 32x32, I was only getting 244 GB/s of memory bandwidth. Bumping it up to 64x64 gave me 740 GB/s, dropping the time to 3.36ms!
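A back-of-the-envelope sketch of why tile size matters so much here, under a simplified model that is my assumption, not a measurement: each step along K loads one `tile`-long column slice of A and one `tile`-long row slice of B, then does one multiply-add per output element of the tile. The function name `tile_arithmetic_intensity` is hypothetical, and the model ignores caches and any data sharing between tiles.

```python
def tile_arithmetic_intensity(tile: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte loaded for a square tile, per step along K.

    Simplified model (assumption): no reuse across tiles, no cache effects.
    bytes_per_element defaults to 2 for bf16.
    """
    flops = 2 * tile * tile                       # multiply + add per output element
    bytes_loaded = 2 * tile * bytes_per_element   # one slice of A plus one slice of B
    return flops / bytes_loaded


if __name__ == "__main__":
    for tile in (32, 64):
        print(tile, tile_arithmetic_intensity(tile))
```

Under this model, going from 32x32 to 64x64 doubles the work done per byte fetched, which points in the same direction as the improvement I saw, though it does not explain the exact bandwidth numbers.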