Test drive with the latest openpilot release, version 0.11. If you liked it and want a comma 4 in your own car, order one at https://t.co/CKEDdU5ute! @comma_ai #openpilot #comma4
🧵 I wrote a simple matmul kernel (link in comments) in pydawn, using a bleeding-edge #webgpu feature currently hidden behind a Chromium experimental flag. The subgroupMatrix feature exposes Metal's simdgroup functionality. But since Metal is not the only backend WebGPU supports, you can query the adapter to get the available subgroupMatrix configurations.
@dogecahedron @__tinygrad__ In terms of extra operations: instead of performing just a load/store, you also have to do the index arithmetic (divide for word select, modulo for byte select), and on top of that you have to shift and mask. We haven't measured the exact perf penalty.
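To make the extra work concrete, here is a rough Python sketch of byte-granular load/store emulated on top of a buffer of 32-bit words. The function names, the 4-bytes-per-word packing, and the little-endian byte order within a word are assumptions for illustration, not the actual kernel code.

```python
BYTES_PER_WORD = 4  # assumption: bytes are packed four to a 32-bit word

def load_byte(words, i):
    """Read byte i from a list of 32-bit words (little-endian within a word)."""
    word = words[i // BYTES_PER_WORD]       # divide: word select
    shift = (i % BYTES_PER_WORD) * 8        # modulo: byte select
    return (word >> shift) & 0xFF           # shift and mask

def store_byte(words, i, value):
    """Write byte i into a list of 32-bit words without touching its neighbours."""
    w = i // BYTES_PER_WORD                 # divide: word select
    shift = (i % BYTES_PER_WORD) * 8        # modulo: byte select
    cleared = words[w] & ~(0xFF << shift) & 0xFFFFFFFF  # mask out the old byte
    words[w] = cleared | ((value & 0xFF) << shift)      # shift the new byte in
```

Note that the store is a read-modify-write on the whole word, so on a GPU two threads writing different bytes of the same word would need some form of synchronization. The sketch ignores that.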