Got M5 Max to 548M tokens/s on @AlexCheema's TALOS-vs-MacBook microGPT bench (10,341× the FPGA).
Custom simdgroup_matrix Metal kernel in MLX running concurrently with multi-threaded Apple SME2 on the CPU.
Receipts:
https://t.co/r4h0bOOoVI