@jun_song Agree! M1 Ultra is even better for a slight premium: 800 GB/s memory bandwidth and 2x the GPU. 64GB is the sweet spot for quantized models plus KV cache. 128GB is great for ~100B MoE models, but the GPU can’t run dense models over 40B at usable speed. Best price-to-performance hardware right now.
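
Rough napkin math behind the bandwidth point, as a sketch only. It assumes decode speed is capped by memory bandwidth divided by the weight bytes read per token, with 4-bit weights (0.5 bytes per parameter). The function names and the 10B-active MoE figure are hypothetical, picked for illustration. It ignores KV cache reads and compute limits, so real speeds come in lower, and prompt processing is compute-bound, which this does not model at all.

```python
# Back-of-envelope estimates; all names and the MoE active-parameter
# count are illustrative assumptions, not measurements.

M1_ULTRA_BANDWIDTH_GB_S = 800.0  # memory bandwidth figure from the post


def weights_size_gb(params_billion: float, bytes_per_param: float = 0.5) -> float:
    """Approximate weight footprint in GB (0.5 bytes/param assumes 4-bit quantization)."""
    return params_billion * bytes_per_param


def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                         bytes_per_param: float = 0.5) -> float:
    """Upper bound on decode tokens/s if every active weight is read once per token."""
    return bandwidth_gb_s / weights_size_gb(active_params_billion, bytes_per_param)


# Dense 40B: all 40B parameters are read for every generated token.
dense_40b_size = weights_size_gb(40)                                     # 20.0 GB
dense_40b_ceiling = decode_ceiling_tok_s(M1_ULTRA_BANDWIDTH_GB_S, 40)    # 40.0 tok/s

# ~100B MoE: all experts must fit in memory, but only the active ones are read.
# The 10B active-parameter figure is an assumption for illustration.
moe_100b_size = weights_size_gb(100)                                     # 50.0 GB
moe_100b_ceiling = decode_ceiling_tok_s(M1_ULTRA_BANDWIDTH_GB_S, 10)     # 160.0 tok/s
```

These are ceilings, not predictions. The takeaway is just that an MoE pays for total parameters in memory but only for active parameters in bandwidth per token, which is why extra RAM helps MoE more than dense models.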