@0xSero Is it worth saving on the RTX 6000 and going with a DGX instead? For agents, tuning, experimenting.
Been renting a 6000 and debating whether the mem bandwidth is an okay trade-off.
Revived my old 2080 Ti and loaded Gemma 3 12B with llama.cpp. This is where we're at:
prompt eval time = 76.59 ms / 17 tokens ( 4.51 ms per token, 221.96 tokens per second)
eval time = 32043.16 ms / 1598 tokens ( 20.05 ms per token, 49.87 tokens per second)
total time = 32119.75 ms / 1615 tokens
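
If you want to sanity-check the per-token and tokens-per-second figures, here is a rough Python sketch that recomputes them from the raw milliseconds and token counts. The regex and the `parse_timings` helper are my own illustration, not part of llama.cpp, and they assume the timing lines look exactly like the three pasted above.

```python
import re

# Timing lines copied verbatim from the llama.cpp output above.
LOG = """\
prompt eval time = 76.59 ms / 17 tokens ( 4.51 ms per token, 221.96 tokens per second)
eval time = 32043.16 ms / 1598 tokens ( 20.05 ms per token, 49.87 tokens per second)
total time = 32119.75 ms / 1615 tokens
"""

# Hypothetical pattern (an assumption, not a llama.cpp API):
# captures "<label> time = <ms> ms / <n> tokens" at the start of each line.
LINE_RE = re.compile(
    r"^(?P<label>[a-z ]+?) time\s*=\s*(?P<ms>[\d.]+) ms\s*/\s*(?P<tokens>\d+) tokens",
    re.MULTILINE,
)


def parse_timings(log):
    """Return {label: stats} with throughput recomputed from ms and token count."""
    timings = {}
    for match in LINE_RE.finditer(log):
        ms = float(match.group("ms"))
        tokens = int(match.group("tokens"))
        timings[match.group("label")] = {
            "ms": ms,
            "tokens": tokens,
            "ms_per_token": ms / tokens,
            "tokens_per_second": tokens / (ms / 1000.0),
        }
    return timings


timings = parse_timings(LOG)
for label, stats in timings.items():
    print(
        f"{label}: {stats['ms_per_token']:.2f} ms/token, "
        f"{stats['tokens_per_second']:.2f} tokens/s"
    )
```

Generation speed is the `eval` line (about 49.87 tokens/s here). The `prompt eval` line only covers 17 prompt tokens, so it says little about long-context prefill speed.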