@stevibe @TeksEdge I have no problem getting 500-700 tok/s on dense 30B models on my 5090 and 300 tok/s on my 3090.
It's not so much the quant as it is compiling llama.cpp for your specific hardware. This pic is Gemma4's dense 31B running at 500+ tok/s.
@BuescherScott Yo. That's wild. I recently formed a real estate tech company with a partner who owns a realtor company in South Florida. We're publicly launching soon.
I'll follow you back and check out your project.