@antirez How are you serving it, and on what runtime? I’m only getting 13 tok/sec on my M1 Ultra 128GB & it’s driving me nuts. Can’t fix it. I’m using JANGTQ2.
@dealignai Hey brother, I’ve got an M1 Ultra 128GB & I can only run your DSV4 JANGTQ2 at like 13 tok/sec… I’ve tried everything. What runtime are you using & do you have any advice? I know it should be 40+ tok/sec.
@jun_song I’ve got an M1 Ultra 128GB & I can only run DSV4 Flash at like 13 tok/sec… I’ve tried everything. How are you serving yours & do you have any advice? I know it should be 40+ tok/sec.
@Youssofal_ Really groundbreaking stuff, man. Next step is matching llama.cpp’s consistency and stability for agentic tool calls. Would love to run this consistently on Hermes/Openclaw.