@Prince_Canuma @jtdavies I will do that. The generation looks like this: it generates some words, then it stops, and then again… the pauses come roughly every second. The power usage is 40-60 W for the whole system (very, very low) and GPU utilization peaks at 80%.
@ivanfioravanti I also got that feeling. Still, I saw some posts saying ds4 was running on an M5 Max at over 35 t/s, which is way higher than on the antirez ds4. If you could tell me where I can find that fork of mlx_lm, I would very much appreciate it.
@ivanfioravanti Using DeepSeek V4 Flash and about 50K tokens, I was able to patch mlx_vlm so that I can run osmapi/Step-3.7-Flash-OptiQ-3.7bpw-mlx. This is just awesome.