@ljupc0 @MrPeterLMorris @antirez @0xSero It doesn't work out of the box, but it only needs a fairly lightweight patch to run. The difference is topology: REAP models have a different number of experts, and ds4 must be made aware of that.
I am running DeepSeek V4 Flash on Apple M2 Max 64GB locally.
Thanks @antirez for DwarfStar and thanks @0xSero for his smaller REAP checkpoint of DeepSeek V4 Flash.
@k3ntosan @antirez @0xSero In general, quantization error washes out as the number of parameters increases. But REAP pruning also contributes to the error. So yeah, it might be frustrating sometimes, but it's still unbeatable in many respects.
@Edouardmazza @antirez @0xSero Aside from downloading a GGUF, all I did to support it was ask Codex to add the custom model topology, which has fewer experts than the original. It then built and ran without a hitch.
@WhatsTrue8 @antirez @0xSero No, not unless @0xSero cooks a checkpoint half the size of the smallest REAP. That wouldn't make much sense, though. A dense model that fits on your machine would definitely be a better choice for you, like Qwen 3.6 27B in 8-bit.
@synapticity @antirez @0xSero Just ran ds4eval. The original Q2 results are listed in the repo, and it's a full pass.
However, this one fails two out of four.
ds4-eval: 2/4 passed, 2 failed, runtime 00h:11m
@0xSero @trashpandaemoji @antirez Fine by me, but to be honest, it might become a blocker for naturally multilingual use cases such as creative writing, RP, and document processing.