Cody Knows Code @CodyKnowsCode - Twitter Profile

9 days ago

When the stars align, Qwen3.6 27B (MTP+ngram-mod+ngram-map-kv4) can be quite fast on a DGX Spark (56t/s single concurrency). But yeah, most of the time it's 16t/s. Still nice to hear the toaster go brrrrr!

Photo: https://t.co/ZRcEwI4mbq

1

0

21
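
To put the two rates in the tweet above into wall-clock terms, here is a minimal sketch. It assumes a steady decode rate and a hypothetical 2000-token response; neither the response length nor the function name comes from the tweet.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to decode num_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 2000

best = seconds_to_generate(RESPONSE_TOKENS, 56)     # the "stars align" rate
typical = seconds_to_generate(RESPONSE_TOKENS, 16)  # the usual rate
print(f"best: {best:.0f}s, typical: {typical:.0f}s")
```

Real decode speed tends to drop as context fills, so treat these as rough lower bounds on waiting time.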

Cody Knows Code @CodyKnowsCode

18 days ago

@UnslothAI @Alibaba_Qwen You can now use `--reasoning off` instead of `--chat-template-kwargs`.

0

2

1K

Cody Knows Code @CodyKnowsCode

about 1 month ago

@shiny_tech @vllm_project https://t.co/AtzjboNM1u This model runs faster but has tool calling issues (with OpenCode) so I stopped using it and went back to llama.cpp. I'm going to try a few (bigger) models with NVFP4 and see which one performs well.

0

28

Cody Knows Code @CodyKnowsCode

about 1 month ago

@helmutkan @vllm_project https://t.co/AtzjboNM1u I'm getting 56t/s at 16k context in OpenCode with this model, closer to the advertised 60+ in the card. Yet I hunger for even more... ;)

0

1

0

60

Cody Knows Code @CodyKnowsCode

about 1 month ago

@helmutkan @vllm_project No, I'm getting 35t/s with this new version, older versions would crash at startup. I'm expecting more from this version because llama.cpp already gives me better performance with an MXFP4 model (and people claim vLLM is supposed to be faster).

1

0

90

Cody Knows Code @CodyKnowsCode

about 1 month ago

@helmutkan @vllm_project What do you mean "latest changes"? My setup is here: https://t.co/Lb5c5w7arf I'll try https://t.co/AtzjboNM1u next.

1

0

79

Cody Knows Code @CodyKnowsCode

about 1 month ago

MiniMax M2.7, @UnslothAI's IQ3_XXS, speed benchmark on DGX Spark. It starts out decently but speed falls off a cliff fast for any practical usage.

Photo: https://t.co/9R3vgrP7Ph

0

10

Cody Knows Code @CodyKnowsCode

about 2 months ago

Qwen 3.6 35B speed benchmark - DGX Spark. @mudler_it APEX vs @UnslothAI MXFP4. APEX has better tg, MXFP4 has better pp. @mudler_it maybe a potential optimisation here? APEX I-Raft (because it FLOATs 🙃).

Photo: https://t.co/jhEFxu8Yi9

0

30

Cody Knows Code @CodyKnowsCode

about 2 months ago

@stevibe In practice I get 35-50t/s on DGX Spark with llama.cpp and OpenCode as it fills context up to 16k just when starting. After 100k context I usually see it making a lot more mistakes so that's my cut-off point, after which I /compact.

0

1

0

41

Cody Knows Code @CodyKnowsCode

about 2 months ago

@UnslothAI @Alibaba_Qwen Running Q4_K_XL on DGX Spark right now, getting ~50t/s. Nice speed (but not unexpected considering the size), it needs it due to how much it generates. Writing entire books while thinking!

0

1

0

24

Cody Knows Code @CodyKnowsCode

about 2 months ago

DGX Spark hard at work!

0

29

Cody Knows Code @CodyKnowsCode

about 2 months ago

Qwen3 Coder Next is the best model to run on a DGX Spark. Use @CardilloSamuel's Opus Distil MXFP4 quant with llama.cpp, only 43GB, plenty of room for K/V cache and full context. Getting 30-45t/s in real-life OpenCode usage. Saved me $400 over the last 2 weeks. https://t.co/b3oDqOS6v8

0

37

Cody Knows Code @CodyKnowsCode

2 months ago

I've been running Gemma-4-31B with llama.cpp on DGX Spark using E2B as a Draft. Getting ~18 t/s, compared to the baseline ~11 t/s. The secret is to set cache-type: q8_0, spec-type: ngram-mod, and keep the context to 131072 to fit in memory and not degrade too much.

0

62
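
The gain from the draft-model setup in the tweet above (about 18 t/s against a baseline of about 11 t/s) can be expressed as a ratio. This is a small sketch of that arithmetic only; the function name is illustrative and nothing here calls llama.cpp.

```python
def speedup(new_tps: float, baseline_tps: float) -> float:
    """Ratio of new throughput to baseline throughput (tokens per second)."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return new_tps / baseline_tps


# Figures quoted in the tweet: ~18 t/s with the draft model, ~11 t/s without.
ratio = speedup(18, 11)
print(f"{ratio:.2f}x, about {(ratio - 1) * 100:.0f}% faster")
```

Both inputs are approximate readings, so the ratio is only as precise as the quoted "~" figures.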
