Helmut Schmidt

@helmutkan

Hamburg, Germany

Joined December 2015

115 Following

11 Followers

66 Posts

Helmut Schmidt @helmutkan

30 days ago

@stevibe Thank you. You probably saw it already, but for others: the PR is merged, and one can just use the "vllm/vllm-openai:nightly" image. If you have the resources, it would be nice to compare Google's MTP with RedHatAI/gemma-4-31B-it-NVFP4 as the verifier model against the original Google model.

Helmut Schmidt @helmutkan

about 1 month ago

@bnjmn_marie Thank you. How does it compare with eagle3 speculation, have you tried it? Do you mind sharing your exact vllm command?

Helmut Schmidt @helmutkan

about 1 month ago

@jun_song Something "DGX Spark"-like, but with 256 GB and 400-500 GB/s memory bandwidth, would be almost perfect...

Helmut Schmidt @helmutkan

about 1 month ago

@CodyKnowsCode @vllm_project With the latest changes from vllm. My understanding is that you were getting 35 tok/s with the old vllm and that you expect better results with this 0.20 version of vllm?

Helmut Schmidt @helmutkan

about 1 month ago

@CodyKnowsCode @vllm_project Could you please share the results with the latest changes?

Helmut Schmidt @helmutkan

about 1 month ago

@RedHat_AI Got the response/explanation here: https://t.co/sFOqVtF2H0

Helmut Schmidt @helmutkan

about 1 month ago

@RedHat_AI Thank you. Whatever I do, I just cannot get past the attention backend error "Selected backend AttentionBackendEnum.FLASH_ATTN is not valid for this configuration. Reason: ['partial multimodal token full attention not supported']" (vllm-0.20.0, transformers-5.6.2, speculators-0.5.0).

Helmut Schmidt @helmutkan

about 1 month ago

@johnny_everson Nope, just using "trusted" models from Google, Red Hat, Nvidia, etc., and in safetensors format.

Helmut Schmidt @helmutkan

about 1 month ago

@jun_song Less than 0.12 euros per kWh? Damn...

Helmut Schmidt @helmutkan

about 1 month ago

@johnny_everson @stevibe Yeah, me too, just my personal feeling that the gemma4 dense model is way faster and better... Trying to run it with the dflash speculative model, but I just cannot make it work at the moment...

Helmut Schmidt @helmutkan

about 1 month ago

@RedHat_AI @vllm_project "Validated for H100" - does this mean it depends on the GPU architecture? I tried to make it work on DGX Spark, but with no success regardless of the verifier model and the type of attention backend...

Helmut Schmidt @helmutkan

about 1 month ago

@dsikka84 @BitcoinComfy @RedHat_AI @vllm_project I tried it with both fp8 and nvfp4, but couldn't make it work with the containerised vllm nightly on DGX Spark at all... Since it's mentioned that it's verified only on H100, does that mean it depends on the GPU architecture?

Helmut Schmidt @helmutkan

about 2 months ago

@stevibe Oh I see, didn't know it's MoE, thanks!

helmutkan retweeted

about 2 months ago

I'm amazed none of these videos are picked up by YouTube's 'Likeness Detection' feature (though I guess, that's why it's in Beta?). cc @YouTubeInsider @YouTubeCreators (Yes, all of these videos contain my AI-cloned likeness and cloned voice. I submitted a 'privacy violation'.)

geerlingguy's tweet photo: https://t.co/w6jy8w5Gzy

Helmut Schmidt @helmutkan

about 2 months ago

@bnjmn_marie "The model is as token-efficient as the original one." Is this implying that other nvfp4 models in this list are not token efficient?

Helmut Schmidt @helmutkan

about 2 months ago

@RedHat_AI @dsikka84 Thank you, and thanks to @dsikka84 and team. I could probably do some simple and quick tests if someone points me to what, where and how - which is, I guess, more time-consuming than running those tests :) Thanks again.

Helmut Schmidt @helmutkan

about 2 months ago

@RedHat_AI It would be interesting to see results of speculator model RedHatAI/gemma-4-31B-it-speculator.eagle3 with RedHatAI/gemma-4-31B-it-NVFP? Any possibility to do that?

Helmut Schmidt @helmutkan

about 2 months ago

@protect_whales @bnjmn_marie +1, in addition to that list I would include two from Nvidia (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 and nvidia/Nemotron-Cascade-2-30B-A3B)
