Patrick Devine @pdev110 - Twitter Profile

Pinned Tweet

over 5 years ago

For your pandemic Friday viewing enjoyment. 'kubectl run -it --rm --image=https://t.co/y1ra99OfKZ tif' or for the docker inclined 'docker run -it --rm https://t.co/y1ra99OfKZ'

Patrick Devine

@pdev110

about 5 hours ago

@Nanka696 @LyalinDotCom Only for macOS right now unfortunately for the nvfp4 weights. We're getting closer for other platforms. On Windows you can use `gemma4:12b-it-qat` which should give similar results.

pdev110 retweeted

ollama

@ollama

about 5 hours ago

Gemma 4 Quantization-Aware Training (QAT) weights are now available on Ollama! They reduce memory requirements while maintaining model quality.

E2B: ollama run gemma4:e2b-it-qat
E4B: ollama run gemma4:e4b-it-qat
12B: ollama run gemma4:12b-it-qat
26B: ollama run gemma4:26b-a4b-it-qat
31B: ollama run gemma4:31b-it-qat

Try them with ollama launch integrations to use with your favorite tools 👇👇👇

Patrick Devine

@pdev110

about 23 hours ago

@sonriks6 @LyalinDotCom @ollama It's back! We had some hiccups on rollout.

@ollama – prev @docker, @twitter, @google

Patrick Devine

@pdev110

about 24 hours ago

@synthshareai @ai_for_success That particular model is tuned with Nvidia's model optimizer for nvfp4. I'm still running MMLU-Pro right now to test it out and it's getting 74.6% accuracy (at about 15% completed). Google published it at 77.2% for BF16s.

Patrick Devine

@pdev110

about 24 hours ago

@drashyakuruwa @ai_for_success They're available now for non-Macs too.

Patrick Devine

@pdev110

1 day ago

@ivanfioravanti I'm wondering if they would notice the difference w/ bf16 vs. mxfp8. The microscaling formats seem really decent. nvfp4 is surprisingly good if you tune it correctly.
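The microscaling formats mentioned here store low-precision elements in small blocks that share one scale factor: the MX formats (MXFP8, MXFP4) use 32-element blocks with a power-of-two (E8M0) scale, while NVFP4 uses 16-element blocks with an FP8 (E4M3) scale plus a per-tensor FP32 scale. The Python below is a minimal, hypothetical sketch of quantizing one block to FP4 (E2M1) elements under either kind of scale. The function names are illustrative only, a plain float scale stands in for NVFP4's E4M3 scale, and the per-tensor scale is omitted; it is not how Ollama or Nvidia's model optimizer implement these formats.

```python
import math

# Magnitudes representable by the FP4 E2M1 element type (used by MXFP4 and NVFP4).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest signed E2M1 value, saturating at +/-6.

    Ties resolve to the smaller magnitude (a simplification)."""
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
    return math.copysign(nearest, x)


def quantize_block(block, power_of_two_scale):
    """Quantize one block of floats to FP4 codes plus one shared scale.

    power_of_two_scale=True mimics an MX-style E8M0 scale (2**k).
    power_of_two_scale=False uses an unrounded float scale, a hypothetical
    stand-in for NVFP4's FP8 (E4M3) block scale.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    if power_of_two_scale:
        # MX rule: shared exponent = floor(log2(amax)) - emax, with emax = 2 for E2M1.
        scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    else:
        # Map the block's largest magnitude onto the FP4 maximum (6.0).
        scale = amax / E2M1_MAX
    codes = [round_to_e2m1(v / scale) for v in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values by multiplying each code by the shared scale."""
    return [scale * c for c in codes]


block = [0.1, -0.4, 2.0, 0.75]  # real MX / NVFP4 blocks hold 32 / 16 values
for pow2 in (True, False):
    scale, codes = quantize_block(block, power_of_two_scale=pow2)
    print(pow2, scale, codes, dequantize_block(scale, codes))
```

The two scale choices round different elements differently: the power-of-two scale (0.5) flushes 0.1 to zero but reproduces 0.75 exactly, while the float scale (2.0 / 6) keeps 0.1 as about 0.17 and maps 0.75 to about 0.67. How good a given format looks in practice therefore depends on the data and on how the quantization is calibrated.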

Patrick Devine

@pdev110

1 day ago

@LyalinDotCom I've been testing it w/ MMLU-Pro and am about 1/5 of the way through and getting 74.3% accuracy which is pretty impressive.

Patrick Devine

@pdev110

1 day ago

@LyalinDotCom For Ollama I'd recommend the `gemma4:12b-nvfp4` model which is tuned with Nvidia's model optimizer. I realized though that I should have quantized the qkvo attention tensors to nvfp4 (they're at mxfp8) and I have some tweaks for the embedding layer.

Patrick Devine

@pdev110

1 day ago

@ivanfioravanti it's the MTP tensors (Qwen3next packs these directly in with the model instead of a separate repo).

Patrick Devine

@pdev110

10 days ago

@LyalinDotCom For the Mac try `ollama run gemma4:31b-mlx` which will give you significantly better performance. For the DGX Spark you can get a significant performance boost with Ollama 0.30.0 which is just about to come out (it's in prerelease).

Patrick Devine

@pdev110

26 days ago

@yoeven On your Mac you should run `gemma4:e4b-nvfp4` and you should get a pretty big speed bump over `gemma4:e4b`. I realize the model names are confusing, but we are trying to make this easier!

Patrick Devine

@pdev110

28 days ago

I think almost all of the MTP/DFlash demos I've been seeing over the last few weeks have been using simple greedy sampling. That's great if you can live with temperature = 0, but I think most people want more sampling options.
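The difference between greedy and sampled decoding matters for how drafted tokens are checked. With temperature = 0, verifying a token drafted by MTP heads (or another drafter) only requires comparing it with the target model's argmax. With temperature > 0, the standard speculative-sampling rule accepts a drafted token with probability min(1, p_target / p_draft) and otherwise resamples from the normalized leftover mass max(0, p_target - p_draft), which keeps the output distribution the same as sampling the target model directly. The Python below is a minimal sketch of both pieces; the function names are hypothetical and it does not reflect how Ollama or any particular MTP/DFlash demo implements verification.

```python
import math
import random


def softmax(logits, temperature):
    """Convert logits to probabilities at the given temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature, rng):
    """Greedy argmax when temperature == 0, otherwise sample from the softmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def verify_draft_token(token, p_target, p_draft, rng):
    """Speculative-sampling check for a single drafted token.

    p_target and p_draft are the target and draft probability vectors at this
    position (after the same temperature/top-k/top-p processing), and `token`
    was sampled from p_draft. Returns (accepted, token_to_emit).
    """
    ratio = p_target[token] / p_draft[token]
    if rng.random() < min(1.0, ratio):
        return True, token
    # Rejected: resample from the mass where the target exceeds the draft.
    residual = [max(0.0, t - d) for t, d in zip(p_target, p_draft)]
    return False, rng.choices(range(len(residual)), weights=residual, k=1)[0]


rng = random.Random(0)
print(sample([1.0, 3.0, 2.0], 0, rng))  # greedy pick
print(verify_draft_token(2, [0.2, 0.3, 0.5], [0.1, 0.2, 0.7], rng))
```

With one-hot distributions (the temperature-0 case), the same rule reduces to greedy verification: a drafted token that matches the target argmax is always accepted, and any other token is replaced by the argmax. Supporting real sampling options means keeping the drafter's full probabilities around for this check, which is bookkeeping a greedy-only demo can skip.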

Patrick Devine

@pdev110

about 1 month ago

@D_Twitt3r @ollama We haven't yet shipped the vision or audio parts for the MLX engine. They are coming though!

pdev110 retweeted

ollama

@ollama

about 1 month ago

DeepSeek v4 Pro is now on Ollama's cloud! 🚀🚀🚀

Try it with Claude Code:
ollama launch claude --model deepseek-v4-pro:cloud

Try it with Hermes Agent:
ollama launch hermes --model deepseek-v4-pro:cloud

Chat with the model:
ollama run deepseek-v4-pro:cloud

🧵

Patrick Devine

@pdev110

about 1 month ago

@ivanfioravanti Maybe try Ollama w/ `qwen3.6:27b-coding-nvfp4`? That's the MLX runner variant which is less quantized than the affine 4-bit integer quants, and it has the hyper-parameters set for coding/agentic use cases.

Patrick Devine

@pdev110

about 1 month ago

@dannytt @julien_c @huggingface With ollama, make sure you're using `qwen3.6:27b-coding-nvfp4`. For generation on an M5 I get just about 30 toks/sec, but the real magic is in the prefill speeds and the LRU cache.

pdev110 retweeted

ollama

@ollama

about 1 month ago

deepseek-v4-flash is now available on Ollama's cloud! Hosted in the US.

Try it with Claude Code:
ollama launch claude --model deepseek-v4-flash:cloud

Try it with OpenClaw:
ollama launch openclaw --model deepseek-v4-flash:cloud

Try it with Hermes:
ollama launch hermes --model deepseek-v4-flash:cloud

Try it with chat:
ollama run deepseek-v4-flash:cloud

(DeepSeek V4 Pro is coming shortly) 🧵

Patrick Devine

@pdev110

about 1 month ago

@iansltx @ollama The `coding` tags just have the recommended hyperparameters set for coding/agentic use. They share the same weights (it doesn't take up extra disk space).
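Why an extra tag takes no extra disk space can be illustrated with a toy content-addressed store. Ollama keeps model layers as blobs named by their SHA-256 digest and has each tag's manifest reference those digests, so two tags that point at the same weights blob store it only once. The Python below is a simplified, hypothetical model of that idea, assuming the `coding` tag differs only in a small parameters layer; the class, tag names and parameter values are placeholders, not Ollama's code or its actual recommended hyperparameters.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Content address of a blob: the hex SHA-256 of its bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """Toy content-addressed store; identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes
        self.manifests = {}  # tag -> list of layer digests

    def put(self, data: bytes) -> str:
        d = digest(data)
        self.blobs.setdefault(d, data)  # storing the same bytes again is a no-op
        return d

    def add_model(self, tag: str, layers) -> None:
        self.manifests[tag] = [self.put(layer) for layer in layers]

    def disk_usage(self) -> int:
        return sum(len(b) for b in self.blobs.values())


# Hypothetical example: both tags share one weights blob and differ only in a
# small parameters layer. The values below are placeholders, not real defaults.
weights = b"\x00" * 1_000_000  # stand-in for a multi-gigabyte weights file
store = BlobStore()
store.add_model("example:27b", [weights, json.dumps({"temperature": 1.0}).encode()])
store.add_model("example:27b-coding", [weights, json.dumps({"temperature": 0.6}).encode()])
print(len(store.blobs), store.disk_usage())
```

Adding the second tag grows the store only by the size of its parameters layer, because its manifest points at the weights digest that is already present.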