This ridiculous. You don’t even know how to discern who is Orthodox and who is not, let alone steelman the Orthodox positions.
What makes you think that YOU are capable of debating against the deepest, most fleshed out, most coherent, most biblical 2000 year old theology to ever appear in history? What makes you think that repeating your heresies against it will cause a different result than when they were first raised against the Church?
Do you understand the immense, infinite amount of ignorance combined with infinite arrogance this is?
What is wrong with you people?
You understand NOTHING. Yet you feel ready to debate??? This is beyond tragic for you man
@JPuncut Imagine being a a couple of twelve year old kids with severely impaired iq.
Now, imagine that these kids think they refuted 2000 years of the depest and moat coherent theology ever to appear in history.
Actually you don’t need to imagine it, here it is, in reality
@NewsNFTU What’s stopping you from joining a clearly canonical jurisdiction and calling the truths you care about?
I am canonical and I freely call what is wrong, wrong.
It’s the “jurisdiction” part you don’t like.
The gospel alone is what leads people to saving faith in Christ alone, not tradition and rituals.
Romans 1
16 For I am not ashamed of the gospel, for it is the power of God for salvation to everyone who believes, to the Jew first and also to the Greek.
One thing that I'm testing right now is the Diffusion Gemma 26b model. #LLM#localAI#AI
This is google's blog post for the model:
https://t.co/wasTsyKyfb
I am actually surprised that not more GB10 users do not talk about this one.
The thing is, that I was pleasantly surprised when I saw that specifically on the DGX Spark, it is way faster than 4x.
Why is this significant: Simple: If this can handle real world heat, then you have a full 26b parameter DENSE model with Vision and tool capability that you can really use in very usable speeds locally on your single DGX spark or clone. And it can do it at maxed out context size and you can even do inference with decent concurrency and even have some spare memory.
If configured correctly on the GB10 and on an agent like opencode it is BLAZING fast. 100-200 token/sec for a 26b model and depending on the situation. Also vision analysis at blazing speeds as well.
The thing is, this model is NOT an AR model, so it needs to be setup properly many agents expect a stream. The reason is that diffusion models spit out large ready (rendered/formed) chucks in one go. This seems to mess most agents up. Trying to set it up like a normal model did not work for me. however using the settings below had me impressed, as it suddenly because very usable and FAST. (Faster even than some online SOTA models).
I'm still trying to see if it's stable enough for daily use.
Here's the recommended setup for vllm:
vllm serve google/diffusiongemma-26B-A4B-it \
--max-model-len 262144 \
--max-num-seqs 4 \
--gpu-memory-utilization 0.80 \
--generation-config vllm \
--enable-chunked-prefill \
--attention-backend TRITON_ATTN \
--hf-overrides '{"diffusion_sampler":"entropy_bound","diffusion_entropy_bound":0.1}' \
--diffusion-config '{"canvas_length":256}' \
--enable-auto-tool-choice \
--tool-call-parser gemma4 \
--reasoning-parser gemma4 \
--chat-template examples/tool_chat_template_gemma4.jinja \
--mm-processor-kwargs '{"max_soft_tokens":1120}' \
--limit-mm-per-prompt '{"image":4}' \
--host 0.0.0.0 --port 8099
Note the --chat-template examples/tool_chat_template_gemma4.jinja argument which ensures that it'll behave like a normal Gemma 4 model when it comes to tool usage.
and here's the settings for opencode that seem to be the best:
"gb10_diffusion": {
"npm": "@ai-sdk/openai-compatible",
"name": "GB10 Diffusion (local)",
"options": {
"baseURL": "http://192.168.178.169:8099/v1",
"apiKey": "local",
"timeout": 600000,
"chunkTimeout": 120000
},
"models": {
"google/diffusiongemma-26B-A4B-it": {
"name": "DiffusionGemma 26B BF16",
"tool_call": true,
"reasoning": false,
"limit": {
"context": 262144,
"output": 16384
}
}
}
},
Note I use the full precision model simply to make sure it's as stable as possible.
With this bf16 model and memory utilization at 0.8 I get ~6.50x at full context length (256K) on a single GB10.
I'm currently seeing if temperature set to 0.0 is going to yield more accurate code. FP8 and NVFP4 variations also work seemingly fine, but there is a detectable improvement when using the bf16 in my subjective opinion.