iceadobe — a 2.5D thinker

@iceadobe

Compulsive Hobbyist 😌/ Tech Savvy 🎧 / Wannabe Writer ✒️/ Game Developer 📚 Software Engineer 😥🧑‍💻 #FreeSpeech 🙊 #RTFM 📖 #OpenSourceFTW 🌏

Bangalore, India

Joined July 2016

25 Following

27 Followers

337 Posts

iceadobe — a 2.5D thinker @iceadobe

15 days ago

@croll83 @LefterisJP Agreed, but the right comparison is with DS4 or Kimi, not with Qwen 35B. Also, requiring those top models for 90% seems like a bit of a stretch. With a good plan from those models, in coding at least, I find Qwen to be more than just a toy donkey 😜

iceadobe — a 2.5D thinker @iceadobe

16 days ago

@croll83 @LefterisJP I'm running a 4-bit Qwen3.6 35B at 100k context, 100k per seq, max seq at 4, and mtp2. With this setup, running two or three agents in parallel on different tasks, I'm able to get ~70-90 t/s. Per-agent streaming is smooth; prefill is fast enough at 65k that TTFT doesn't drag at all!

iceadobe — a 2.5D thinker @iceadobe

16 days ago

I have serious doubts about running models larger than 30-35B on DGX Spark... Using Nvidia vllm @ c100k & 8 seq, I'm able to run only 3 or 4 agents in parallel at ~110 t/s on 4-bit Qwen 35B-A3B-MTP, and it occupies >100GB RAM. How are people running DS4 on their DGX?? Llama?? 🤷🏻

127

iceadobe — a 2.5D thinker @iceadobe

16 days ago

Llama.cpp is adding MTP for Qwen models; maybe that is the direction open source is going. I have yet to try the @AtlasInference recipe, as I had concerns over high concurrency. Will give it a try tomorrow to judge the quality on a similar 35B bench.

iceadobe — a 2.5D thinker @iceadobe

16 days ago

I've tried both Qwen3.6(s)... @SpaceTimeViking 27B and 35B PrismaQuant recipe from @spark_arena. Default configs. I must say the local inference has made tremendous progress. However, DFlash on 27B imho was bad. But MTP on 35B had much higher and consistent results. With...

iceadobe — a 2.5D thinker @iceadobe

16 days ago

I didn't factor in, while buying the DGX Spark, that running local AI would cost me more on data. The Indian ISPs' unlimited plans are all just a scam. Go with @airtelindia; at least they give you 3.3k GB per month over @reliancejio's 1k GB.

iceadobe — a 2.5D thinker @iceadobe

16 days ago

@mr_r0b0t @NVIDIAAI I'm getting a good 60+ average with the prismaquant 4bit variant of the Qwen 3.6 35b A3B recipe available on @spark_arena. So far, amongst the various dflash and mtp setups I have run, this one model has given me the most consistent performance. I had

iceadobe — a 2.5D thinker @iceadobe

18 days ago

I'm currently testing the @NousResearch hermes locally with the DGX Spark (msi); and so far - It is killing it ⭐💖🥺!
Model: Qwen 3.6 35B

I have high hopes for it! https://t.co/JiZmUOPseC

112

iceadobe — a 2.5D thinker @iceadobe

20 days ago

I'm on a 30 Mbps plan and @reliancejio is charging for the remaining 14 days of bandwidth to upgrade the plan to 500 Mbps; and even then it will only activate after 3 days. What the heck! https://t.co/cGITVqHxCO

iceadobe — a 2.5D thinker @iceadobe

20 days ago

Big boy in town! ⭐ ✨ DGX Spark! Let's see whether it lives up to the hype! 🤑

iceadobe — a 2.5D thinker @iceadobe

21 days ago

We built a world where people work harder than ever, trust less than ever, own less than ever, and somehow we’re all expected to smile through corporate slogans, political theater, algorithmic addiction, and collapsing attention spans like this is peak civilization.

iceadobe — a 2.5D thinker @iceadobe

23 days ago

@sudoingX On many coding/tool-call benchmarks 3.6 27B is shown to be superior or similar to 120B. Even with a REAP of 120b; if A11B doesn't give considerable throughput improvements; 27B might be better. Will do the comparison once I get my GB10 🫢

496

iceadobe — a 2.5D thinker @iceadobe

25 days ago

@Bhavani_00007 M5 Pro. Better chip and thermals. For the LLM that it can fit, you'll get 2-4x prefill and modest tgen boost. Don't even think about Air if you have the budget.

592

iceadobe retweeted

ÆON FORGE ✨ @SpaceTimeViking

28 days ago

Let’s Gooo! 112 Tok/s single stream for my optimized container and model on a single DGX Spark!

iceadobe — a 2.5D thinker @iceadobe

28 days ago

@spark_arena @mr_r0b0t @NVIDIAAI True, but that's only peak tg123 with no ctx. Most people require sustained t/s. Personally, ctx_tg @ d32k or d16k is a sweet spot for agentic tasks, and there decode falls to 50-70 t/s, which is decent. Imo the image gives more realistic numbers for most real-world use cases.

iceadobe retweeted

mr-r0b0t

@mr_r0b0t

28 days ago

Made this for everyone who is working with a @NVIDIAAI DGX Spark (GB10) ⚡️
Definitely also bookmark the official site, it's a fabulous resource with playbooks for nearly everything you'd want to see!
https://t.co/uAxkSvIbWG https://t.co/H3YS9KTqEQ

218

234

16K

iceadobe retweeted

Thanh Pham

@runsonai

30 days ago

Here's how I went from 23 tok/s to 79 tok/s on my GX10 (DGX Spark) on Qwen3.6-35B-A3B by changing some configs, parameters and firmware upgrades. I scoured nvidia forums and x so you don't have to...

212

383

20K

iceadobe retweeted

Thanh Pham

@runsonai

about 1 month ago

Got Qwen 3.6 35B-A3B MoE running at ~65 tok/s (c=1) and ~121 tok/s (c=4) aggregate on my Asus GX10 (dgx spark).

Model stack:
• Target: Qwen/Qwen3.6-35B-A3B-FP8
• Drafter: z-lab/Qwen3.6-35B-A3B-DFlash
• Spec decode: DFlash, 10 speculative tokens
• Context: 200k
• KV cache: bf16/auto, not fp8

Used vllm for this (see flags below)
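The c=1 and c=4 figures in the post above can be compared with a little arithmetic. This is a minimal sketch, not anything from the original post: the function names are hypothetical, and it assumes the aggregate rate is shared evenly across streams, which real schedulers only approximate.

```python
def per_stream_rate(aggregate_tok_s: float, concurrency: int) -> float:
    """Average decode rate one stream sees, assuming an even split (an assumption)."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return aggregate_tok_s / concurrency


def scaling_efficiency(single_tok_s: float, aggregate_tok_s: float, concurrency: int) -> float:
    """Aggregate rate as a fraction of perfect linear scaling from the c=1 rate."""
    return aggregate_tok_s / (single_tok_s * concurrency)


# Figures quoted in the post: ~65 tok/s at c=1, ~121 tok/s aggregate at c=4.
single = 65.0
aggregate = 121.0
streams = 4

print(f"per-stream rate at c={streams}: {per_stream_rate(aggregate, streams):.2f} tok/s")
print(f"scaling efficiency: {scaling_efficiency(single, aggregate, streams):.2%}")
```

Under that even-split assumption, each of the four streams gets roughly 30 tok/s, so the aggregate gain comes at the cost of per-agent speed.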

iceadobe retweeted

Sudo su

@sudoingX

about 1 month ago

a week with the dgx spark, here is what is on it and what i have measured so far. nobody is really talking about this machine and it is quietly becoming the workhorse of my whole stack.

hardware: nvidia gb10 sm_121, 124 gb unified lpddr5x at 273 gb/s, cuda 13.0

models on disk (305 gb total, 9 ggufs):
> qwen 3.6 27b q4_k_m / q5_k_m / q8_0 / ud-q4_k_xl
> nemotron 3 omni 30b-a3b q4_k_m / q8_0 / ud-q6_k / ud-q6_k_xl
> deepseek v4-flash 158b q4_k_m (112 gb, flagship 128gb-tier test)

terminal + shell environment:
> zsh + oh-my-zsh + powerlevel10k theme
> modern cli stack: bat, eza, ripgrep, fd, git-delta, tldr, neovim, fzf, autojump
> 6 tmux sessions actively running for parallel agent work

ml + agent stack:
> llama.cpp built sm_121 against cuda 13
> uv + venv ml stack with pytorch 2.11.0+cu130 (aarch64) + transformers + diffusers + accelerate
> hermes agent v0.11 with codex auth bridge
> opencode for free-model overnight research
> telegram gateway routing to nemotron q8 right now

speeds verified so far:
- nemotron 30b-a3b q8: 56 tok/s gen, 1,300 tok/s prefill, 96% gpu, 33gb in unified
- qwen 27b dense q4: 40 tok/s consistent

90+ gb of unified memory still free. deepseek v4-flash 158b loading next as the real flagship test, multimodal omni testing once mmproj pulls, comfyui install in flight for the diffusion lane.

honestly curious what the actual limit is on this box, i have not hit it yet.

450

339

66K

iceadobe retweeted

stevibe

@stevibe

about 1 month ago

Everyone's comparing the DGX Spark to a 5090 and calling it slow. I think that's the wrong comparison.

I ran Qwen3.6 35B-A3B FP8 with the full 262K context window enabled (~96GB RAM), something gaming GPUs can't really do.

Results:
🟢 No context: 51.3 tok/s, TTFT 110ms
🟣 200K prefill: 34.6 tok/s, TTFT 85s (~2,341 tok/s prefill)

Prefill is way faster than a Mac. And 35 tok/s deep into 200K context, on a model this strong, is genuinely usable. The Spark plays a different game.

246

139

33K
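The 200K prefill figures in the post above are related by simple division: time to first token is roughly prompt length over prefill rate. This is a rough sketch under two stated assumptions: the prompt is taken as exactly 200,000 tokens (the post only says "200K"), and any fixed overhead before the first decoded token is ignored. The function names are hypothetical.

```python
def ttft_seconds(prompt_tokens: int, prefill_tok_s: float) -> float:
    """Approximate time to first token when prefill dominates (overhead ignored)."""
    return prompt_tokens / prefill_tok_s


def prefill_rate(prompt_tokens: int, ttft_s: float) -> float:
    """Approximate prefill throughput implied by a measured TTFT."""
    return prompt_tokens / ttft_s


PROMPT_TOKENS = 200_000  # assumption: "200K" read as exactly 200,000 tokens

# Figures quoted in the post: ~2,341 tok/s prefill and an 85 s TTFT.
print(f"TTFT implied by 2,341 tok/s: {ttft_seconds(PROMPT_TOKENS, 2341.0):.1f} s")
print(f"prefill rate implied by 85 s: {prefill_rate(PROMPT_TOKENS, 85.0):.0f} tok/s")
```

Both directions land within about one percent of the quoted numbers, so the post's TTFT and prefill rate are consistent with each other for a prompt of roughly that size.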
