People ask why I keep insisting on GPUs
and not Mac Studios/Mac minis
for parallel & agentic workflows (multi-agent).
This is why:
- Llama 3.1 70B in BF16 (~140GB of weights, before any context)
- on 8x RTX 3090s
- Synthetic data generation with:
  - 50+ concurrent requests
  - Batch inference
  - Sustained throughput
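
A rough sketch of the memory arithmetic behind that setup. The 70B parameter count and BF16 come from the list above; the 24GB per card is the RTX 3090's spec rather than something stated in the post, and the variable names are purely illustrative.

```python
# Back-of-the-envelope VRAM check for the setup listed above.
PARAMS = 70e9          # nominal parameter count of a "70B" model
BYTES_PER_PARAM = 2    # BF16 stores each weight in 2 bytes
GPU_COUNT = 8
VRAM_PER_GPU_GB = 24   # assumption: RTX 3090 spec, not stated in the post


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of the model weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


weights = weights_gb(PARAMS, BYTES_PER_PARAM)
total_vram = GPU_COUNT * VRAM_PER_GPU_GB
headroom = total_vram - weights  # what is left for KV cache and activations

print(f"weights: ~{weights:.0f} GB, total VRAM: {total_vram} GB, headroom: ~{headroom:.0f} GB")
```

The headroom is what has to hold the context for every concurrent request, which is why the weights figure is quoted "before any context".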
Not only that:
> ~2k context per request (prompt)
> ~1.8k tokens in output
> 2 mins 29 secs for 50 responses
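
For a sense of scale, here is a minimal sketch that turns those three numbers into throughput. It assumes all 50 requests ran concurrently for the full 2 min 29 s and that every response was ~1.8k tokens, which the post implies but does not spell out.

```python
# Throughput implied by the figures above (all inputs are approximate).
REQUESTS = 50
PROMPT_TOKENS = 2_000      # ~2k context per request
OUTPUT_TOKENS = 1_800      # ~1.8k tokens generated per request
WALL_SECONDS = 2 * 60 + 29 # 2 mins 29 secs

total_output = REQUESTS * OUTPUT_TOKENS
total_processed = REQUESTS * (PROMPT_TOKENS + OUTPUT_TOKENS)

aggregate_tps = total_output / WALL_SECONDS      # generated tokens/s across the batch
per_request_tps = OUTPUT_TOKENS / WALL_SECONDS   # generated tokens/s seen by one request

print(f"generated tokens: {total_output}")
print(f"prompt + generated tokens: {total_processed}")
print(f"aggregate: {aggregate_tps:.0f} tok/s, per request: {per_request_tps:.1f} tok/s")
```

That works out to roughly 600 generated tokens per second across the batch, or about 12 tokens per second for each individual request.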
This is GPU territory.
You can’t do this on a Mac.
Not yet at least.
The DGX Spark is a ripoff.
$4,699 for 128GB of unified memory and 273 GB/s of bandwidth.
The Mac Studio M4 Max with 128GB of unified memory costs $3,699.
Same memory. $1,000 less. And it gets better.
The Mac Studio has double the bandwidth of the DGX Spark. Single-stream token generation is limited by memory bandwidth, so that means roughly twice the speed.
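
A minimal sketch of why bandwidth maps to generation speed, assuming decoding is purely memory-bandwidth-bound (every generated token streams the full set of weights from memory once). The 273 GB/s figure is from this post; the Mac number is simply twice that, per the "double" claim; the 70GB model size is a hypothetical chosen for illustration.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed for a bandwidth-bound model."""
    return bandwidth_gb_s / model_gb


SPARK_BW = 273.0        # GB/s, from the post
MAC_BW = 2 * SPARK_BW   # GB/s, derived from the post's "double the bandwidth" claim
MODEL_GB = 70.0         # hypothetical: e.g. a 70B model at roughly 8 bits per weight

spark = decode_tokens_per_sec(SPARK_BW, MODEL_GB)
mac = decode_tokens_per_sec(MAC_BW, MODEL_GB)

print(f"DGX Spark: ~{spark:.1f} tok/s")
print(f"Mac Studio: ~{mac:.1f} tok/s")
print(f"ratio: {mac / spark:.1f}x")
```

The ratio stays at 2x whatever model size you plug in, since the model size cancels out. Real numbers will come in below this ceiling, and prompt processing is compute-bound rather than bandwidth-bound, so this only speaks to token generation.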
The Mac Studio is a full general-purpose computer, useful with or without AI. The DGX Spark is a $4,699 Linux box that does one thing.
NVIDIA announced this at $3,000. Shipped it at $4,000. Raised it to $4,700.
You're paying a $1,000 NVIDIA tax for half the bandwidth and fewer capabilities.
The only argument for DGX Spark is CUDA. If your entire workflow is locked into TensorRT-LLM and you need FP4 tensor cores, fine.
Everyone else should just buy the Mac Studio.
Jeffrey Epstein is Pedo Geppetto.
The new Epstein Files reveal he met with 4chan founder m00t the day /pol/ was created.
The modern alt-right is just a psyop.