ML @themachinescrew - Twitter Profile

@TheAhmadOsman

People ask why I keep insisting on GPUs
and not Mac Studios/Mac minis
for parallel & Agentic Workflows (multi-agents)

This is why:
- Llama 3.1 70B BF16 (~140GB w/o Context)
- on 8x RTX 3090s
- Synthetic data generation with
- 50+ concurrent requests
- Batch inference
- Sustained throughput

Not only that:
> ~2k context per request (prompt)
> ~1.8k tokens in output
> 2 mins 29 secs for 50 responses

This is GPU territory.
You can’t do this on a Mac.
Not yet at least.
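The numbers in this post can be sanity-checked with simple arithmetic. This is a rough sketch, not the poster's method, and it rests on assumptions the post does not state: exactly 70e9 parameters, 24 GiB per RTX 3090, the Llama 3.1 70B attention shape (80 layers, 8 KV heads, head dimension 128), a BF16 KV cache, and all 50 requests running concurrently for the whole 2 min 29 s.

```python
# Back-of-envelope check of the 8x RTX 3090 run described in the post.
# Assumptions (not stated in the post): exactly 70e9 parameters, 24 GiB per
# RTX 3090, Llama 3.1 70B attention shape (80 layers, 8 KV heads, head dim 128),
# a BF16 KV cache, and all 50 requests running concurrently for the whole run.

GIB = 2**30

PARAMS = 70e9            # "70B", rounded
BYTES_PER_PARAM = 2      # BF16 = 16 bits
GPUS = 8
VRAM_PER_GPU_GIB = 24    # RTX 3090

# Weights only ("w/o Context").
weights_bytes = PARAMS * BYTES_PER_PARAM
weights_gb = weights_bytes / 1e9             # decimal GB, the post's "~140GB"
weights_gib = weights_bytes / GIB            # about 130.4 GiB
total_vram_gib = GPUS * VRAM_PER_GPU_GIB     # 192 GiB
headroom_gib = total_vram_gib - weights_gib  # left for KV cache, activations, runtime

# KV cache per token: K and V, for every layer, KV head and head dimension.
LAYERS, KV_HEADS, HEAD_DIM, KV_BYTES = 80, 8, 128, 2
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES   # 327,680 bytes
tokens_per_request = 2000 + 1800                                   # prompt + output
kv_total_gib = 50 * tokens_per_request * kv_bytes_per_token / GIB  # worst case: all full at once

# Throughput: 50 responses x ~1.8k output tokens in 2 min 29 s.
wall_clock_s = 2 * 60 + 29                # 149 s
tps = 50 * 1800 / wall_clock_s            # aggregate output tokens/s

print(f"weights ≈ {weights_gb:.0f} GB ({weights_gib:.1f} GiB) of {total_vram_gib} GiB VRAM")
print(f"headroom ≈ {headroom_gib:.1f} GiB vs. worst-case KV cache ≈ {kv_total_gib:.1f} GiB")
print(f"aggregate ≈ {tps:.0f} tok/s, ≈ {tps / 50:.1f} tok/s per request")
```

The worst-case KV figure assumes every request holds its full ~3.8k tokens at the same moment; requests finish at different times, and servers that allocate the KV cache in pages grow it incrementally, so actual peak usage can be lower.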

themachinescrew retweeted

David Hendrickson

@TeksEdge

2 months ago

🔥 RTX 5090 + Gemma 4 31B: Real user testing right now

💳️ 32GB GDDR7 gives excellent headroom for higher quants on this dense 31B model.

🧪 Typical performance (llama.cpp + early user reports):

Quant | Approx. VRAM (weights + overhead) | Expected TPS (generation)
⚡ Q4_K_M | ~18–21 GB | 55–75+ t/s
📈 Q5_K_XL | ~22–25 GB | 45–65 t/s
🐢 Q6_K / Q8 | ~26–32+ GB | 35–55 t/s

Users are actively testing 🐌 Unsloth UD-Q5_K_XL on RTX 5090 and tuning with TurboQuant / KV cache compression for better speed.

Great quality + performance balance for local Gemma 4 31B inference 👌

Who else is running it? 👀
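The VRAM column roughly follows from weights ≈ parameters × bits per weight ÷ 8. This minimal sketch assumes exactly 31e9 parameters (taken from the "31B" name) and approximate llama.cpp bits-per-weight values; Q4_K_M mixes tensor types, so its figure is only an average. It leaves out the KV cache and runtime overhead that the table's ranges include, and it does not cover Unsloth's UD-Q5_K_XL, whose per-tensor mix is not given here.

```python
# Rough size of quantized weights: parameters x bits per weight / 8.
# Assumptions: exactly 31e9 parameters (from the "31B" name) and approximate
# llama.cpp bits-per-weight values; KV cache and runtime overhead are excluded.

GIB = 2**30
PARAMS = 31e9
VRAM_GIB = 32  # RTX 5090: 32 GB of GDDR7, a binary 32 GiB

# Approximate effective bits per weight (assumed, not from the post).
BPW = {
    "Q4_K_M": 4.85,   # mix of Q4_K and Q6_K tensors, so only an average
    "Q6_K": 6.5625,   # 210 bytes per 256-weight super-block
    "Q8_0": 8.5,      # 32 int8 weights + one fp16 scale per block
}

def weights_gib(params: float, bpw: float) -> float:
    """Size of the quantized weights in GiB."""
    return params * bpw / 8 / GIB

for name, bpw in BPW.items():
    size = weights_gib(PARAMS, bpw)
    print(f"{name:7s} ≈ {size:5.1f} GiB weights, {VRAM_GIB - size:5.1f} GiB left for context/overhead")
```

Under these assumptions the Q8_0 weights alone take roughly 30.7 GiB, leaving about 1 GiB of the card for context and overhead, which fits the table's "32+GB" upper end.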

themachinescrew retweeted

Ali Romman

@aliromman_

2 months ago

The DGX Spark is a ripoff.

$4,699 for 128GB of unified memory and 273 GB/s of bandwidth.

The Mac Studio M4 Max with 128GB of unified memory costs $3,699.

Same memory. $1,000 less. And it gets better.

The Mac Studio has double the bandwidth of the DGX Spark. For generating tokens, that means roughly twice as fast.

The Mac Studio is a full computer with or without AI. The DGX Spark is a $4,699 Linux box that does one thing.

NVIDIA announced this at $3,000. Shipped it at $4,000. Raised it to $4,700.

You're paying a $1,000 NVIDIA tax for half the bandwidth and fewer capabilities.

The only argument for DGX Spark is CUDA. If your entire workflow is locked into TensorRT-LLM and you need FP4 tensor cores, fine.

Everyone else should just buy the Mac Studio.
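The "roughly twice as fast" step relies on token generation being memory-bandwidth-bound: for a dense model at batch size 1, each new token reads roughly all of the weights once, so tokens/s is capped near bandwidth ÷ model size. The sketch below takes the prices and the 273 GB/s figure from the post, assumes 546 GB/s for the M4 Max (the figure Apple lists for its 16-core CPU/40-core GPU variant), and uses a hypothetical 40 GB model; it ignores KV-cache reads and prompt processing.

```python
# Upper bound on single-stream decode speed for a dense model:
# each generated token streams (roughly) all weights from memory once,
# so tokens/s <= bandwidth / weight size. Ignores KV-cache reads, compute
# limits and batching, so real numbers land below this ceiling.

DEVICES = {
    # name: (price in USD, memory bandwidth in GB/s)
    "DGX Spark": (4699, 273),                # both figures from the post
    "Mac Studio M4 Max 128GB": (3699, 546),  # 546 GB/s assumed; price from the post
}

MODEL_GB = 40.0  # hypothetical: a ~70B model at ~4.5 bits per weight

def max_decode_tps(bandwidth_gbs: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s at batch size 1."""
    return bandwidth_gbs / model_gb

for name, (price, bw) in DEVICES.items():
    print(f"{name:24s} <= {max_decode_tps(bw, MODEL_GB):5.1f} tok/s, ${price / bw:5.2f} per GB/s")

spark_bw = DEVICES["DGX Spark"][1]
mac_bw = DEVICES["Mac Studio M4 Max 128GB"][1]
print(f"decode ceiling ratio ≈ {mac_bw / spark_bw:.1f}x")  # depends only on bandwidths
```

Because the ratio depends only on the two bandwidth figures, it holds for any model that fits in both machines' memory; it says nothing about prompt processing or large-batch throughput, where compute matters more.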

ML @themachinescrew

3 months ago

@TurnerNovak @TrungTPhan Just give it a snowblower

ML @themachinescrew

4 months ago

@nic_carter Shit take mate. We ain’t China.

ML @themachinescrew

4 months ago

@ErinnFL @MarioNawfal Bingo

ML @themachinescrew

4 months ago

@caprioleio Pffff no because it's a shitcoin

ML @themachinescrew

4 months ago

@LarkDavis What am I forced to do lawk?

ML @themachinescrew

4 months ago

@RepJeffries You sir, are a patsy.

ML @themachinescrew

4 months ago

@TaikiMaeda2 Ya got me

ML @themachinescrew

4 months ago

@TrendingBitcoin It would be funny if he was just hitting a giant dab rig instead of the pen

ML @themachinescrew

4 months ago

@jnicolem Fake

themachinescrew retweeted

gino.eth 💽

@GinoTheGhost

4 months ago

Jeffrey Epstein is Pedo Geppetto. The new Epstein Files reveal he met with 4chan founder m00t the day /pol/ was created. The modern alt-right is just a psyop.

ML @themachinescrew

4 months ago

@piovincenzo_ Poo poo pee pee listen to me

ML @themachinescrew

4 months ago

@Saboo_Shubham_ You’re telling me agents will train and get better 🤯??

ML @themachinescrew

4 months ago

@ripeth Ok but sir - the price.
