Consigliere @seree - Twitter Profile


First time benchmarking the Rust-based inference engine by Atlas (@AtlasInference).

Model tested:
Qwen3.6 27B FP8 with MTP enabled

Things noted:
- Startup speed: OMG, it's much, much faster than other inference engines I've tried!
- Token generation speed is great, but I think it can be faster. Let me play with it more and I'll keep you posted.
- Memory consumption is higher than vLLM's; probably the default KV cache is 16-bit?
- It crashed on long-context input. Yes, it crashed my DGX Spark and forced a shutdown without a reboot 😭

Anyway, I think the future is quite interesting for this inference engine. I'll be playing with it more!


Consigliere

@seree

3 days ago

@AtlasInference I'm adjusting various parameters to experiment with it more and will DM what I find very soon. Next stop will be 35B A3B NVFP4, using the recent Qwen/Qwen3.6-35B-A3B-NVFP4 model instead of RedHatAI's one. BTW: thanks for your hard work!


Consigliere

@seree

3 days ago

@RaminNasibov Syndicate by Bullfrog


Consigliere

@seree

3 days ago

@ifourth That's why we need a brand instead of us. 3AM Lab will do!


Consigliere

@seree

5 days ago

Wait, my DGX Spark was cooked! 🔥🔥🔥

NVIDIA AI

@NVIDIAAI

5 days ago

A new era of PC. 25.0528, 121.5990


Consigliere

@seree

6 days ago

@MrPeterLMorris The latest frontier models can't either. That's the nature of LLMs.


Consigliere

@seree

6 days ago

@MrPeterLMorris It's not broken, I just ran a few recipes last night. 🔥🔥


Consigliere

@seree

6 days ago

@mr_r0b0t @NVIDIAAI Can’t wait to get the 2nd Spark for this!


Consigliere

@seree

7 days ago

It's NOT about not having enough compute. It's about what we're WASTING on the harnesses! We desperately need smarter harness optimization, not just throwing more power at the problem! #AITalks


Consigliere

@seree

7 days ago

@MrPeterLMorris To simplify local inference on DGX Spark, you gotta try sparkrun. Lifesaver!


Consigliere

@seree

7 days ago

AEON Ultimate Qwen3.6 27B NVFP4 DFlash - benchmarked! Quite nice speed compared to the base model or even the MTP one. 🔥🔥 #DGXSpark


Consigliere

@seree

8 days ago


Benchmarking the famous Qwen3.6 27B FP8 DFlash on my DGX Spark.

Speed boosted from 16.2 tps (MTP) to 26.9 tps (DFlash). 🔥🔥

Now testing with OpenCode to see whether it breaks at tool calling or not.

Will try AEON’s NVFP4 DFlash version very soooooon! https://t.co/0IWkd4HDaF

