Mia

Verified account

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Joined July 2022

216 Following

148 Followers

246 Posts

Pinned Tweet

about 7 hours ago

I just published Slate: a fast, lightweight, OLED-friendly Markdown/text editor. It supports editing all types of text-based files. One thing I really wanted: a proper OLED-friendly editor. Not “dark gray”, but complete black, so it looks great on OLED displays and feels easy on the eyes at night. Fully developed by local AI. Currently Windows only. Feel free to fork and build for Mac/Linux. Feel free to test it, open issues, report bugs, or suggest ideas. https://t.co/KImPGEHmvp

0

0

0

0

76

about 1 hour ago

@advented_ Do you mean single session? Unfortunately I'm not getting 40-50 t/s, it was unstable. My recipe gets me to 31.7 t/s for single session. For multiple sessions I get up to 80 t/s

0

0

0

0

1

1 day ago

53 tok/s achieved on Step-3.7-Flash NVFP4 with MTP on 2x DGX Spark with 256k context. 🎉🥳

Elapsed time: 56.694 s
Prompt tokens: 29
Generated tokens: 3000
Total tokens: 3029
Generation tok/sec: 52.92
End-to-end tok/sec: 53.43 https://t.co/nbT2aa42PR

6

39

0

15

4K
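The two throughput figures in the post above can be reproduced from the other numbers it reports. A minimal sketch, assuming (this is inferred from the arithmetic, not stated in the post) that generation tok/sec is generated tokens divided by elapsed time and end-to-end tok/sec is total tokens divided by elapsed time. The function name `throughput` is hypothetical and is not taken from any benchmark tool.

```python
def throughput(elapsed_s: float, prompt_tokens: int, generated_tokens: int) -> dict:
    """Derive tokens-per-second figures from a single timed run.

    Assumption: both rates use the same wall-clock elapsed time as the
    denominator, which is what the posted numbers are consistent with.
    """
    total_tokens = prompt_tokens + generated_tokens
    return {
        "total_tokens": total_tokens,
        "generation_tok_s": generated_tokens / elapsed_s,
        "end_to_end_tok_s": total_tokens / elapsed_s,
    }


# Figures copied from the post: 56.694 s, 29 prompt tokens, 3000 generated tokens.
stats = throughput(elapsed_s=56.694, prompt_tokens=29, generated_tokens=3000)
print(f"Total tokens:       {stats['total_tokens']}")
print(f"Generation tok/sec: {stats['generation_tok_s']:.2f}")
print(f"End-to-end tok/sec: {stats['end_to_end_tok_s']:.2f}")
```

With only 29 prompt tokens, the two rates are nearly identical; a long prompt would pull them further apart.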

about 7 hours ago

@ArnauBosch10 Thanks appreciated :)

0

1

0

0

7

about 18 hours ago

Final results are in. 53 tok/s is NOT stable. Had to do some tuning to make it work. Eventually the best results are around 33 tok/s. This is Step-3.7-Flash NVFP4 with MTP & no --enforce-eager. Repo will be published in a few hours.

1 day ago

53 tok/s achieved on Step-3.7-Flash NVFP4 with MTP on 2x DGX Spark with 256k context. 🎉🥳 Elapsed time: 56.694 s Prompt tokens: 29 Generated tokens: 3000 Total tokens: 3029 Generation tok/sec: 52.92 End-to-end tok/sec: 53.43


6

39

0

15

4K

3

15

0

4

1K

about 8 hours ago

@TeksEdge @MichaelGannotti @NVIDIAAI Where?

1

1

0

0

18

about 8 hours ago

@barackomaba @ChicouTiMix @NVIDIAAI @0xSero nvfp4, 40-45 tok/s

0

1

0

0

10

about 14 hours ago

Codex app on Windows running DeepSeek-v4-Flash through Codex Shim, running on 2x @NVIDIAAI DGX Sparks. @0xSero

Works so well... https://t.co/XZosIKBF5E

1

10

0

4

2K

about 8 hours ago

@mr_r0b0t Yeah multiple sessions are really good

1

1

0

0

34

about 17 hours ago

Finally cleaned up and published my Dual DGX Spark setup for Step-3.7-Flash-NVFP4. The repo includes scripts for 2x DGX Sparks, vLLM, no-MTP first, optional MTP grafting, live logs during startup, background serving after ready, plus stop/status/test helpers. ~33 tok/s, even when in deep context. Hopefully saves someone else a few hours of pain 🙂 Repo: https://t.co/10QcqYKK9Q

2

27

0

14

796

about 9 hours ago

@ChicouTiMix @NVIDIAAI @0xSero Yes it's better

1

2

0

0

35

about 10 hours ago

@ainslec @AMD @NVIDIAAI 4 x yes

0

0

0

0

9

about 10 hours ago

@naturalSmartnes Yes the only reason they are doing it is to lure Claude users to Codex. That's it.

1

0

0

0

11

about 19 hours ago

How long can OpenAI keep resetting limits? It's obvious they are doing it to get users from Claude to Codex. But I don't think it's sustainable.

1 day ago

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:

1K

21K

2K

4K

4M

1

1

1

0

91

about 10 hours ago

@ainslec @AMD @NVIDIAAI It's not just better. I can connect up to 8x DGX Sparks and get up to 1 TB of unified memory.

1

0

0

0

11

MiaAI_lab retweeted

about 16 hours ago

Congrats to the @MiniMax_AI team on the release of MiniMax M3, a long-context multimodal model for text, image, and video reasoning. 🙌

Try it today with our free GPU-accelerated endpoint on https://t.co/es07MrU5I0.

Details: https://t.co/89qlcTP3OW https://t.co/iyMhbW03nQ

39

867

86

206

82K

about 11 hours ago

@0xSero I'd like to know how it compares to DS4-Flash. Awesome job. How can I get access to the site?

0

1

0

0

123

about 12 hours ago

@emilstridell @NVIDIAAI Haha I think it was a spike, but in general DGX Spark has insane prefill speeds, literally.

1

1

0

0

73

about 14 hours ago

Two concurrent sessions with DS4-Flash, getting more than 60 tok/s and insane prefill numbers.

Running on 2x @NVIDIAAI DGX Sparks https://t.co/NtCebU9QKW

6

30

1

4

2K

about 13 hours ago

@nivkorin2004 @NVIDIAAI Yes

0

1

0

0

72

about 13 hours ago

@spark_arena This is awesome! Will try this tonight.

0

0

0

0

45

about 13 hours ago

Building the things you couldn't find anyone else building has never been easier.

0

0

0

0

67

about 13 hours ago

👀 https://t.co/olql83erzw

0

0

0

0

47

about 13 hours ago

@mikhei777 For one DGX Spark I'd go with Qwen 3.6 35b NVFP4 with MTP, with full context, or DS4-Flash-REAP from @0xSero

1

2

0

0

33

about 19 hours ago

For the price of 2x DGX Sparks you can run frontier open models with full context at very decent, usable speeds for agentic coding. 30-45 tok/s on frontier models, NVFP4 + MTP. No usage limits. I don't think there is any better deal out there for 256 GB of available VRAM.

1

1

0

0

85

about 13 hours ago

@ArnauBosch10 Thanks 🙏🏻

0

1

0

0

17
