Espen JD

Verified account

@Snixtp

Cyber Network Engineer | Codex enthusiast | Local AI | RTX Pro 6000 enjoyer

Joined June 2020

723 Following

655 Followers

4.3K Posts

Pinned Tweet

about 16 hours ago

I've uploaded Q4, Q8 and BF16 MTP draft models in their own repo. It should work natively with yesterday's llama.cpp merge https://t.co/Rmbz0b0EVq

2

43

2

22

3K

about 5 hours ago

Q6 MTP is also up for those who want a good balance between size and quality https://t.co/Rmbz0b0EVq

0

7

0

2

538

about 6 hours ago

@RedBedDread @DJLougen likely because better precision is the safer quality/acceptance default

0

1

0

0

27

about 7 hours ago

New Gemma 4 model 12B, which means it can run on 16GB cards

about 7 hours ago

Meet Gemma 4 12B! A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license. Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇 https://t.co/gf4FZv0WZb

235

8K

1K

3K

1M

1

4

0

0

570

about 9 hours ago

@TechMDAI Soon I hope lol

0

1

0

0

107

about 13 hours ago

Mmyes, but misleading. You’re not using PCIe as a VRAM replacement every token. With model splitting, each 3090 mostly reads its own local VRAM, and PCIe is mostly moving activations/tensors between GPUs; it’s not streaming the whole model every token. That’s why you don’t see much of a performance difference between x8 and x16, even tho it’s half the bandwidth.

0

0

0

0

24
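A rough back-of-the-envelope sketch of the point in the tweet above, assuming layer-wise splitting where only one hidden-state vector crosses each GPU boundary per generated token. The model figures (hidden size 8192, fp16 activations, 40 GB of weights, two GPUs) are hypothetical, and the PCIe 4.0 bandwidths are approximate theoretical values, not measurements.

```python
# Illustrative estimate only. All model figures here are assumptions,
# not numbers taken from the tweets.
PCIE4_X16_BYTES_PER_S = 32e9  # approx. theoretical PCIe 4.0 x16, one direction
PCIE4_X8_BYTES_PER_S = 16e9   # approx. half of x16


def activation_bytes_per_token(hidden_size, bytes_per_value, gpu_boundaries):
    """Bytes crossing PCIe per generated token when layers are split across GPUs."""
    return hidden_size * bytes_per_value * gpu_boundaries


def transfer_seconds(num_bytes, link_bytes_per_s):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return num_bytes / link_bytes_per_s


# Hypothetical model: hidden size 8192, fp16 activations, 2 GPUs (1 boundary).
act_bytes = activation_bytes_per_token(8192, 2, 1)
# Hypothetical 40 GB of weights, as if they were streamed over PCIe each token.
model_bytes = 40e9

for name, bandwidth in (("x16", PCIE4_X16_BYTES_PER_S), ("x8", PCIE4_X8_BYTES_PER_S)):
    act_us = transfer_seconds(act_bytes, bandwidth) * 1e6
    weights_s = transfer_seconds(model_bytes, bandwidth)
    print(f"{name}: activations ~{act_us:.2f} us/token, "
          f"streaming all weights ~{weights_s:.2f} s/token")
```

Under these assumptions, halving the link width doubles a per-token transfer that is on the order of a microsecond, while streaming the weights themselves would take more than a second per token. Real setups add latency and synchronization overhead, so treat the numbers as an order-of-magnitude illustration.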

1 day ago

Another 3090 ✅ 184GB VRAM https://t.co/SqiLWWcvGK

15

123

2

2

7K

about 15 hours ago

@Yeghro Threadripper 7960x and Gigabyte TRX50 Aero D

0

1

0

0

22

about 16 hours ago

Full model Q4_K_M Step 3.7 Flash: https://t.co/I2nRB7YdOK

0

2

0

0

283

about 16 hours ago

@thsottiaux do not sunset the Codex name

0

4

0

0

92

about 17 hours ago

@uzairakrum Real Gs know 5.2 is the goat

0

0

0

0

114

about 17 hours ago

@TheAhmadOsman Please elaborate on fp8 kv cache on ampere

0

2

0

0

162

about 18 hours ago

@Liam06972452 @jasondeanlee API only sadly

0

1

0

0

353

about 18 hours ago

@bsilone I’ve gotten DeepSeek V4 Flash to run with 128K context. ~165GB model alone. Struggling with speed tho because of ds4 engine pop

0

1

0

0

33

about 18 hours ago

@needmorevram :D

0

0

0

0

6

3 days ago

I've uploaded my Q4_K_M Step 3.7 Flash GGUF MTP quant on Hugging Face. Let me know how it is please https://t.co/Y5xpvQa70C

10

64

3

51

8K

about 18 hours ago

@taleosmith I can definitely post more :)

0

0

0

0

27

about 18 hours ago

@faultbugs 3x 3090, Pro 6000, 4080s. I’m trying to get q4 DeepSeek v4 flash to run

0

1

0

0

30

1 day ago

@JakeSamlich llms

1

0

0

0

92

1 day ago

@mr_r0b0t thank you sir

0

1

0

0

132

1 day ago

@alpinesnow23 230V EU

0

1

0

0

41

1 day ago

@alpinesnow23 1600W and 750W. I’m replacing the 750W with a 1000W I’ve got lying around, I’ve just been lazy and haven’t done it yet lol

1

1

0

0

164
