cheez @cheeez42 - Twitter Profile

Pinned Tweet

cheez @cheeez42

7 months ago

cheeez42's tweet photo. https://t.co/nDCNEGa2ah

1

10

0

495

cheez @cheeez42

34 minutes ago

Good Morning, My actual work week starts....ill be back this evening to most likely test more gemma 4 12b

0

1

0

13

cheez @cheeez42

34 minutes ago

@Hikari_07_jp I am impressed with it so far. Q5 tested well. I have some results posted on my profile. Going to mess around with Q6 and see if I can still have room for larger context but at even 8k it was almost hittimg my vram cap.

0

1

0

3

cheez @cheeez42

about 9 hours ago

@peach2k2 typical @vaxryy W 🤣🤣

0

1

2K

Who to follow

Marketcap.x | Send.nft | NewZealand.crypto #UDfam #unstoppabledomains

Think Tank Dubs

@ThinkTankDubs

DJ / Selector | Dub x Biscuit jams x Electronica x Hip Hop | Livestream Sets | Über Glue & Spraypaint Victory on JEMP radio | https://t.co/7dqg9MrpfN

cheez @cheeez42

about 12 hours ago

thank you @UnslothAI for the GOAT work yall do!

Unsloth AI

@UnslothAI

about 20 hours ago

@googlegemma Thank you Google Deepmind for constantly releasing open models! 🌟 We made Dynamic GGUFs so you can run Gemma 4 12B more efficiently: https://t.co/8cL321pVDh

21

888

57

332

36K

0

2

0

18

cheez @cheeez42

about 12 hours ago

i dont really like google, but i have been very impressed with their Gemma models. Props to the team for the hard work.

Google Gemma

@googlegemma

about 20 hours ago

Meet Gemma 4 12B! A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license. Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇

googlegemma's tweet photo. Meet Gemma 4 12B!

A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.

Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇 https://t.co/gf4FZv0WZb

333

11K

2K

4K

2M

0

1

0

12

cheez @cheeez42

about 12 hours ago

@witcheer heres a little secret... *they rip*

0

1

0

27

cheez @cheeez42

about 13 hours ago

Been stress-testing Gemma 4 12B UD-Q5_K_XL on an RTX 5070 12GB. ~60 tok/s generation. Up to ~49k context tested. Under 10GB VRAM. 12GB GPUs are becoming a lot more capable for local AI than most people realize.

cheeez42's tweet photo. Been stress-testing Gemma 4 12B UD-Q5_K_XL on an RTX 5070 12GB.

~60 tok/s generation.
Up to ~49k context tested.
Under 10GB VRAM.
12GB GPUs are becoming a lot more capable for local AI than most people realize. https://t.co/bUPTfFxJsC

0

41

cheez @cheeez42

about 13 hours ago

@stableAPY Never tested context on q6 only because at 8k i thought maybe id run out of vram. but might have to give it another test once im done with q5

0

1

0

2

cheez @cheeez42

about 13 hours ago

@stableAPY everything at 8k. but i just dove back in and started testing larger context windows. just started testing above 32k with a long prompt. ill be sure to post about testings once done. i am impressed so far tho

0

1

0

13

cheez @cheeez42

about 14 hours ago

Tested Gemma 4 12B on an RTX 5070 12GB today. The surprise wasn't the speed. It was how usable the model feels while still fitting comfortably in VRAM. UD-Q5_K_XL is the sweet spot so far. Anyone else running Gemma 4 locally?

0

1

36

cheeez42 retweeted

witcheer

@witcheer

about 24 hours ago

"Expert-pruning shrinks the model, so it must run faster." I tested that and it doesn't. REAP takes Qwen3.6-35B-A3B and prunes 20% of its experts down to 28B. I ran it head-to-head against the unpruned parent. ~~~ what pruning actually changed: - VRAM: 27.3 to 21.6 GB. a real ~6GB win. - speed: 260 to 247 tok/s. no gain - slightly slower, even. - quality: behind on all five (MMLU 94.7 to 87.7, HumanEval 98 to 94, HellaSwag 87 to 82, GSM8K 92 to 90, ARC 97 to 95). ~~~ REAP prunes whole experts, that cuts total params, but the active path per token is still ~3B either way. generation speed tracks active params and memory bandwidth, not total size. you free VRAM, not time. ~~~ so expert-pruning here is a VRAM play, not a speed play. it is worth it if you're memory-bound and need the headroom. not worth it if you're chasing tokens-per-second. (n ~100 per task, so treat each gap as modest - but the parent won every single one.)

witcheer's tweet photo. "Expert-pruning shrinks the model, so it must run faster."

I tested that and it doesn't.

REAP takes Qwen3.6-35B-A3B and prunes 20% of its experts down to 28B.
I ran it head-to-head against the unpruned parent.

~~~
what pruning actually changed:

- VRAM: 27.3 to 21.6 GB. a real ~6GB win.
- speed: 260 to 247 tok/s. no gain - slightly slower, even.
- quality: behind on all five (MMLU 94.7 to 87.7, HumanEval 98 to 94, HellaSwag 87 to 82, GSM8K 92 to 90, ARC 97 to 95).

~~~
REAP prunes whole experts, that cuts total params, but the active path per token is still ~3B either way. generation speed tracks active params and memory bandwidth, not total size.

you free VRAM, not time.

~~~
so expert-pruning here is a VRAM play, not a speed play.
it is worth it if you're memory-bound and need the headroom. not worth it if you're chasing tokens-per-second.

(n ~100 per task, so treat each gap as modest - but the parent won every single one.)

7

33

2

16

3K

cheez @cheeez42

about 23 hours ago

good morning. wonder what kind of fun stuff we gonna get into today. what you working on?

0

7

cheez @cheeez42

1 day ago

Results: Qwen2.5-3B quanted cleanly. Q4_K_M led generation at ~239 tok/s, IQ4_XS was close at ~232 tok/s and smallest at 1.63GiB, Q5_K_M hit ~215 tok/s, Q8_0 was the quality/reference quant at ~158 tok/s. Safety tests showed system prompts matter.

0

52

cheez @cheeez42

1 day ago

Built a local quant lab: HF download → F16 GGUF conversion → Q8/Q6/Q5/Q4/IQ quants → llama.cpp benchmarks → smoke tests with a conservative ops system prompt + safety leakage scanner. First baseline: Qwen2.5-3B on RTX 5070.

1

0

18

cheez @cheeez42

1 day ago

@stableAPY heres to hopin' 🤞

0

1

0

11

cheeez42 retweeted

Ahmad

@TheAhmadOsman

1 day ago

Grifters shipping vibe-coded slop are everywhere now and it is getting exhausting ngl The issue is not that they are vibe coding - Build however you want - Use whatever tools you want - Ship fast, experiment, have fun The issue is pretending the output is serious software when it belongs in the trashbin A lot of these people are not building products - They are producing screenshots - GitHub activity - Fake momentum And naturally, these are the same folks walking around proudly showing their GitHub commit count as if “many tiny commits to a broken app” is a proxy for taste, architecture, reliability, or competence Very happy for them though: The graph is green, the software is not

16

72

4

11

6K

cheez @cheeez42

1 day ago

learning more and more about local llms, how they work and exactly what all the different bits and pieces are really is a lot of fun. its crazy to see where the tech has come in only 5 years. I remember using DeepDaze to make trippy ass pictures back then on my 2060super.

0

8

cheez @cheeez42

1 day ago

@livewithnoregrt i like shouting into the void

0

13

cheez

@cheeez42

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users