Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Mmyes, but misleading.
You’re not using PCIe as a VRAM replacement on every token. With model splitting, each 3090 mostly reads from its own local VRAM, and PCIe mostly just moves activations/tensors between GPUs. It’s not streaming the whole model every token.
That’s why you don’t see much of a performance difference between x8 and x16, even though x8 has half the bandwidth.
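Rough back-of-envelope numbers to show the scale, as a sketch and not a benchmark. The hidden size, fp16 activations, and 20 GB of weights per card are made-up assumptions for illustration. The PCIe figures are approximate theoretical PCIe 4.0 rates, the VRAM figure is the 3090's spec-sheet bandwidth, and per-transfer latency is ignored entirely.

```python
# Back-of-envelope: per-token PCIe traffic vs. local VRAM reads in a layer split.
# All model numbers below are illustrative assumptions, not measurements.

def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Idealized transfer time: bytes / bandwidth, ignoring latency and overhead."""
    return num_bytes / bandwidth_bytes_per_s

HIDDEN_SIZE = 8192          # assumed hidden dimension of the model
BYTES_PER_VALUE = 2         # assumed fp16 activations
LOCAL_WEIGHT_BYTES = 20e9   # assumed weights resident on one 3090

PCIE4_X16 = 32e9            # ~32 GB/s, approximate theoretical PCIe 4.0 x16
PCIE4_X8 = 16e9             # ~16 GB/s, approximate theoretical PCIe 4.0 x8
VRAM_BANDWIDTH = 936e9      # RTX 3090 spec-sheet memory bandwidth

# One token's activation vector crossing one GPU-to-GPU boundary.
activation_bytes = HIDDEN_SIZE * BYTES_PER_VALUE

t_x16 = transfer_seconds(activation_bytes, PCIE4_X16)
t_x8 = transfer_seconds(activation_bytes, PCIE4_X8)

# Reading the locally stored weights once per token from VRAM.
t_vram = transfer_seconds(LOCAL_WEIGHT_BYTES, VRAM_BANDWIDTH)

print(f"activation per boundary: {activation_bytes} bytes")
print(f"PCIe x16 transfer: {t_x16 * 1e6:.3f} us")
print(f"PCIe x8 transfer:  {t_x8 * 1e6:.3f} us")
print(f"local VRAM read:   {t_vram * 1e3:.3f} ms")
```

Under these assumptions the x8 link takes twice as long as x16, but both are on the order of a microsecond, while the local VRAM read is on the order of tens of milliseconds. Real transfers add latency and sync overhead, so treat this as an order-of-magnitude argument only.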