Our new Gemma 4 12B model hits a sweet spot between size + performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can’t wait to see what the community does with this one!
Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It’s open and accessible for everyone to use under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵
i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2
21 tokens per second. on a budget consumer GPU. locally.
no API. no cloud. no subscription.
and the benchmarks are absolutely cooked
# first let's talk architecture because this is genuinely different
every multimodal model you've used has a frozen vision encoder + frozen audio encoder + LLM backbone glued together
Gemma 4 12B is different
it's a single decoder only transformer. that's it. vision? raw 48×48 pixel patches → one matmul → projected directly into the LLM
audio? raw 16kHz signal sliced into 40ms frames → linear projection → same LLM input space
no encoder tax. no latency penalty. no fragmented memory
to put the encoder savings in perspective:
old Gemma 4 26B approach:
- 550M param vision encoder (frozen)
- 300M param audio encoder (frozen)
- LLM backbone
Gemma 4 12B:
- 35M param vision embedder (a single matmul)
- no audio encoder at all
- LLM backbone handles EVERYTHING 550M → 35M for vision alone. that's a 15x reduction
this is why the gemma-4-12b-it-Q4_K_M.gguf is just 6.6 GBs!!!
and it has 256K native context context
# Benchmarks:
AIME 2026 (math olympiad): 77.5%
GPQA Diamond (expert science): 78.8% LiveCodeBench v6 (real code): 72%
Codeforces ELO: 1659
MMLU Pro: 77.2%
MATH-Vision: 79.7%
BigBench Extra Hard: 53%
inference → llama.cpp, LM Studio, vLLM, SGLang
llamacpp flags:
-m "gemma-4-12b-it-Q4_K_M.gguf" -ngl 99 -c 8000 -v --port 8080
Available on huggingface now! Link below
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Today we're launching Simple Compute Market (SCM).
The market is simple: agents find compute, negotiate, settle, and get access without a human driving every step.
Open-source. Agent-driven. Public good. No token. No fees.
Whoever invented “Member of Technical Staff” was a genius.
It filters out Staff/Principal title-maxxers, protects engineering and research from corporate ladder brain, and leaves recruiters staring at LinkedIn like: “Is this person L4 or L7?”
MTS is the best title. Happy to be MTS.
NEWS: Pyth Core is getting an infrastructure upgrade on July 31 📢
But one important change: All Pyth Price Feeds API access will require a Starter or Pro subscription.
More subscriptions → more revenue → more PYTH purchases
Everything you need to know 🧵
🚨 UPDATE: Mini Shai-Hulud has crossed from @npmjs into @pypi and is still spreading.
Newly confirmed compromised artifacts:
@opensearch-project/opensearch: 3.5.3, 3.6.2, 3.7.0, 3.8.0 (1.3M weekly downloads)
mistralai: 2.4.6 on PyPI
guardrails-ai: 0.10.1 on PyPI
additional @squawk/* packages on npm
guardrails-ai 0.10.1 executes malicious code on import. On Linux, it downloads git-tanstack[.]com/transformers.pyz, writes it to /tmp/transformers.pyz, and runs it with python3 without integrity verification.
The git-tanstack.com domain displayed a message signed “With Love TeamPCP,” along with: “We've been online over 2 hours now stealing creds
Regardless I just came to say hello :^)”
The page also linked to a YouTube video and you can probably guess which one.