Atomic Chat on Hugging Face!
We're officially a Local App on the world's biggest AI hub. Run 180,000+ open-weight models from @huggingface on your local device!
New Google Gemma 4 12B claims near-26B performance - we tested both!
We ran both models locally on one RTX 4090 and gave each the same task: write a self-contained, single-file HTML5 canvas animation with real physics and no libraries. Three scenes: a Galton board, two blocks colliding off a wall, and a chaotic triple pendulum.
Outputs:
Gemma 4 26B-A4B: 15 GB VRAM usage, 6.9k tokens, 138 tok/s
Gemma 4 12B: 9 GB VRAM usage, 8.9k tokens, 80 tok/s
Same Gemma 4 family, but the 26B-A4B won every scene and ran ~1.7x faster, with just 4B active params. The 12B still stayed very close while using 40% less VRAM (9 GB vs 15 GB), which makes it the ideal model for a 16 GB laptop.
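A quick sanity check of the numbers above, as a small Python sketch. It uses the rounded token counts from the post (6.9k and 8.9k) and assumes generation time is roughly tokens divided by tok/s, ignoring prompt processing and time to first token, so treat the seconds as rough estimates rather than measured values.

```python
# Figures as reported above (one RTX 4090); token counts are the rounded 6.9k / 8.9k.
RUNS = {
    "Gemma 4 26B-A4B": {"vram_gb": 15, "tokens": 6_900, "tok_per_s": 138},
    "Gemma 4 12B": {"vram_gb": 9, "tokens": 8_900, "tok_per_s": 80},
}


def estimated_gen_seconds(run: dict) -> float:
    # Assumption: time ~= tokens / throughput (no prefill, no time to first token).
    return run["tokens"] / run["tok_per_s"]


big, small = RUNS["Gemma 4 26B-A4B"], RUNS["Gemma 4 12B"]
speedup = big["tok_per_s"] / small["tok_per_s"]      # throughput ratio, 26B-A4B vs 12B
vram_saving = 1 - small["vram_gb"] / big["vram_gb"]  # fraction of VRAM the 12B saves

print(f"throughput ratio: {speedup:.2f}x")
print(f"VRAM saved by the 12B: {vram_saving:.0%}")
for name, run in RUNS.items():
    print(f"{name}: ~{estimated_gen_seconds(run):.0f} s of generation")
```

The throughput ratio lines up with the post's ~1.7x, and the 12B uses about 40% less VRAM. Because the 12B also wrote more tokens, its estimated generation time comes out at roughly 111 s versus 50 s, so the wall-clock gap is wider than the tok/s gap.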
Multi-Token Prediction (MTP) for Qwen on llama.cpp!
+40% speed and a 90% acceptance rate, running locally on a MacBook Pro M5 Max with 64 GB.
We patched llama.cpp, quantized Qwen3.6 27B into GGUF format with TurboQuant, and shipped MTP drafts on top. Benchmarks, source code and models below.
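Why a 90% acceptance rate matters: under the textbook speculative-decoding model (each drafted token accepted independently with probability α, acceptance stops at the first rejection, and the main model always adds one token of its own), the expected number of tokens per verification pass is 1 + α + α² + … + α^k. The post doesn't give the draft length, so `k` below is a hypothetical parameter, and reading "90% acceptance" as a per-token α is an assumption.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per main-model verification pass.

    Simplified model (assumption): each of the k drafted tokens is accepted
    independently with probability alpha, acceptance stops at the first
    rejection, and the main model always contributes one token itself
    (the correction after a rejection, or a bonus token if all k match).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    # Geometric series 1 + alpha + ... + alpha**k in closed form.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


# Draft lengths k are hypothetical; the post does not state them.
for k in (1, 2, 3):
    print(f"alpha=0.90, k={k}: {expected_tokens_per_pass(0.9, k):.2f} tokens/pass")
```

With α = 0.9 this gives about 1.9, 2.71 and 3.44 tokens per pass for k = 1, 2, 3. These are ceilings rather than speedups: each pass also pays for drafting and for verifying the extra positions, which is presumably why the measured gain is +40% rather than 1.9x or more.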
Multi-Token Prediction (MTP) for llama.cpp!
Running the Gemma 4 local model 1.5x faster.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster. Benchmarks, source code and models below.
Multi-Token Prediction (MTP) for llama.cpp!
Running the Gemma 4 local model 1.5x faster.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster. Source code and models below.
Multi-Token Prediction (MTP) for llama.cpp!
Running the Gemma 4 local model significantly faster.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster. Source code and models below.
Multi-Token Prediction (MTP) for llama.cpp! Running the Gemma 4 local model 40% faster.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster. Source code and models below.
Multi-Token Prediction (MTP) for llama.cpp!
Running the Gemma 4 26B local model 40% faster.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster. Source code and
Introducing MTP for llama.cpp! Running the Gemma 4 local model 40% faster.
Multi-Token Prediction gives a significant speedup without degrading quality. Instead of predicting one token at a time, MTP drafts several tokens in parallel, and the main model then verifies them in a single pass.
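A minimal sketch of the draft-then-verify step, assuming greedy decoding. `main_next` and `toy_main` are hypothetical stand-ins for the full model's next-token choice, not a llama.cpp API. A real engine scores all drafted positions in one batched forward pass; the loop below calls the function per position only for readability. Sampling-based decoding uses a rejection-sampling acceptance rule instead, which this sketch does not cover.

```python
from typing import Callable, List, Sequence

Token = int
# Hypothetical stand-in for the main model: its greedy next token given a prefix.
NextToken = Callable[[Sequence[Token]], Token]


def verify_draft(prefix: Sequence[Token], draft: Sequence[Token],
                 main_next: NextToken) -> List[Token]:
    """Return the tokens to append after one greedy verification pass.

    Keeps the longest run of drafted tokens that match the main model's own
    greedy choice, then appends one token from the main model: its correction
    at the first mismatch, or a bonus token if the whole draft matched.
    """
    accepted: List[Token] = []
    for tok in draft:
        target = main_next([*prefix, *accepted])
        if tok != target:
            accepted.append(target)  # first mismatch: keep the main model's token, drop the rest
            return accepted
        accepted.append(tok)
    accepted.append(main_next([*prefix, *accepted]))  # all drafts matched: bonus token
    return accepted


def greedy_only(prefix: Sequence[Token], n: int, main_next: NextToken) -> List[Token]:
    """Plain one-token-at-a-time greedy decoding, for comparison."""
    out: List[Token] = []
    for _ in range(n):
        out.append(main_next([*prefix, *out]))
    return out


def toy_main(prefix: Sequence[Token]) -> Token:
    # Toy "model": always predicts the previous token plus one, modulo 10.
    return (prefix[-1] + 1) % 10


print(verify_draft([1, 2], [3, 4, 9], toy_main))  # [3, 4, 5]: 2 drafts accepted + correction
print(verify_draft([1, 2], [3, 4, 5], toy_main))  # [3, 4, 5, 6]: all accepted + bonus
```

Because every emitted token is one the main model would have chosen on its own, the result matches plain greedy decoding; the drafts only change how many tokens each pass yields.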
Introducing Multi-Token Prediction (MTP) for llama.cpp!
Multi-Token Prediction gives a significant speedup without degrading quality. Instead of predicting one token at a time, MTP drafts several tokens in parallel, and the main model then verifies them in a single pass.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster.
Introducing Multi-Token Prediction (MTP) for llama.cpp! Running the Gemma 4 local model 40% faster.
We patched llama.cpp, quantized the Gemma 4 assistant models into GGUF format, and ran tests on a MacBook Pro M5 Max. With MTP drafting tokens, Gemma 4 26B runs 40% faster.
Source code and
We compared Qwen3.6 35B and 27B under the same conditions, both quantized with Google TurboQuant.
Device: MacBook Pro M5 Max, 64 GB RAM
Output stats:
Qwen3.6 35B: 6672 tokens, 2m 10s, 65 tok/s
Qwen3.6 27B: 7344 tokens, 5m 22s, 24 tok/s
Conclusion: Both models were asked to draw waves using HTML. The 35B responded quickly, but the result feels weak and messy. The 27B took more time and delivered a much cleaner, more consistent result: it is built for thinking and planning, so it works better on tasks that need structure. Overall, the 27B is the better choice for tasks where planning matters, while the 35B suits everyday use when you just need a fast response.
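The wall-clock times also give an end-to-end rate. A small Python sketch, assuming the times cover the whole response: dividing tokens by total time gives about 51 tok/s for the 35B and about 23 tok/s for the 27B, below the quoted 65 and 24 tok/s. The post doesn't say how tok/s was measured; one plausible reading (an assumption) is that it counts decode speed only, excluding prompt processing and time to first token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    tokens: int
    wall_seconds: int      # total time as reported in the post
    reported_tok_s: float  # tok/s as reported; measurement method not stated

    @property
    def end_to_end_tok_s(self) -> float:
        # Assumption: wall_seconds spans the whole response, so this includes non-decode time.
        return self.tokens / self.wall_seconds


RUNS = [
    Run("Qwen3.6 35B", 6672, 2 * 60 + 10, 65),
    Run("Qwen3.6 27B", 7344, 5 * 60 + 22, 24),
]

for run in RUNS:
    print(f"{run.name}: reported {run.reported_tok_s} tok/s, "
          f"end-to-end {run.end_to_end_tok_s:.1f} tok/s")

fast, slow = RUNS
print(f"wall-clock ratio: {slow.wall_seconds / fast.wall_seconds:.2f}x")
```

Either way, the 35B finishes in well under half the time (130 s vs 322 s), which is the trade-off the conclusion describes.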
Google Chrome invited Atomic Mail to test how the built-in @GeminiApp can improve private email.
Now you can run Atomic Mail AI features with local models on your device:
✅ 100% private
✅ Your data stays on your laptop
✅ Zero cost
Joint case study with @ChromiumDev coming soon. Follow us!
Guide + link in the comments.
Hermes Agent by @NousResearch (100k+ ⭐) is now inside Atomic Bot:
✅ Free local models: Qwen, Gemma or
✅ Use your API keys for any provider
✅ Dashboard, terminal, logs and file explorer
✅ Private and open source
Download the macOS app or run it in the cloud.
You can run Hermes Agent by @NousResearch with Kimi K2.6 on an @atomicbot_ai VPS.
@Kimi_Moonshot just dropped a new open-source coding model. We asked Hermes to build and deploy a game, and it did incredibly well!
Run OpenClaw with Gemma 4 and Atomic Chat
MacBook Air M4 · 16 GB RAM · 25 tok/s
No cloud! No subscription fees! An open-source local model that runs on your regular device.