Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It’s open and accessible for everyone to use under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵
When the creator of Redis starts thinking about KV cache, pay attention.
antirez is Salvatore Sanfilippo, the Sicilian programmer best known for creating Redis.
But “creator of Redis” is almost too small a label.
Before Redis, he was already an old-school systems hacker. He built hping, worked in network security, and invented the idle scan technique. This was the packet-level, C-programming, Unix-hacker world.
Then Redis happened.
The origin was not glamorous. He was building LLOOGG, a real-time web analytics service, and needed something faster and simpler than the tools he had. So he created Redis.
That is very antirez.
Start with a real bottleneck.
Avoid unnecessary abstraction.
Expose the right primitive.
Make it fast enough that people rethink the category.
Redis did not win because it looked like a traditional database. It won because it gave developers direct access to useful data structures: strings, lists, hashes, sets, sorted sets, streams, pub/sub.
It made memory programmable.
That is why his return to local AI is so interesting.
With ds4, or DwarfStar 4, antirez is not just building “another local inference engine.”
He is asking a very Redis-like question:
What is the real primitive here?
For LLMs, one answer is obvious: KV cache.
Most people treat KV cache as an implementation detail. It lives in RAM or HBM, grows with context, and quietly becomes the bottleneck.
antirez looks at DeepSeek V4 Flash, compressed KV cache, modern MacBook SSDs, and says: maybe KV cache should not only live in RAM.
His phrase is perfect:
“The KV cache is actually a first-class disk citizen.”
That one sentence is the whole story.
If Redis made in-memory data structures feel like application infrastructure, ds4 is exploring whether local LLM state can become durable infrastructure too.
Prefill once.
Persist the cache.
Resume later.
Let long-running agents reuse expensive context instead of rebuilding everything from scratch.
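To make the prefill-persist-resume loop concrete, here is a minimal sketch in Python. It is not ds4's code or file format: `DiskKVCache`, `fake_prefill`, and `get_state` are hypothetical names, the "KV state" is a dummy list, and keying the file by a hash of the token prefix is just one assumed way to find a saved state again.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class DiskKVCache:
    """Toy on-disk store keyed by a hash of the prompt prefix (hypothetical)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, prefix_tokens):
        # Identical token prefixes map to the same file name.
        digest = hashlib.sha256(repr(list(prefix_tokens)).encode()).hexdigest()
        return self.root / f"{digest}.kv"

    def load(self, prefix_tokens):
        path = self._path(prefix_tokens)
        if path.exists():
            return pickle.loads(path.read_bytes())
        return None

    def save(self, prefix_tokens, state):
        self._path(prefix_tokens).write_bytes(pickle.dumps(state))


PREFILL_CALLS = 0


def fake_prefill(tokens):
    """Stand-in for the expensive prefill pass; returns a dummy 'KV state'."""
    global PREFILL_CALLS
    PREFILL_CALLS += 1
    return [(t, t * 2) for t in tokens]


def get_state(cache, tokens):
    """Reuse a persisted state if one exists, otherwise prefill and persist."""
    state = cache.load(tokens)
    if state is None:
        state = fake_prefill(tokens)
        cache.save(tokens, state)
    return state


cache = DiskKVCache(Path(tempfile.mkdtemp()) / "kv")
prompt = [101, 7, 42, 9]          # pretend token ids for a long system prompt
first = get_state(cache, prompt)   # cache miss: runs prefill, writes to disk
second = get_state(cache, prompt)  # cache hit: loaded from disk, no prefill
```

The second call never touches `fake_prefill`, which is the whole point: the expensive work happens once and the result survives on disk. A real engine would store model tensors rather than a pickled list, and would also have to handle partial prefix matches and cache invalidation.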
This matters because coding agents are not normal chatbots.
They carry huge system prompts, tool definitions, repo context, prior steps, and long task histories. If every request has to resend and recompute the entire conversation, local inference will always feel fragile and wasteful.
ds4 attacks that directly.
It is a deliberately narrow engine for DeepSeek V4 Flash, focused on Metal and CUDA, high-end personal machines, special quantization, long context, HTTP API, GGUF files crafted for the engine, official-logit validation, and agent integration.
There is also a funny and very current detail: he openly says ds4 was built with strong assistance from GPT 5.5, with humans leading ideas, testing, and debugging.
That is very 2026.
A legendary C programmer using an AI coding partner to build a local AI engine, so other coding agents can run locally with persistent KV state.
It sounds recursive because it is.
And he still has the same builder energy. After ds4 took off, he wrote that the first week felt like early Redis again, with 14-hour workdays, chaos, and excitement.
That is the part I like most: a true old-school builder.
@DjokerNole @rolandgarros Absolute legend 🐐. We all hope to see you at your best at Wimbledon. You so deserve the 25th Slam. You can make it. Idemo Nole 💪
Claude Opus 4.8 is out today. It's our strongest coding model yet: up on SWE-bench Pro (from 64.3 to 69.2) and noticeably more honest about its own work. It tells you when it's unsure and catches its own bugs instead of declaring victory early. Same price as 4.7.
@GrandjeanJ27374 @OlivierRoland Yes, yes, I understood perfectly. Thank you for your courteous and constructive remark 😉. These are real questions that, it seems to me, are not addressed in the video.
We just launched the ability to build native Android apps directly in Google AI Studio for free!
Since launch last week, people have created more than 250,000 Android apps. Likely >99% of these folks had never built an Android app before. Everyone can now build, no coding required!
@namcios Let's see if they stick to their resolution. ClickUp is clearly a SaaS that AI has made largely obsolete, and it is at the top of our list of SaaS tools to stop using this year…