Michele Sciabarra

@msciab

Founder and CPO @ - O'Reilly Author - PMC Apache OpenServerless and OpenWhisk - Trustable Developer, focusing on Local Serverless for Private AI

London, England

Joined December 2016

559 Following

510 Followers

382 Posts

Michele Sciabarra

@msciab

5 days ago

Why haven't we lost the battle for data yet, despite the cloud? And why is private AI our only defense against digital twins?

Many, too many people do not worry about uploading their documents to ChatGPT because of an argument that is unfortunately not entirely correct, and that I used to make myself: they already have our data. They have our emails and our documents, since we keep them in the cloud. So what defense do we have?

This is true... up to a point. Apart from the legal aspects, which can be bypassed or simply violated, what few people understand is the difference between raw data and contextualized data.

This becomes obvious to anyone who tries to build a RAG system. Giving an AI gigabytes and gigabytes of documents and emails does not help much, simply because the AI has no way to distinguish or focus on what is important and what is irrelevant. A lot of work is needed to contextualize the data.

The power of AI lies in its attention mechanisms, and attention is driven by context. In a discussion with AI, we are the ones who provide that context. AI works well in conversations because much of the work is still done by the human-in-the-loop: you provide the context, the AI finds the relevant information, you read what it writes and focus it even more. That is what produces the surprising results.

When we give a document to an AI inside a conversation, we are not just providing information: we are providing information that has already been oriented by context. Our questions indicate what matters, which parts of the document are relevant, and which are not.

So, even if they already have the data, AI companies are hungry for your conversations, because those are what really help AI learn. And that is what we need to defend.

Otherwise, we will teach AI to replicate us and create "digital twins". That is what many companies want to do: capture and keep in-house the relevant knowledge that lives in people's heads while they work for the company.

But if we give all this information to a public AI, we are building our competitors' factory.

It sounds like a joke, but imagine a small agency, Web Design Castelfranco, with an established local client base. One day, Design Web Castelfranco appears: a similar website, the same niche, emails sent to the same clients, offering a website update at half the price.

Everything seems normal, except that Design Web Castelfranco is a digital clone run entirely by AI agents.

And it is already happening.

Michele Sciabarra

@msciab

6 days ago

Every time people talk about AI and closed models (ChatGPT, Claude, and Gemini), describing them as "unreachable" compared with open models dismissed as "toys", I ask: fine, but how much weight does Linux carry in the market today?

Apart from desktop PCs, which are driven more by habit and where Windows and Mac still dominate, Linux is everywhere: internet servers, data centers, supercomputers, smartphones, and appliances.

At the beginning, it was not like that. Sun Microsystems dominated the high end, while Windows NT dominated the low end. Linux was seen as a toy; people argued that Solaris and Windows NT were technically superior.

The argument was the same one used today for closed AI models: those models are much better than anything open source can produce, because that is where the money is.

But the point is not to have a system that does everything and the opposite of everything. The point is to get a system that does what is needed.

Today, the business around AI is microscopic. We are talking about 1.8 billion in a global market of 5.4 trillion: about 0.03%. In this context, and with the few real investments being made, people chase the "coolest" model.

But when it is time to move from words to facts, and AI becomes a strategic asset, and you have to choose, what will you choose?

The super mega mythical model that does fabulous things but costs a lot of money, locks you into one provider, requires you to hand over all your data, and never really tells you what they do with it... or do you "settle" for a model that costs much less and does what it needs to do?

Besides, the idea that open models can do little is absolutely false. There are very powerful 120B open models, like Qwen3, and open models at the level of GPT-5, like DeepSeek4.

But because you have never seen or tried them, you believe the prevailing narrative, also fueled by creators who live on clicks and always have to amaze people with special effects. If an open model today does what a closed model did six months ago, it does not make news. Yet that is the revolution.

We have chosen to swim against the current, with private AI and open models. It is consistent with the choice I made many years ago: I wrote the book Linux and Web Programming when almost nobody knew what web programming even was. And it became a bestseller.

Curiously, Nuvolaris is doing fairly well as a business, and it works only with the "toys": private models. Who knows why.

Michele Sciabarra

@msciab

7 days ago

Today it all looks easy: put the AI workloads in the cloud, get to market immediately, pay little at the beginning, and even mock the people investing in infrastructure, private models, and internal competence.

Then two years pass.

The bills grow.

Lock-in becomes a chain.

Government and compliance questions start arriving.
Competitors copy what you built.

Meanwhile, the person who looked slow has learned how to produce. They have their own machines, their own models, their data under control, and costs that go down with dedicated chips and automation.

The point is not to avoid the cloud.

The point is not to mistake initial speed for a structural advantage.

Michele Sciabarra

@msciab

8 days ago

Not documented, but worth knowing: how do you download a specific version of Ollama? Here is the trick: `curl -fsSL https://t.co/ywh9H5UvxK | env OLLAMA_VERSION=<version> bash`

Michele Sciabarra

@msciab

8 days ago

When you have a large document base, many people suggest RAG (retrieval-augmented generation), meaning "teaching" AI your documents and asking questions about them. In reality, it is a fairly outdated concept, for several reasons.

The concept behind RAG is to extract information using a "vector" search, that is, a semantic search, over relevant "fragments" of knowledge, and use them to obtain answers.
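To make the retrieval step concrete, here is a minimal sketch, assuming toy data: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `retrieve`, `build_prompt`, and `FRAGMENTS` are hypothetical names used only for illustration. It ranks fragments by cosine similarity to a query vector and pastes the top ones above the question.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, fragments, k=2):
    """Return the k fragment texts whose vectors are closest to the query."""
    ranked = sorted(fragments, key=lambda f: cosine(query_vec, f["vec"]), reverse=True)
    return [f["text"] for f in ranked[:k]]

# Assumption: hand-written vectors standing in for the output of an embedding model.
FRAGMENTS = [
    {"text": "Invoices are due within 30 days.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Late payments incur a 2% fee.", "vec": [0.8, 0.3, 0.1]},
    {"text": "The office closes at 6 pm.", "vec": [0.0, 0.2, 0.9]},
]

def build_prompt(question, query_vec, k=2):
    """Paste the retrieved fragments above the question, as a RAG prompt would."""
    context = "\n".join(retrieve(query_vec, FRAGMENTS, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Everything downstream depends on which fragments rank highest, which is why a poor ranking ends up confusing the model rather than helping it.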

The fact is that the foundation of the concept, namely "semantic" search, rarely produces good results, because extracting snippets of sentences on the assumption that they might be relevant is a bit of a lottery: if it goes well, you find relevant information; if it goes badly, you confuse the LLM.

Another limitation of RAG is that semantic search works for concepts, but company documents are very often full of names, acronyms, and specific references, and that is where search performs poorly.

The approach has evolved by giving AI the task of putting these documents back in order. The approach that works best today is to have AI build an "LLM Wiki", that is, to restructure the information into a hypertext system with cross-references.
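The cross-referenced structure can be pictured with a small sketch. This is only an illustration of the resulting data structure, not of how an AI would write the pages: the page contents, `cross_reference`, and `backlinks` are hypothetical, and links are found by exact title matching, which is where names and acronyms are easy to handle.

```python
import re

def cross_reference(pages):
    """Map each page title to the sorted titles of other pages it mentions."""
    links = {}
    for title, body in pages.items():
        mentioned = [
            other for other in pages
            if other != title
            and re.search(r"\b" + re.escape(other) + r"\b", body, re.IGNORECASE)
        ]
        links[title] = sorted(mentioned)
    return links

def backlinks(links):
    """Invert the link map: which pages point at each title."""
    incoming = {title: [] for title in links}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)
    return {title: sorted(sources) for title, sources in incoming.items()}

# Assumption: made-up wiki pages, as an AI might produce from raw documents.
PAGES = {
    "ACME": "ACME is a client. Its contract is tracked under Project Falcon.",
    "Project Falcon": "Project Falcon delivers the ACME portal. Owner: Maria Rossi.",
    "Maria Rossi": "Maria Rossi leads Project Falcon.",
}
```

With links and backlinks in place, a question about a client can be answered by following references from its page instead of guessing which fragments are relevant.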

This is a typical application of private AI: giving a single document to AI is already risky; having a public AI build an entire "ontology" of our information means handing all our knowledge over to third parties.

For this reason, when people talk about transforming their documents into a corporate knowledge base, they are almost always talking about Private AI.

Michele Sciabarra

@msciab

13 days ago

Classic programming, which seemed to have gone out the door, is coming back in through the window.

This really makes me smile, because there is an impressive number of people convinced that you can do "everything" with AI. But you can't.

How does it work? Simple. Let's say you want to automate a task, any task. Here's one I actually use: publishing the texts I write on LinkedIn and X, also translated into English, with an image attached.

Doing this manually takes me about half an hour every time. With AI, in theory, it can be done in a minute. You ask it something like:

- take this text from here;
- fix syntax and grammar errors;
- generate a relevant image in this style;
- publish it on LinkedIn and X.

You can copy and paste these instructions into a chat, or create a `SKILL.md` file to make the AI do it. Usually, though, it does not work well. Besides taking quite a while, the AI tends to open the browser and make quite a mess. Above all, using the browser to publish posts is a nightmare.

The only serious way to handle this is to automate. To write scripts.

You can have the scripts generated too, of course. But the essential element that makes a skill predictable and controllable is exactly this: filling it with scripts that do the operational work.

This is the core of skill development, and it is probably the prototype for many programming jobs of the future. Today we do this with Claude skills, but soon everything, really everything, will have an AI engine built in. The real skill will be knowing how to build the skill that implements the function the user wants.

And what many "old-school" programmers still do not seem to accept is that we inevitably need to mix informal AI instructions with classic scripts. The real competence lies in deciding what should be handled by a script and what should be delegated to AI.
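The split between scripts and AI can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual skill: `Post`, `run_skill`, and the step functions are hypothetical names, `fake_ai` is a stand-in so the sketch runs offline, and `publish` only records the targets where a real script would call each platform's API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Post:
    text: str
    image_prompt: str = ""
    published_to: tuple = ()

def load_text(raw: str) -> Post:
    """Script step: normalise whitespace deterministically."""
    return Post(text=" ".join(raw.split()))

def polish(post: Post, ask_ai: Callable[[str], str]) -> Post:
    """AI step: needs context and judgement, so it is delegated to the model."""
    fixed = ask_ai("Fix grammar and spelling, keep the meaning:\n" + post.text)
    prompt = ask_ai("Describe a relevant image for this post:\n" + fixed)
    return Post(text=fixed, image_prompt=prompt)

def publish(post: Post, targets: tuple) -> Post:
    """Script step: a real skill would call each platform's API here (assumption)."""
    return Post(post.text, post.image_prompt, published_to=targets)

def run_skill(raw: str, ask_ai: Callable[[str], str]) -> Post:
    """Chain the steps: script, then AI, then script again."""
    return publish(polish(load_text(raw), ask_ai), ("linkedin", "x"))

def fake_ai(prompt: str) -> str:
    """Stand-in for a model call: returns the text after the instruction line."""
    return prompt.split("\n", 1)[1].capitalize()

result = run_skill("  hello   world ", fake_ai)
```

Only `polish` depends on the model; the loading and publishing steps behave the same way every time, which is what makes the skill predictable.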

Doing everything with AI is inefficient.

Doing everything with scripts is limiting, because you lose the AI's ability to understand context.

Understanding the boundary between the two is not obvious at all. Take heart: we have reinvented ourselves many times already. First we were DOS programmers, then Windows, then Web, then Mobile. Now we will all become AI programmers.

Actually, I have already become one. And judging by the number of "AI" labels I see in people's profiles, this is spreading fast.

Michele Sciabarra

@msciab

14 days ago

Full-stack is not just frontend plus backend.

A real application foundation needs every layer working together: compute, deployment, authentication, security, storage, APIs, scaling, caching, observability, rate limiting, and recovery.

That is what Nuvolaris provides.

It brings together the core pieces required to build, run, and scale modern applications, from Kubernetes and PostgreSQL to Redis, Prometheus, object storage, serverless runtimes, event streaming, load balancing, and CI/CD.

This makes Nuvolaris an excellent foundation for vibe coding.

You can move fast with AI-assisted development, but still build on an architecture that is production-ready from the start. The result is faster experimentation without giving up the stack discipline needed for real applications.

Nuvolaris: the full-stack application platform for building ideas that can actually run.

Michele Sciabarra

@msciab

15 days ago

@jakebrowatzke How much do you think the market for a Private Lovable is worth?

Michele Sciabarra

@msciab

15 days ago

@PrajwalTomar_ @ignytlabs You can also have a full stack locally... and scale, without vendor lock-in. Have you ever heard of Apache OpenServerless?

Michele Sciabarra

@msciab

15 days ago

@seraleev Sam Altman once warned startups: “Don’t build something we can easily rebuild ourselves, because we will.” That’s why Private AI is one of the few areas where startups can build something the AI giants cannot easily replace.

Michele Sciabarra

@msciab

15 days ago

@LeoSouzaMBL @antonosika @tannerlinsley Maybe you should consider a solution based on Local AI

msciab retweeted

Google Gemma

@googlegemma

15 days ago

Meet Gemma 4 12B!

A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.

Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇 https://t.co/gf4FZv0WZb

msciab retweeted

Cerebras

@cerebras

23 days ago

Sovereign AI means countries can build, deploy, and govern AI on their own terms.

🌐 National capability is the goal.
☑️ Capacity is the prerequisite.
🚀 Speed is the strategic advantage.

🇺🇸 U.S. 🇦🇪 UAE 🇮🇳 India are building with Cerebras.

Learn more: https://t.co/6bgLZ4f7XA https://t.co/dCMsJpmb9N

Michele Sciabarra

@msciab

15 days ago

@ValsAI We are testing it, and so far it works great.

msciab retweeted

Microsoft

@Microsoft

16 days ago

Announced today at #MSBuild: Microsoft unveiled Majorana 2, a next-generation topological quantum chip developed with the help of Microsoft Discovery’s agentic AI. https://t.co/esVcmeWdgh

Michele Sciabarra

@msciab

15 days ago

We are proud to announce that our parent company, Nuvolaris Inc., has appointed a new CEO: Mirella Di Girolamo.

Mirella is a co-founder of Nuvolaris and has previously served as CFO and COO. She is now taking the helm to lead the company through its next phase of growth as momentum continues to accelerate.

Michele Sciabarra

@msciab

16 days ago

Every time a software developer says, "Neural networks are unreliable because they are statistical," a mathematician tears their hair out in despair. Judging by the number of bald mathematicians I have met, this seems to happen quite often.

The point is that "statistical" does not mean "arbitrary." It means that you cannot predict every single outcome with certainty, but you can measure uncertainty and predict the overall behavior of the system.

That is exactly what probability theory is about. Theory and calculation. Got it? Not magic, not mystical inspiration, not a coin toss every time you ask a question. Calculation.

Developers are used to reasoning with `if` statements: if A is greater than B, do this; otherwise, do that. But when you need to recognize a dog in a photograph, you do not have a single ready-made variable labeled "dogness: 87." You have millions of pixels, shapes, colors, and relationships between elements. You need to derive an answer from all those signals together.

That is precisely what a neural network does: it calculates how plausible it is that the image represents a dog. Under normal conditions, if it has seen enough examples and the image is clear, its answer will be reliable. In ambiguous conditions, or when the image is very different from the data it was trained on, it may make a mistake.

This does not make the neural network unpredictable. It makes its errors measurable. You can find out how often it fails, which categories cause more errors, and under which conditions it becomes less reliable. You cannot always anticipate an individual error, but you can study its distribution.
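Measuring errors in this sense can be as simple as counting. The sketch below assumes a hypothetical evaluation log of `(category, predicted, actual)` triples; it computes the error rate per category and a Wilson score interval (with `z=1.96`, roughly 95%) to show how uncertain a measured failure rate still is for a given sample size.

```python
from collections import defaultdict
import math

def error_rates(records):
    """Per-category error rate from (category, predicted, actual) triples."""
    totals, errors = defaultdict(int), defaultdict(int)
    for category, predicted, actual in records:
        totals[category] += 1
        if predicted != actual:
            errors[category] += 1
    return {c: errors[c] / totals[c] for c in totals}

def wilson_interval(failures, n, z=1.96):
    """Approximate 95% Wilson score interval for a failure proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = failures / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

# Assumption: a made-up evaluation log for a dog/cat classifier.
RECORDS = [
    ("clear", "dog", "dog"), ("clear", "dog", "dog"),
    ("clear", "cat", "cat"), ("clear", "dog", "cat"),
    ("blurry", "dog", "cat"), ("blurry", "cat", "cat"),
]
rates = error_rates(RECORDS)
```

No individual mistake is predicted here, but the table of rates tells you which conditions deserve caution, and the interval narrows as more examples are evaluated.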

Of course, a neural network can be used badly. It can be trained on poor data. It can provide confident but false answers. It can be unsuitable for decisions where the cost of a mistake is too high. But these are concrete problems that can be measured and managed. They are not solved by uttering the word "statistics" as if it were a refutation.

The world we live in is also full of probabilistic phenomena. Matter does not suddenly dissolve before our eyes because of that. There may be uncertainty in a single event; across large numbers, extremely robust patterns emerge.

So let us stop with the smug anti-neural-network argument that "they are just statistics." Statistics is not the problem. It is precisely the tool that allows us to understand how much we can trust an answer, and when we should be careful.

Michele Sciabarra

@msciab

16 days ago

@Michaelzsguo His work is the key to making Private AI accessible to everyone. Kudos.

msciab retweeted

Michael Guo

@Michaelzsguo

17 days ago

When the creator of Redis starts thinking about KV cache, pay attention.

antirez is Salvatore Sanfilippo, the Sicilian programmer best known for creating Redis.

But “creator of Redis” is almost too small a label.

Before Redis, he was already an old-school systems hacker. He built hping, worked in network security, and invented the idle scan technique. This was the packet-level, C-programming, Unix-hacker world.

Then Redis happened.

The origin was not glamorous. He was building LLOOGG, a real-time web analytics service, and needed something faster and simpler than the tools he had. So he created Redis.

That is very antirez.

Start with a real bottleneck.
Avoid unnecessary abstraction.
Expose the right primitive.
Make it fast enough that people rethink the category.

Redis did not win because it looked like a traditional database. It won because it gave developers direct access to useful data structures: strings, lists, hashes, sets, sorted sets, streams, pub/sub.

It made memory programmable.

That is why his return to local AI is so interesting.

With ds4, or DwarfStar 4, antirez is not just building “another local inference engine.” He is asking a very Redis-like question:

What is the real primitive here?

For LLMs, one answer is obvious: KV cache.

Most people treat KV cache as an implementation detail. It lives in RAM or HBM, grows with context, and quietly becomes the bottleneck.

antirez looks at DeepSeek V4 Flash, compressed KV cache, modern MacBook SSDs, and says: maybe KV cache should not only live in RAM.

His phrase is perfect: “The KV cache is actually a first-class disk citizen.”

That one sentence is the whole story.

If Redis made in-memory data structures feel like application infrastructure, ds4 is exploring whether local LLM state can become durable infrastructure too.

Prefill once. Persist the cache. Resume later. Let long-running agents reuse expensive context instead of rebuilding everything from scratch.

This matters because coding agents are not normal chatbots. They carry huge system prompts, tool definitions, repo context, prior steps, and long task histories. If every request has to resend and recompute the entire conversation, local inference will always feel fragile and wasteful.

ds4 attacks that directly.

It is a deliberately narrow engine for DeepSeek V4 Flash, focused on Metal and CUDA, high-end personal machines, special quantization, long context, HTTP API, GGUF files crafted for the engine, official-logit validation, and agent integration.

There is also a funny and very current detail: he openly says ds4 was built with strong assistance from GPT 5.5, with humans leading ideas, testing, and debugging.

That is very 2026.

A legendary C programmer using an AI coding partner to build a local AI engine, so other coding agents can run locally with persistent KV state.

It sounds recursive because it is.

And he still has the same builder energy. After ds4 took off, he wrote that the first week felt like early Redis again, with 14-hour workdays, chaos, and excitement.

That is the part I like most: a true old-school builder.

When the creator of Redis starts thinking about KV cache, pay attention.

antirez is Salvatore Sanfilippo, the Sicilian programmer best known for creating Redis.

But “creator of Redis” is almost too small a label.

Before Redis, he was already an old-school systems hacker. He built hping, worked in network security, and invented the idle scan technique. This was the packet-level, C-programming, Unix-hacker world.

Then Redis happened.

The origin was not glamorous. He was building LLOOGG, a real-time web analytics service, and needed something faster and simpler than the tools he had. So he created Redis.

That is very antirez.

Start with a real bottleneck.
Avoid unnecessary abstraction.
Expose the right primitive.
Make it fast enough that people rethink the category.

Redis did not win because it looked like a traditional database. It won because it gave developers direct access to useful data structures: strings, lists, hashes, sets, sorted sets, streams, pub/sub.

It made memory programmable.

That is why his return to local AI is so interesting.

With ds4, or DwarfStar 4, antirez is not just building “another local inference engine.”

He is asking a very Redis-like question:

What is the real primitive here?

For LLMs, one answer is obvious: KV cache.

Most people treat KV cache as an implementation detail. It lives in RAM or HBM, grows with context, and quietly becomes the bottleneck.
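
To make "grows with context" concrete, here is a back-of-the-envelope sketch for a standard, uncompressed attention layout. The model dimensions are hypothetical and chosen only for illustration; they are not DeepSeek V4 Flash's configuration, and a compressed KV cache would be smaller than what this computes.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of an uncompressed KV cache for standard multi-head or
    grouped-query attention (not a compressed scheme)."""
    # Each token stores one key vector and one value vector
    # per layer and per KV head.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


# Hypothetical model shape, for illustration only.
HYPOTHETICAL_MODEL = dict(layers=32, kv_heads=8, head_dim=128, bytes_per_value=2)

if __name__ == "__main__":
    for tokens in (1_000, 32_000, 128_000):
        gib = kv_cache_bytes(tokens, **HYPOTHETICAL_MODEL) / 2**30
        print(f"{tokens:>7} tokens -> {gib:.2f} GiB")
```

The cost is linear in context length, which is why a long agent session can outgrow memory well before the weights do.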

antirez looks at DeepSeek V4 Flash, compressed KV cache, modern MacBook SSDs, and says: maybe KV cache should not only live in RAM.

His phrase is perfect:

“The KV cache is actually a first-class disk citizen.”

That one sentence is the whole story.

If Redis made in-memory data structures feel like application infrastructure, ds4 is exploring whether local LLM state can become durable infrastructure too.

Prefill once.
Persist the cache.
Resume later.
Let long-running agents reuse expensive context instead of rebuilding everything from scratch.

This matters because coding agents are not normal chatbots.

They carry huge system prompts, tool definitions, repo context, prior steps, and long task histories. If every request has to resend and recompute the entire conversation, local inference will always feel fragile and wasteful.

ds4 attacks that directly.

It is a deliberately narrow engine for DeepSeek V4 Flash, focused on Metal and CUDA, high-end personal machines, special quantization, long context, HTTP API, GGUF files crafted for the engine, official-logit validation, and agent integration.

There is also a funny and very current detail: he openly says ds4 was built with strong assistance from GPT 5.5, with humans leading ideas, testing, and debugging.

That is very 2026.

A legendary C programmer using an AI coding partner to build a local AI engine, so other coding agents can run locally with persistent KV state.

It sounds recursive because it is.

And he still has the same builder energy. After ds4 took off, he wrote that the first week felt like early Redis again, with 14-hour workdays, chaos, and excitement.

That is the part I like most: a true old-school builder.

210

113

13K

msciab retweeted

Stable Diffusion Tutorials @SD_Tutorial

17 days ago

NVIDIA 🧐 just dropped Cosmos3 Super (Image2Video)!
- Given one input image and text instructions
- generate temporally coherent video sequences
- consistent with the provided visual content

HF repo: 👇 https://t.co/sifHpVyUG5

205

179

10K
