I’m happy to share my new and highest AWS Certified Solutions Architect – Professional from Amazon Web Services (AWS) Web Services! it’s the 1002nd Amazon Web Services (AWS) certificate from T-Systems International international #peoplemakeithappen https://t.co/7gD0rMvRUi
Mein Artikel als Coverstory in der nächsten ix 05.20025 #EUDI -Wallet: Das ist der aktuelle Stand der digitalen europäischen Brieftasche https://t.co/BUozg827WS
RAG vs. CAG, clearly explained!
RAG is great, but it has a major problem:
Every query hits the vector DB. Even for static information that hasn't changed in months.
This is expensive, slow, and unnecessary.
Cache-Augmented Generation (CAG) addresses this issue by enabling the model to "remember" static information directly in its key-value (KV) memory.
In fact, you can combine RAG and CAG for the best of both worlds.
Here's how it works:
RAG + CAG splits your knowledge into two layers:
↳ Static data (policies, documentation) gets cached once in the model's KV memory
↳ Dynamic data (recent updates, live documents) gets fetched via retrieval
This gives faster inference, lower costs, and less redundancy.
The trick is being selective about what you cache.
Only cache static, high-value knowledge that rarely changes. If you cache everything, you'll hit context limits. Separating "cold" (cacheable) and "hot" (retrievable) data keeps this system reliable.
You can start today. OpenAI and Anthropic already support prompt caching in their APIs.
I have shared my recent article on prompt caching below if you want to dive deeper.
👉 Over to you: Have you tried CAG in production yet?
A harnessed LLM agent.
Most people picture this as a model with tools bolted on. The real architecture inverts that relationship.
The model itself is deliberately thin. Intelligence gets pushed outward, and the harness composes it at runtime.
Three dimensions orbit the harness core:
𝗠𝗲𝗺𝗼𝗿𝘆 holds state the model shouldn't carry in weights or context. Working context, semantic knowledge, episodic experience, and personalized memory each have their own lifecycle.
𝗦𝗸𝗶𝗹𝗹𝘀 hold procedural knowledge. Operational procedures, decision heuristics, and normative constraints specialize the general model per task.
𝗣𝗿𝗼𝘁𝗼𝗰𝗼𝗹𝘀 hold the interaction contracts. Agent-to-user, agent-to-agent, and agent-to-tools are three distinct surfaces with their own failure modes.
Between the core and these modules sit the mediators: sandboxing, observability, compression, evaluation, approval loops, and sub-agent orchestration. They govern how the harness reaches out and how state flows back in.
The useful question this framing unlocks: for any new capability, where should it live? Stable knowledge goes to memory, learned playbooks go to skills, communication contracts go to protocols, loop governance goes to the mediators.
Harness design becomes a question of what to externalize, and how to mediate it.
I'm building a minimal agent harness from scratch. Didactic, easy to read, no magic. Open-sourcing it soon. Stay tuned.
„AI eats Software“Warum SaaS-Aktien an der Wall Street crashen | Der Software-Sektor steht am Beginn seiner vielleicht größten Umbruchphase seit der Cloud-Revolution. KI wird Software dabei nicht sofort einfach ersetzen, sie dürfte sie aber neu definieren https://t.co/tHPZg1xXaC
PostgreSQL RPM repository now supports multiple RHEL minor versions YUM Repository Red Hat’s decision to ship a major OpenSSL update (3.2 → 3.5) together with the RHEL 10.1 and 9.7 releases caused unexpected breakage for users of Rocky Linux, AlmaLinux https://t.co/1kXYJe2ShI