I'm launching "The Technomist," a newsletter exploring how technology and business intersect, covering topics from product/idea discovery to AI strategies.
Subscribe if you'd like to follow along: https://t.co/DhqP0Uvagy
First post coming soon!
#tech #business
AI Scrolls doesn't try. Sparse eras collapse into count badges, dense years fan out when you zoom. The axis adapts to the data, not the other way around.
It's a utility to help you observe the patterns. Google Trends shows search interest across different topics, which you can correlate with what's happening in each slot. Ngrams goes a bit deeper, tracking how often terms like "artificial intelligence" and "machine learning" appeared in published books since 1800. You can literally see the AI winters: interest peaks in the 70s, drops through the 80s, flatlines, then climbs again.
the next wave.
More on the thinking behind it and how it came to be:
https://t.co/IwBlWKatyd
I built AI Scrolls, an interactive timeline of AI history.
I started this as a fun side-project a while back and kept evolving it because most AI timelines I found had one of two problems: they either stopped around 2023, or they were useful as reference lists but not very good at showing patterns over time.
AI Scrolls currently includes 359 events, from Aristotle's syllogism in 384 BC to recent work on agents, context engineering, and harness engineering in 2026.
I added Google Trends data for "artificial intelligence" and 14 related sub-trends, including agents, RAG, prompt engineering, context engineering, harness engineering, and vibe coding. You'll notice that search interest in "artificial intelligence" stayed relatively flat from 2004 to 2022, then jumped sharply after ChatGPT (no surprise by now). Prompt engineering appears to have peaked earlier, while AI agents is still climbing.
Here is the link for you to explore: https://t.co/qI6dB5xobv
An AI agent deleted an entire production database in 9 seconds last month. No rollback. No recovery.
Every sandboxing approach falls into 3 modes. Knowing which you need matters more than which solution you pick. We validated OpenShell across all 3.
https://t.co/qtnSsSuhFq
RAG systems rank by relevance, not authorization. A query from one tenant can surface another tenant's data simply because it scores highest.
Red Hat's @franciscojarceo co-authored a paper on fixing this.
MLOps Community, June 11, 1:30 PM EDT (virtual)
https://t.co/YmWneZFH1R
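To make the tenant-leak problem above concrete, here is a minimal sketch of one generic mitigation: filter by authorization first, rank by relevance second. This is not the approach from the paper mentioned above; the `Chunk` type, the `retrieve` function, and the precomputed scores are all hypothetical stand-ins for a real retriever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    score: float  # relevance score, assumed to be precomputed by the retriever


def retrieve(chunks, tenant_id, k=3):
    """Return the top-k chunks that the calling tenant is allowed to see."""
    # Authorization first: drop everything owned by other tenants.
    allowed = [c for c in chunks if c.tenant_id == tenant_id]
    # Relevance second: rank only what survived the filter.
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:k]


index = [
    Chunk("acme", "Acme Q3 pricing sheet", 0.97),  # highest score, wrong tenant
    Chunk("globex", "Globex onboarding guide", 0.71),
    Chunk("globex", "Globex pricing FAQ", 0.64),
]

results = retrieve(index, tenant_id="globex")
print([c.text for c in results])
```

Without the filter, the Acme chunk would come back first for a Globex query simply because it scores highest.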
Anthropic's self-hosted sandboxes for Claude, now running on OpenShell on Red Hat AI.
Claude reasons in the cloud. Code executes on your infrastructure, inside an OpenShell sandbox: deny-all by default, per-binary network policy, credential isolation.
A five-part 🧵:
Instruction-based malware is the new threat. ⚠️ 300+ malicious skills were recently found in the #OpenClaw marketplace. No bad code, just adversarial inputs.
Protect your stack from semantic malware with #RedHat's #AI security layers. #RHSummit
https://t.co/FG3h4vKAIk
Most AI agents say "done" when the output is fine, not good. They can fix their own mistakes when someone points them out. Nobody is pointing them out automatically.
Anthropic built this, but it only works inside their managed platform. I wanted the same pattern with an open source stack.
I am calling it outcome loops: A rubric-graded quality gate between the agent's output and the user's inbox. A separate judge model scores the result, and the agent revises until the rubric is satisfied or the iteration budget runs out.
To build this loop, I used OGX (https://t.co/m6C5Ywz1eC) for inference and MLflow (https://t.co/wnREtKrONy) for evaluation.
OGX gave me a single endpoint across all agentic API surfaces (chat-completions, responses API, messages API, Interactions API). It integrates with self-hosted runtimes like vLLM and natively supports retrieval and document processing workflows with various DB providers.
An outcome loop makes two model calls per iteration (agent + judge), and OGX routes both through the same base URL, so you can self-host one and use a hosted provider for the other.
MLflow turns rubrics into versioned metrics that log scores, justifications, and artifacts automatically, so you can actually see which criteria fail and whether your rubric is calibrated.
Some learnings along the way:
➡️ Build the judge before the loop. A weak judge lets bad output through, a harsh one wastes iterations. The judge is the system.
➡️ Stop at the first passing score. Extra iterations drift; models rewrite good sections to satisfy criteria they already passed.
➡️ Start the rubric loose, tighten from the data. You won't know which criteria matter until you see real outputs fail.
This is part 3 of my continuous learning for agents series: https://t.co/oZ5rh55gSE
Link to the post: https://t.co/L5yLao1UAH
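For readers who want the shape of an outcome loop in code, here is a minimal sketch under stated assumptions. `outcome_loop`, `toy_agent`, `toy_judge`, and `RUBRIC` are hypothetical names, and the agent and judge are plain Python stubs rather than real model calls. In a real setup each callable would wrap a model endpoint. The sketch is not the OGX or MLflow API.

```python
# Hypothetical rubric: each criterion is a named check on the draft text.
RUBRIC = {
    "has_summary": lambda draft: "Summary:" in draft,
    "has_sources": lambda draft: "Sources:" in draft,
}


def toy_judge(task, draft):
    """Stand-in for the judge model: returns (score, failed criteria)."""
    failed = [name for name, check in RUBRIC.items() if not check(draft)]
    score = 1 - len(failed) / len(RUBRIC)
    return score, failed


def toy_agent(task, feedback):
    """Stand-in for the agent model: revises based on the judge's feedback."""
    draft = f"Summary: {task}"
    if feedback and "has_sources" in feedback:
        draft += "\nSources: internal wiki"
    return draft


def outcome_loop(task, agent, judge, threshold=0.8, max_iters=3):
    """Revise until the rubric score passes or the iteration budget runs out."""
    feedback, draft, history = None, None, []
    for _ in range(max_iters):
        draft = agent(task, feedback)          # model call 1: the agent
        score, feedback = judge(task, draft)   # model call 2: the judge
        history.append(score)
        if score >= threshold:
            return draft, history, True        # stop at the first passing score
    return draft, history, False               # budget exhausted


final, history, passed = outcome_loop("weekly status report", toy_agent, toy_judge)
print(passed, history)
```

Returning at the first passing score mirrors the second learning above: there is no extra pass in which the agent could rewrite sections that already satisfy the rubric.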
We're way past the point where AI is just the model. There's more at stake now!
Wrote a three-part series breaking down what goes into building a whole AI product:
➡️ Part 1 covers the framework, building off Moore's whole product model and introducing a tweaked version.
➡️ Part 2 covers compound systems, agentic loops, and the architecture patterns emerging right now.
➡️ Part 3 tackles the question: if everyone has access to the same models, what makes your product defensible?
https://t.co/ZOwON73INz
AI agents need to run shell commands, write to disk, and make network calls to do real work. Security teams look at that access list and see a breach waiting to happen. Give an agent those permissions and things go wrong fast: a hallucination turns rm -rf on the wrong directory, a bad tool call POSTs credentials to an external endpoint.
And the threat model has a new entry: semantic malware. These are not executable payloads but malicious instructions hidden in natural language, which agents follow because they read documentation the same way they read prompts.
No single control handles this. A firewall does not stop an agent that was told to write secrets to a file it is allowed to touch. A guardrail does not help if the agent never hits the model with the malicious content. You need layers (defense in depth), and each layer must assume the one above it has already failed.
Here is a six-layer framework for how we are building the agent's secure runtime with Red Hat AI:
➡️ Hardened platform locks down the base. Security context constraints, SELinux, and DNS-based egress filtering before the agent even starts.
➡️ Agent pods run inside lightweight VM runtimes (Kata Containers), not just containers. Container escape hits a hardware boundary.
➡️ An agent "shell" (OpenShell) enforces per-binary network policies and kernel-level filesystem allowlists. The agent binary can only reach the endpoints and paths you explicitly permit.
➡️ Conversation-layer guardrails (NeMo Guardrails) intercept prompt injections before the model processes them and filter unsafe outputs.
➡️ Continuous red teaming (Garak) runs 120+ adversarial probes against your deployed agents. You find the gaps before someone else does.
➡️ Workload identity and tracing give every agent a cryptographic identity (SPIFFE) and full OpenTelemetry traces. When something goes wrong, you know which agent did what, when, and why.
The design principle assumes breach at every layer and contains the blast radius at the next one.
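As an illustration of the deny-all-by-default, per-binary idea in the "shell" layer above, here is a minimal sketch. The `EgressPolicy` class and its schema are hypothetical and are not OpenShell's actual policy format. Real enforcement happens at the kernel and network level, not in application code.

```python
from dataclasses import dataclass, field


@dataclass
class EgressPolicy:
    # Hypothetical schema: binary path -> set of (host, port) pairs it may reach.
    allow: dict = field(default_factory=dict)

    def permits(self, binary, host, port):
        """Deny by default: only explicitly listed (binary, host, port) pass."""
        return (host, port) in self.allow.get(binary, set())


policy = EgressPolicy(allow={
    "/usr/bin/git": {("github.com", 443)},
    "/usr/bin/python3": {("pypi.org", 443)},
})

print(policy.permits("/usr/bin/git", "github.com", 443))    # listed binary and endpoint
print(policy.permits("/usr/bin/curl", "github.com", 443))   # unlisted binary
print(policy.permits("/usr/bin/git", "evil.example", 443))  # unlisted endpoint
```

Keying the allowlist on the binary rather than the pod means a compromised or unexpected tool gets no network access just because another tool in the same sandbox does.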
https://t.co/BVctOJJ0I8
Karpathy and Tobi Lutke built the same loop independently. Point an AI agent at code, give it a score to chase, let it run experiments overnight. One got a better model. The other got a 53% speedup on a 20-year-old codebase.
I generalized the pattern into a tool that works for any domain. Pointed it at a RAG search engine, got 14 experiments and a 9.3% improvement while I did other things. The real work isn't the loop. It's writing good evals.
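The loop itself is small, which is the point. Below is a minimal sketch of the keep-it-if-the-score-improves pattern, with a toy `evaluate` and `propose` standing in for a real eval suite and a real agent. All names and the `top_k` config are hypothetical and are not the tool from the post.

```python
import random


def run_experiments(baseline, propose, evaluate, budget=10, seed=0):
    """Try `budget` candidates; keep one only if it beats the best score so far."""
    rng = random.Random(seed)
    best, best_score = baseline, evaluate(baseline)
    log = []
    for _ in range(budget):
        candidate = propose(best, rng)  # the agent's next experiment
        score = evaluate(candidate)     # the eval is the only source of truth
        kept = score > best_score
        if kept:
            best, best_score = candidate, score
        log.append((candidate, score, kept))
    return best, best_score, log


def evaluate(config):
    # Toy eval: pretend retrieval quality peaks at top_k == 8.
    return -abs(config["top_k"] - 8)


def propose(config, rng):
    # Toy proposal: nudge one parameter up or down by one step.
    return {"top_k": max(1, config["top_k"] + rng.choice([-1, 1]))}


baseline = {"top_k": 3}
best, best_score, log = run_experiments(baseline, propose, evaluate)
print(best, best_score, sum(kept for _, _, kept in log), "experiments kept")
```

Everything interesting lives in `evaluate`: if the score rewards the wrong thing, the loop will chase it overnight just as faithfully.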
https://t.co/GmVXa7SAuO
Your AI coding agent makes the same mistake every session. You correct it, it adapts, the session ends, and tomorrow it's forgotten everything.
I built a system that captures corrections, figures out which skill caused the failure, and checks whether I already fixed it. The key insight: an agent that remembers everything learns nothing. An agent that remembers only what you choose to teach it gets better every week.
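A minimal sketch of the capture, attribute, and check-if-already-fixed flow described above. The `CorrectionLog` class is hypothetical, and the skill name is passed in by hand here, whereas the real system works out which skill caused the failure.

```python
from dataclasses import dataclass, field


def _key(lesson):
    # Normalize so trivially different phrasings of one lesson match.
    return lesson.strip().lower()


@dataclass
class CorrectionLog:
    fixed: dict = field(default_factory=dict)    # skill -> lessons already taught
    pending: list = field(default_factory=list)  # (skill, lesson) awaiting review

    def capture(self, skill, lesson):
        """Record a correction. Returns False if this skill already has the fix."""
        key = _key(lesson)
        if key in self.fixed.get(skill, set()):
            return False
        if (skill, key) not in self.pending:
            self.pending.append((skill, key))
        return True

    def approve(self, skill, lesson):
        """A human chooses to teach this lesson; only then is it remembered."""
        key = _key(lesson)
        if (skill, key) in self.pending:
            self.pending.remove((skill, key))
        self.fixed.setdefault(skill, set()).add(key)


log = CorrectionLog()
first = log.capture("git-commit", "Use conventional commit prefixes")
log.approve("git-commit", "Use conventional commit prefixes")
repeat = log.capture("git-commit", "use conventional commit prefixes ")
print(first, repeat, log.pending)
```

The explicit `approve` step is what keeps the agent from remembering everything: a correction only becomes part of a skill once someone decides it should.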
https://t.co/UNtWjLaF8y
Your agent works on your laptop. But does it have identity? Isolation? Audit trails? Observability with agent tracing? Can you prove to compliance what tools it called and why?
Most teams can't.
That's the gap Red Hat AI closes with Bring Your Own Agent: security, governance, observability, and tool-level authorization around any agentic runtime, framework, or application without touching code.
Here's how to operationalize "Bring Your Own Agent" on Red Hat AI, the OpenClaw edition: https://t.co/JEjP5C7jjD
Most CLAUDE.md files I've seen are way too long. The model can only reliably follow ~150 instructions, and Claude Code's system prompt already uses ~50 of those. If Claude keeps ignoring your rules, your instruction budget is probably overdrawn. Wrote about how to fix it.
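To see roughly where a file stands against that budget, here is a small sketch. It assumes a crude heuristic (every bullet or numbered item counts as one instruction) and uses the approximate figures from the post, ~150 total and ~50 consumed by the system prompt. `count_instructions` and the sample text are hypothetical.

```python
import re

MODEL_BUDGET = 150        # approximate figure from the post
SYSTEM_PROMPT_COST = 50   # approximate figure from the post
AVAILABLE = MODEL_BUDGET - SYSTEM_PROMPT_COST

# A line counts as an instruction if it starts with "-", "*", or "1." style markers.
ITEM = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\S")


def count_instructions(markdown_text):
    """Rough heuristic: one list item is one instruction; prose is ignored."""
    return sum(1 for line in markdown_text.splitlines() if ITEM.match(line))


sample = """# Rules
- Use pnpm, not npm
- Run tests before committing
1. Never edit generated files
Some explanatory prose that is not a rule.
"""

used = count_instructions(sample)
print(f"{used} instructions used, {AVAILABLE - used} left of {AVAILABLE}")
```

A real file will have rules buried in prose that this undercounts, so treat the number as a floor rather than a measurement.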
https://t.co/x5NohSS7no
Spent the weekend building a tool for solving the coding agents' sprawl problem. Introducing ✨ aimux ✨
If you're running multiple coding agents like Claude, Codex, Gemini, etc., you know the pain: which session is stuck? What did it do? How do I debug it? How much did it cost?
aimux is a single-binary TUI that gives you one view across all your AI coding agents. Discovery, traces, cost tracking, annotations + labels (for evals), and OTEL export! No daemons, no hooks, no modifications to your tools. Integrates with MLflow and is easily extensible as well.
Multiplex your AI agents. Trace, launch (built-in for different coding agents), export. Never leave the terminal.
Install: brew install zanetworker/aimux/aimux
Repo: https://t.co/vyaU4neA43
Site: https://t.co/2i43XUtNbB
AI scales execution to near-zero cost. But verifying that output stays biologically bounded. The bottleneck is not intelligence (not anymore, that's becoming abundant now), it is and will be human verification bandwidth.
Full post here: https://t.co/3Dh7blx85g