Everyone's talking about AI agents.
No one's showing you HOW they
actually work.
This diagram = the entire playbook
Skills
MCP
Subagents
Hooks
Tools
agentic revolution has an architecture
And here it is
.
.
.
image credits: respected owner
Here's our forecasted Claude spend, broken down by team.
Engineering is highest at $3.1k/person/month.
No surprise. The interesting part is who's right behind them.
Proud that @usepylon is one of the fastest-growing vendors on @brexHQ 📈
Earlier this year we had another SKU cross > $1M ARR on its own, and we're working on even more bets to keep scaling our products.
From Account Intelligence to some very exciting AI that I can't say much more about yet 😅, there's a lot coming up soon.
Shoutout to our team for their incredible momentum and work to get us here.
Best accounts to follow from each frontier lab to stay constantly up to date
Anthropic
@karpathy
- must-follow account for AI; recently joined Anthropic
@bcherny
- Claude Code creator, always shares great tips
@trq212
- also a Claude Code developer; writes amazing articles on CC
OpenAI
@polynoamial
- works on reasoning research, shares a lot of technical details
@gabriel1
- Sora developer, great career path
@jxnlco
- works on dev experience, shares a lot about Codex
Google AI
@OfficialLoganK
- all the major Google Gemini and AI Studio updates
@ammaar
- product and design; shares great things about vibe-coding in Google AI Studio
@fofrAI
- cool use cases for generative models
Cursor
@leerob
- the loudest voice behind Cursor updates
@ericzakariasson
- shares great insights on using Cursor
@mntruell
- Cursor’s CEO; major releases and usage updates
xAI
@milichab
- recently joined xAI, shares updates on Grok
@skcd42
- also covers major Grok releases
@ai_explorer25
- covers all ai content and free resources
This is a *way* bigger deal than it seems...
Frontier AI companies will *never* own the frontier again
I kid you not... I've been waiting for someone to show this result for like 4 years... this is a huge deal.
The short reason: combinations of models will *always* outperform individual models
The long reason: this is the gateway to a million times more data... and huge leaps in compute efficiency.
The AI scaling laws always win.
More in article below 👇
I spent 12 hours this weekend installing https://t.co/aq7DBz7en5 on every agent.
here's everything i learned:
→ claude chat - 90% of the world uses this or chatgpt. It loves MCPs. And it's great for one-shot tasks (headline generator, find journalists), but it can't run the complex scheduled workflows. big gap to to solve for here.
→ claude cowork - positioned as "claude code for business," but in practice it's stripped down with more limitations than i expected. everything sandboxed and you can't persist to local machines easily.
→ local agents (claude code, codex, hermes, openclaw) - These agents love CLIs. and they have full coverage. real workflow orchestration. scheduled runs that actually persist. where most of the power users' agents live today.
→ chatgpt - the worst experience for skill-based workflows lol. skills are still gated to business and enterprise plans, so plus/pro users can't load newsjack natively yet
I just shipped Newsjack v0.1.11 based on all these learnings:
- rewrote the install flow for every platform with this giant matrix.
- install to Claude and Cowork as a Claude plugin.
- one-line install for local agents now works on Windows.
- a rewritten getting-started guide for the agent to follow.
- medialyst connection now uses a clean API endpoint
the punchline: you no longer need to read the setup guide.
paste this prompt into the AI agent of your choice:
> "install https://t.co/5ViWqne7gA for me"
agent reads the setup guide. detects your platform. picks the right install path. walk you through the whole setup.
welcome to the world of agent onboarding - this is now the minimum bar every product is expected to clear.
Historically, China’s industrial policies long favored these state-backed national champions.
But look at who dominates today: BYD, Geely, Chery, NIO, XPeng, Li Auto, Seres, Xiaomi, and Leapmotor.
Almost none of these top performers (the only exception being Changan) were the central government's designated champions. In 2024, half of the top-five best-selling EV makers are private. In 2025, there are more. How did this happen? (3/15)
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious.”
— In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (ie fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
This is the correctly nuanced take
The constraints of export control is fostering a new generation of resourceful, creative, and probably more patriotic AI developers, engineers, and researchers
Major life hack:
DeepSeek in the Claude Code harness can also build and drive workflows, at a fraction of the cost and Opus 4.6/7 quality.
I've got it running over 250 subagents in a workflow in adversarial reviews.
Pennies on the dollar.
Use my tool "Deep-Claude"
Agents get expensive when one frontier model does everything: read raw logs, recognize patterns, plan the fix. The split that works: the big model orchestrates, fine-tuned SLMs work as its tools. Cheap tokens digest raw data into JSON; expensive tokens only do the reasoning.
Claude Code fully dissected!
Researchers from UCL reverse-engineered the leaked Claude source. What they found changes how you should think about agent design.
Only 1.6% of the codebase is AI decision logic.
The other 98.4% is operational infrastructure. Permission gates, tool routing, context compaction, recovery logic, session persistence. The model reasons. The harness does everything else.
This is the opposite of what most agent frameworks do today.
LangGraph routes model outputs through explicit state machines. Devin bolts heavy planners onto operational scaffolding. Claude Code gives the model maximum decision latitude inside a rich deterministic harness, and invests all its engineering effort in that harness.
The core loop is a simple while-true. Call model, run tools, repeat.
But the systems around that loop are where the real design lives:
A permission system with 7 modes and an ML classifier. Users approve 93% of prompts anyway, so the architecture compensates with automated layers instead of adding more warnings.
A 5-layer context compaction pipeline. Each layer runs only when cheaper ones fail. Budget reduction, snip, microcompact, context collapse, auto-compact.
Four extension mechanisms ordered by context cost. Hooks (zero), skills (low), plugins (medium), MCP (high). Each answers a different integration problem.
Subagents return only summary text to the parent. Their full transcripts live in sidechain files. Agent teams still cost roughly 7x the tokens of a standard session.
Resume does not restore session-scoped permissions. Trust is re-established every session. That friction is the point.
The bet behind all of this is simple. As frontier models converge on raw coding ability, the quality of the harness becomes the differentiator, not the model.
Paper: Dive into Claude Code (arXiv:2604.14228)
We've shared an article on Agent Harness and what every big company is building.
Read it below.
The 8-step order I use for shipping AI agents in 2026:
1. Filter noisy tool outputs
2. Load tools only when needed
3. Clean cached history before reusing it
4. Compress long logs and terminal outputs
5. Store memory outside the context window
6. Compact manually around 40%
7. Add retrieval behind the system
8. Keep autocompact as the last resort
What I like about this order is that each step reduces pressure on the context window before the next layer gets added.
The result is not just lower token usage.
The agent stays much more stable during long sessions and keeps track of what it’s actually doing.
That’s becoming a very important skill in agent engineering now.
Not just building agents.
Managing their context properly.
🚨 Stop asking "which AI is the best."
That's the wrong question. The pros in 2026 don't pick one model — they run a stack. Five models, five superpowers, each doing the one job it's actually built for.
Here's exactly how I use AI now 👇
1. Claude Opus → the Reasoning Engine 🧠
This is my thinking partner. Complex multi-step problems, long-document analysis, serious coding, precision enterprise work. When the cost of being wrong is high, this is what I reach for. Huge context, careful step-by-step reasoning, the most reliable "thinker" in the stack.
2. GPT-5.5 → the Execution Engine ⚙️
When I need to actually DO something agentic — autonomous coding pipelines, multi-step tool chains, computer use, workflow automation. Best-in-class at planning a sequence of tool calls and persisting through long sessions without losing the plot.
3. Perplexity → the Knowledge Engine 🔍
Anything that needs a source. Research, fact-checking, due diligence, market and legal analysis. Every claim is numbered and linked to a live source, and Deep Research pulls from dozens of pages per query. No "trust me" — receipts only.
4. Meta AI → the Workflow Engine 📱
The everyday assistant that lives where I already am — WhatsApp, Instagram, Messenger, even the glasses. Image analysis, shopping, travel planning, quick voice-and-vision tasks on mobile. Free, fast, everywhere.
5. Microsoft Copilot → the Distribution Layer 🏢
This is the one nobody talks about but every company runs on. AI baked directly into Word, Excel, PowerPoint, Teams, and Outlook. Zero context-switching, grounded in your org's real emails and files, wrapped in enterprise governance. It's not the smartest — it's the most embedded.
The mental model that changed everything for me:
🧠 Reasoning → Claude Opus
⚙️ Execution → GPT-5.5
🔍 Knowledge → Perplexity
📱 Workflow → Meta AI
🏢 Distribution → Microsoft Copilot
Amateurs argue about the leaderboard.
Professionals build the stack.
A senior Google engineer just dropped a 421-page doc called Agentic Design Patterns.
Every chapter is code-backed and covers the frontier of AI systems:
→ Prompt chaining, routing, memory
→ MCP & multi-agent coordination
→ Guardrails, reasoning, planning
This isn’t a blog post. It’s a curriculum. And it’s free.
Really excited to open source a new project: Omnigent, a meta-harness for AI agents.
It lets you build multi-agent coding and custom agents, sitting above Claude Code, Codex, Pi, and agent SDKs to let you compose them. It also adds live collaboration and rich control policies.
If you want to get good at AI engineering (in 2026), learn these concepts:
1 LLM Evals Explained
↳ https://t.co/nv3Ol8W53p
2 Design Knowledge Q & A System
↳ https://t.co/9ymm6mtHug
3 How OpenClaw Works
↳ https://t.co/eHRWegcsf8
4 AI Agent Workflow
↳ https://t.co/JvnPd9773A
5 How MCP Works
↳ https://t.co/wgf8gHnnkn
6 Design AI Chat Assistant
↳ https://t.co/nNWq3onTnW
7 How RAG Works
↳ https://t.co/cGmunPTUlb
8 Agentic Patterns Explained
↳ https://t.co/8YdBBWvTj1
9 AI Coding Workflow 101
↳ https://t.co/paIf9ksIU9
10 Machine Learning System Design 101
↳ https://t.co/9MkHcLb5e0
11 Multi-Agent Architecture Explained
↳ https://t.co/rS5QQS7Jln
12 How AI Agents Work
↳ https://t.co/JvnPd9773A
13 How Vector Databases Work
↳ https://t.co/FVxan8xHH3
14 AI Agents: Memory, State & Consistency
↳ https://t.co/v8H7O00jub
15 AI Agents Design
↳ https://t.co/tk3zkCjRvg
16 Context Engineering 101
↳ https://t.co/OMkiZhkODL
17 What is Reinforcement Learning
↳ https://t.co/AVpl9j1oit
18 LLM Concepts - A Deep Dive
↳ https://t.co/5lCKxq2g4N
What else should make this list?
===
👋 PS - Want my System Design Playbook (for free)?
Join my newsletter with 201K+ software engineers now:
→ https://t.co/ByOFTtOihX
===
💾 Save & RT to help others get good at AI engineering.
👤 Follow @systemdesignone + turn on notifications.
Ant Group just published a 227-repo map for builders trying to read the agentic AI stack.
The painful part is not finding one more agent repo.
It is knowing whether the thing you are evaluating belongs to runtime, tools, evals, memory, serving, gateways, or models.
agentic-ai-landscape turns that mess into an ecosystem view instead of another bookmarks folder.
It uses a three-layer architecture: Agent Infra, Model Infra, and Large Models.
The useful part:
> Agent Infra covers applications, frameworks, runtimes, and tools
> Model Infra covers data, training, serving, and deployment
> Large Models sit as the foundation layer
> CSV tracks 227 noteworthy projects with categories and repo metadata
> OpenDigger data supports the community-vitality view
Caveat: public repo, but no license file was visible in the checkout.
If you were choosing an agent stack today, which layer would you audit first?
Link in the reply 👇