Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH
Congrats @mastra on the Mastra 1.0 release!
@AgentGraphAI has been using Beta 1.0 in production and it’s been a massive improvement to our agentic marketplace layer.
🎉 🙌
Why prompt caching matters
@smthomas3 was on a call today with a unicorn startup that shipped an in-app agent using Mastra.
They said that a single user cost them over $1k in tokens in a single session (using Sonnet 4.5).
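A rough sketch of why caching changes the math. The prices and multipliers below are assumptions for illustration (check your provider's current price sheet), and the session shape is hypothetical: a fixed prefix resent every turn, no cache expiry, output tokens ignored.

```python
# Assumed prices in USD per million input tokens; not taken from the post above.
INPUT_PRICE = 3.00        # assumed uncached input price
CACHE_WRITE_MULT = 1.25   # assumed surcharge for writing a prefix to the cache
CACHE_READ_MULT = 0.10    # assumed discount for reading a cached prefix


def session_input_cost(prefix_tokens: int, new_tokens_per_turn: int,
                       turns: int, cached: bool) -> float:
    """Input-token cost of a session where every turn resends a shared prefix."""
    per_token = INPUT_PRICE / 1_000_000
    if not cached:
        # Every turn pays full price for the prefix plus the new tokens.
        return turns * (prefix_tokens + new_tokens_per_turn) * per_token
    # The first turn writes the prefix to the cache; later turns read it.
    write = prefix_tokens * CACHE_WRITE_MULT * per_token
    reads = (turns - 1) * prefix_tokens * CACHE_READ_MULT * per_token
    fresh = turns * new_tokens_per_turn * per_token
    return write + reads + fresh


# Hypothetical session: 100k-token prefix, 1k new tokens per turn, 200 turns.
uncached = session_input_cost(100_000, 1_000, 200, cached=False)
cached = session_input_cost(100_000, 1_000, 200, cached=True)
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
```

Under these assumptions the prefix dominates the bill, which is why long agent sessions are so sensitive to whether the prefix is cached.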
The narrative is shifting from "AI Agents" to "Agent Engineering"
High-quality, production-grade agents are built by Agent Engineers to drive economic value.
At @AgentGraphAI, we believe the agentic web will be built by those who combine human-centered design with agent engineering.
This post from @langchain captures "why @AgentGraphAI" really well.
Agent engineering is a fundamentally new discipline. The best Agent Engineers blend these things well:
✅ Product - know the use case and how to do evals
✅ Engineering - tools, context, observability at runtime
✅ Data Science - rapid iteration on agent data streams
@AgentGraphAI is building a network of AI Engineers who excel in all three of these areas.
Find elite agent engineers to work with at https://t.co/iUnm8GzGP7
At @AgentGraphAI we are excited to see agent interoperability standards, like x402, start to take shape.
Several AgentGraph buyers are now submitting use cases that combine ERC-8004, MCP, and x402 to create a new class of agentic web-native custom agents.
while x402 is hot, i encourage people to try out all the cool use cases
Like Penny for your thoughts -- get interviewed by an AI agent that helps you generate unique insights that you can charge other users to access
glimpse at the future of consulting -- i made one for x402!
The next wave of software will be agentic. The meaning of "full stack" has shifted to include the AI stack.
Models, tools, memory, RAG, agentic RAG.
AgentGraph will be attending the first TypeScript AI conference (hosted by @mastra) on November 6th.
Great post!
Really like the term “Deep Agents” as a way to lean into the complexity of multi-agent systems.
Just as microservices helped scale cloud-based infrastructure, multi-agent systems will scale the agentic web and address more complex use cases.
Most agents today are shallow.
They easily break down on long, multi-step problems (e.g., deep research or agentic coding).
That’s changing fast!
We’re entering the era of "Deep Agents", systems that strategically plan, remember, and delegate intelligently for solving very complex problems.
We at @dair_ai, folks from LangChain and Claude Code, and more recently individuals like Philipp Schmid have been documenting this idea.
Here’s roughly the core idea behind Deep Agents (based on my own thoughts and notes that I've gathered from others):
// Planning //
Instead of reasoning ad-hoc inside a single context window, Deep Agents maintain structured task plans they can update, retry, and recover from. Think of it as a living to-do list that guides the agent toward its long-term goal. To experience this, just try out Claude Code or Codex for planning. The results are significantly better once you enable it before executing any task. I have also written recently on the power of brainstorming for longer with Claude Code, and this shows the power of planning, expert context, and human-in-the-loop (your expertise gives you an important edge when working with deep agents). Planning will also be critical for long-horizon problems (think agents for scientific discovery, which comes next).
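One way to picture the "living to-do list" is a small plan object the agent can update, retry, and extend. This is only a sketch; the `Step` and `Plan` names and the retry policy are hypothetical, not how Claude Code or Codex implement planning.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare steps by identity, not by field values
class Step:
    description: str
    status: str = "pending"  # pending | done | failed
    attempts: int = 0


@dataclass
class Plan:
    goal: str
    steps: list[Step] = field(default_factory=list)

    def next_step(self, max_attempts: int = 2) -> Step | None:
        """Return the first pending step, or a failed one with retries left.

        Steps that exhaust their retries are skipped here; a real agent
        would more likely replan at that point.
        """
        for step in self.steps:
            if step.status == "pending":
                return step
            if step.status == "failed" and step.attempts < max_attempts:
                return step
        return None

    def record(self, step: Step, ok: bool) -> None:
        """Store the outcome of one attempt at a step."""
        step.attempts += 1
        step.status = "done" if ok else "failed"

    def revise(self, after: Step, new_steps: list[str]) -> None:
        """Insert extra steps right after `after` when the agent learns more."""
        i = self.steps.index(after) + 1
        self.steps[i:i] = [Step(d) for d in new_steps]
```

The point of keeping the plan outside the model's context is that it survives retries and can be shown to a human for review before anything executes.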
// Orchestrator & Sub-agent Architecture //
One big agent (typically with a very long context) is no longer enough. I've seen arguments against multi-agent systems and in favor of monolithic systems, but I am skeptical about this. The orchestrator-sub-agent architecture is one of the most powerful LLM-based agentic architectures you can leverage today for any domain you can imagine. An orchestrator manages specialized sub-agents such as search agents, coders, KB retrievers, analysts, verifiers, and writers, each with its own clean context and domain focus. The orchestrator delegates intelligently, and sub-agents execute efficiently. The orchestrator integrates their outputs into a coherent result. Claude Code popularized the use of this approach for coding, and sub-agents, it turns out, are particularly useful for efficiently managing context (through separation of concerns).
I wrote a few notes on the power of using an orchestrator and sub-agents here https://t.co/t2KKTBoNTZ and here https://t.co/EmWIKId5rA
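A minimal sketch of the delegation pattern described above. The sub-agents are plain functions standing in for LLM calls, and all names here (`search_agent`, `writer_agent`, `orchestrate`) are hypothetical. A real orchestrator would also decide the assignments itself rather than receive them.

```python
from typing import Callable


# Hypothetical sub-agents: each stands in for an LLM call with its own fresh context.
def search_agent(task: str) -> str:
    return f"[search notes for: {task}]"


def writer_agent(task: str) -> str:
    return f"[draft based on: {task}]"


SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "search": search_agent,
    "write": writer_agent,
}


def orchestrate(goal: str, assignments: list[tuple[str, str]]) -> str:
    """Delegate each (agent_name, task) pair, then merge the outputs."""
    results = []
    for agent_name, task in assignments:
        agent = SUB_AGENTS[agent_name]
        # Each sub-agent sees only its own task, not the other agents' transcripts.
        results.append(f"{agent_name}: {agent(task)}")
    return f"Goal: {goal}\n" + "\n".join(results)


print(orchestrate("brief on x402", [("search", "x402 spec"), ("write", "summary")]))
```

The separation of concerns shows up in the function signatures: a sub-agent receives a task string and returns a result, so the orchestrator's context only grows by the result, not by the sub-agent's working notes.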
// Context Retrieval and Agentic Search //
Deep Agents don’t rely on conversation history alone. They store intermediate work in external memory like files, notes, vectors, or databases, letting them reference what matters without overloading the model’s context. High-quality structured memory is a thing of beauty. Take a look at recent works like ReasoningBank and Agentic Context Engineering for some really cool ideas on how to better optimize memory building and retrieval. Building with the orchestrator-sub-agent architecture means that you can also leverage hybrid memory techniques (e.g., agentic search + semantic search), and you can let the agent decide what strategy to use.
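To make external memory concrete, here is a small file-backed note store. It is a sketch under simple assumptions: `NoteStore` is a hypothetical name, and the naive keyword scoring is a stand-in for real agentic or semantic search (it is not what ReasoningBank or Agentic Context Engineering do).

```python
import json
import tempfile
from pathlib import Path


class NoteStore:
    """File-backed notes with naive keyword retrieval."""

    def __init__(self, path: Path) -> None:
        self.path = path
        if not self.path.exists():
            self.path.write_text("[]")

    def add(self, title: str, body: str) -> None:
        """Append a note to the JSON file on disk."""
        notes = json.loads(self.path.read_text())
        notes.append({"title": title, "body": body})
        self.path.write_text(json.dumps(notes))

    def search(self, query: str, limit: int = 3) -> list[dict]:
        """Rank notes by how many query words appear in them (substring match)."""
        words = set(query.lower().split())
        notes = json.loads(self.path.read_text())
        scored = []
        for note in notes:
            text = f"{note['title']} {note['body']}".lower()
            score = sum(1 for w in words if w in text)
            if score:  # drop notes that match nothing
                scored.append((score, note))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in scored[:limit]]


store = NoteStore(Path(tempfile.mkdtemp()) / "notes.json")
store.add("pricing", "cache reads are cheaper than fresh input tokens")
store.add("plan", "step 2 failed, retry with smaller batch")
hits = store.search("cache pricing")
```

Only the notes returned by `search` need to enter the model's context; everything else stays on disk until it is relevant.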
// Context Engineering //
One of the worst things you can do when interacting with these types of agents is to give them underspecified instructions/prompts. Prompt engineering was and is important, but we will use the new term context engineering to emphasize the importance of building context for agents. The instructions need to be more explicit, detailed, and intentional to define when to plan, when to use a sub-agent, how to name files, and how to collaborate with humans. Part of context engineering also involves efforts around structured outputs, system prompt optimization, compacting context, evaluating context effectiveness, and optimizing tool definitions.
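As one small example of compacting context, the sketch below keeps the system prompt plus the newest messages that fit a budget and leaves a marker for what was dropped. `compact_context` is a hypothetical helper, word count is a crude stand-in for a real tokenizer, and production systems often summarize older turns instead of omitting them.

```python
def compact_context(system: str, messages: list[str], budget_words: int) -> list[str]:
    """Keep the system prompt plus the newest messages that fit a word budget."""
    kept: list[str] = []
    used = len(system.split())
    # Walk from newest to oldest so the most recent turns survive.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > budget_words:
            break
        kept.append(message)
        used += cost
    dropped = len(messages) - len(kept)
    context = [system]
    if dropped:
        # Tell the model that history was cut, rather than hiding it silently.
        context.append(f"[earlier messages omitted: {dropped}]")
    context.extend(reversed(kept))  # restore chronological order
    return context
```

The system prompt is always kept, even if it alone exceeds the budget, since it carries the explicit instructions the paragraph above argues for.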
// Verification //
Next to context engineering, verification is one of the most important components of an agentic system (though less often discussed). Verification boils down to verifying outputs, which can be automated (LLM-as-a-Judge) or done by a human. Because of the effectiveness of modern LLMs at generating text (in domains like math and coding), it's easy to forget that they still suffer from hallucination, sycophancy, prompt injection, and a number of other issues. Verification helps with making your agents more reliable and more production-ready. You can build good verifiers by leveraging systematic evaluation pipelines. I can't believe people are advocating to cancel evals; evals are hard, but you can't dismiss their benefits.
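A verification loop can be sketched as generate, check, and retry with feedback. The `generate` and `verify` callables are placeholders you would supply (an LLM call and an LLM-as-a-Judge or human check, for example); `generate_with_verification` is a hypothetical name.

```python
from typing import Callable


def generate_with_verification(
    generate: Callable[[str], str],
    verify: Callable[[str], tuple[bool, str]],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Generate, check, and retry with the verifier's feedback appended."""
    prompt = task
    output = ""
    for _ in range(max_rounds):
        output = generate(prompt)
        ok, feedback = verify(output)
        if ok:
            return output, True
        # Feed the rejection reason back into the next attempt.
        prompt = f"{task}\n\nPrevious attempt was rejected: {feedback}"
    # Out of rounds: surface the failure instead of pretending it passed.
    return output, False
```

Returning the pass/fail flag alongside the output lets the caller decide whether to escalate to a human when automated checks keep failing.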
This is a huge shift in how we build with AI agents. I've been teaching this stuff to agent builders over the past couple of months. If you are interested in more hands-on experience building deep agents, take a look here: https://t.co/e03Xqw7tKu
The figure you see in the post describes an agentic RAG system that students need to build for the final project.
Deep agents also feel like an important building block for what comes next: personalized proactive agents that can act on our behalf. I will write more on proactive agents in a future post.
LIVE NOW: AI Agents can discover and trust each other without a central intermediary. This lays the foundation for open agent economies.
ERC-8004 v1, co-authored with @DavideCrapis (@Ethereumfndn), @Jordan0Ellis (@Google) and - welcome Erik! - @programmer (@Coinbase) is now live.
It improves on the August draft thanks to input from hundreds of builders. Learn more about what this means for the future of decentralized AI ↓
After 3 years of capital deployment towards AI infrastructure, we are now ready to start building the agentic web.
It will be multi-stack, multi-model, public, private, onchain, offchain, open and closed.
The agentic web is much bigger than you think.
Stop writing “RIP n8n, RIP Zapier, RIP every agent startup” every time OpenAI drops a release.
It’s lazy thinking disguised as insight.
I spent hours reading and watching everything from Dev Day and nothing they announced kills any startup in the ecosystem. It actually proves the ecosystem is alive.
AgentKit isn’t a “RIP” moment.
It’s a platform moment.
Here’s why:
AgentKit is for developers, not for non-technical teams or small businesses. It’s a framework, not a plug-and-play product.
Zapier, n8n, and similar tools serve entirely different users. They abstract complexity, handle integrations, and let non-devs build useful systems fast.
OpenAI just built infrastructure. Infrastructure doesn’t kill creativity, it powers it.
Every big leap like this creates more surface area for startups to build on, not less.
Every “RIP” take assumes the market is static. But it never is.
The more OpenAI builds, the more space opens up for companies that know how to turn technical potential into human utility.
If you’re building right now, stop panicking.
This isn’t the end of the ecosystem.
Agents are the new apps, but the architecture is very different.
If you need a custom agent, AgentGraph is where you go to find Agentic Product Engineers who understand context engineering.
New on the Anthropic Engineering Blog: Most developers have heard of prompt engineering. But to get the most out of AI agents, you need context engineering.
We explain how it works: https://t.co/PpMTiT7AEG