Twenty years ago, developers learned the hard way that letting user input become part of a database query was dangerous. Now weβre repeating the same mistake with AI.
Agents read untrusted text and treat it as instructions. A webpage, a PDF, a GitHub comment, a support ticket⦠it all lands in the same context as the system prompt. The model cannot reliably distinguish between data and instructions. So attackers just write instructions inside the data.
With ZeroLeaks Iβve been testing agents that browse the web and call tools. Even modern models still follow injected instructions surprisingly often.
The scary part isnβt the jailbreak.
Itβs that agents have permissions: they can call APIs, run workflows, send messages, access dataβ¦.
Prompt injection turns text into actions. Twenty years ago we learned to separate user input from SQL queries. AI agents need the same idea: separate untrusted text from instructions.
Until that happens, prompt injection will remain one of the biggest risks in agent systems.