Twenty years ago, developers learned the hard way that letting user input become part of a database query was dangerous. Now weโre repeating the same mistake with AI.
Agents read untrusted text and treat it as instructions. A webpage, a PDF, a GitHub comment, a support ticketโฆ it all lands in the same context as the system prompt. The model cannot reliably distinguish between data and instructions. So attackers just write instructions inside the data.
With ZeroLeaks Iโve been testing agents that browse the web and call tools. Even modern models still follow injected instructions surprisingly often.
The scary part isnโt the jailbreak.
Itโs that agents have permissions: they can call APIs, run workflows, send messages, access dataโฆ.
Prompt injection turns text into actions. Twenty years ago we learned to separate user input from SQL queries. AI agents need the same idea: separate untrusted text from instructions.
Until that happens, prompt injection will remain one of the biggest risks in agent systems.