Indirect Prompt Injection — How Attacks Hide in Documents Your AI Reads
Indirect Prompt Injection — How Attacks Hide in Documents Your AI Reads
TLDR: The attacker never touched the chat interface. They uploaded a resume. To an AI-powered HR tool. With instructions hidden in white text at the bottom. The AI read the document, processed the hidden instruction, and output a glowing recommendation for an unqualified candidate. No exploit. No code. Just text. This is indirect prompt injection — and in June 2025, the exact same principle let a researcher send a single email and silently extract sensitive data from Microsoft 365 Copilot without the victim clicking anything. CVSS score: 9.3. The vulnerability that made it possible isn't a bug in the traditional sense. It's a property of how AI systems work.
Direct vs. Indirect — The Distinction That Matters
In the previous post on prompt injection, we covered the fundamentals: an LLM cannot reliably distinguish between instructions from its developer and instructions embedded in user input. Direct prompt injection is the simpler case — a user types manipulative instructions into the chat interface and tries to override the AI's behaviour.
Indirect prompt injection is categorically more dangerous, and for one specific reason: the attacker never interacts with the AI at all.
Instead, the attacker plants malicious instructions inside content that the AI will later process on behalf of a legitimate user — a document, an email, a webpage, a database record. The AI encounters this content during a routine task, reads the hidden instructions, and follows them. The legitimate user sees nothing unusual. The attacker, who may have had no access to the system, has just used the AI as an unwitting agent inside the organisation.
This is the difference between picking a lock and convincing someone inside the building to open the door for you.
How It Works Across Each Attack Context
Document upload. Your AI-powered tool processes a user-submitted PDF — a resume, a contract, an invoice. The attacker has embedded invisible instructions in white text, metadata, or a comment field. The AI reads the full document, including the hidden instructions, and follows them — outputting a false recommendation, leaking other data from context, or triggering a downstream action.
RAG retrieval. Your AI uses Retrieval-Augmented Generation to pull from an internal knowledge base before responding. An attacker who can write to any document, repository, or shared folder that the AI indexes can plant instructions there. When any user queries the AI on a related topic, the poisoned document gets retrieved and injected into the model's context. The malicious instructions arrive silently, embedded inside what appears to be legitimate knowledge base content.
Email processing. An AI assistant that reads and summarises emails — or helps draft replies — is exposed to every email that reaches the inbox, including those from external, untrusted senders. An attacker can send an email whose contents are crafted to manipulate the AI when it processes them. The recipient doesn't even need to read it.
Web browsing agents. AI agents that can fetch URLs and summarise web content are exposed to every page they visit. A malicious webpage can contain hidden instructions designed specifically to manipulate AI agents that read them — a technique researchers have demonstrated repeatedly in controlled settings.
EchoLeak — The 2025 Case That Proved This at Scale
In June 2025, researchers at Aim Security disclosed CVE-2025-32711, named EchoLeak — a critical vulnerability in Microsoft 365 Copilot that allowed an attacker to steal sensitive organisational data without any user interaction, simply by sending a carefully crafted email to a user within the organisation.
The attack embeds a malicious prompt payload inside markdown-formatted content, like an email, which is then parsed by the AI system's Retrieval-Augmented Generation (RAG) engine. The payload silently triggers the LLM to extract and return private information from the user's current context — leaking data via Microsoft Teams and SharePoint URLs. No user clicks are required.
What made EchoLeak particularly striking was how many defences it bypassed in sequence. The attacker defeated Microsoft's XPIA (Cross-Prompt Injection Attempt) classifier by crafting the email's language to appear harmless and directed at the human recipient rather than an AI. The malicious prompt never explicitly mentioned AI, Copilot, or anything that would raise suspicion. From there, it circumvented link redaction using reference-style Markdown, exploited auto-fetched images, and abused a Microsoft Teams proxy permitted by the content security policy — achieving full privilege escalation across LLM trust boundaries.
The vulnerability was rated CVSS 9.3 and was added to Microsoft's Patch Tuesday list for June 2025. Aim Labs noted that while the specific exploit targets Microsoft Copilot, the underlying design flaws of RAG-based AI may mean more applications are similarly vulnerable.
This was not a theoretical demonstration. This was a production enterprise AI product, used by tens of thousands of businesses, silently leaking internal documents in response to a single inbound email.
The Agent Problem — When Injection Leads to Action
The severity of indirect prompt injection scales directly with what the AI is permitted to do.
If your AI can only generate text, a successful injection produces a misleading output. Embarrassing, potentially damaging, but contained.
If your AI agent can send emails, query databases, modify files, call APIs, or execute code — a successful injection can do all of those things on the attacker's behalf, with the full permissions of the legitimate user who triggered the task. The AI doesn't distinguish between instructions from its developer and instructions embedded in a poisoned document. It simply acts.
This is why the OWASP Top 10 for LLM Applications 2025 treats Excessive Agency as a standalone risk category. The more autonomy an AI agent has, the more catastrophic an indirect injection becomes. An AI agent with read-only access to a knowledge base is a very different risk profile from an AI agent that can send emails, approve workflows, or run terminal commands.
The principle of least privilege — which has been a foundational security concept in traditional application design for decades — applies here with even greater force. An AI agent should be granted only the permissions it genuinely needs for its stated purpose. Every additional capability is additional blast radius if an injection succeeds.
Defence: What Actually Reduces the Risk
There is no single fix that eliminates indirect prompt injection. But several layers in combination substantially reduce the attack surface.
Content sandboxing and trust boundaries. Content from external, untrusted sources — inbound emails, uploaded documents, fetched URLs — should be processed in a context that is architecturally separated from the AI's instruction layer. The model should not be able to treat content from untrusted sources as instructions. This is easier to design into a new system than to retrofit into an existing one, which is why it belongs in the architecture conversation, not the post-deployment security review.
Principle of least privilege for AI agents. If the AI doesn't need to send emails, don't give it that permission. If it doesn't need write access to SharePoint, don't grant it. The more restricted an agent's permissions, the less an injected instruction can actually accomplish. This is the most reliable mitigation currently available, because it reduces impact even when prevention fails.
Output validation and monitoring. Examine what the AI generates before it is acted upon. Flag outputs that contain URLs pointing to external domains, outputs that reference data sources not mentioned in the user's query, or outputs that include instructions directed at the user. These patterns can indicate an injection is in progress. Logging AI activity — what the model accessed, what it returned, what actions it took — is essential for both detection and post-incident investigation.
Human-in-the-loop for high-impact actions. Any action with significant real-world consequences — sending an external email, approving a financial transaction, modifying production data — should require explicit human confirmation, with the full context of what the AI is about to do displayed clearly. An AI agent that can take consequential actions silently is an AI agent that can be weaponised silently.
Adversarial testing of every data source the AI reads. This is the piece that almost no organisation is doing. Most AI security testing, when it happens at all, focuses on the user-facing chat interface. Indirect injection tests the opposite direction — every document store, email inbox, database, URL, and API response that the AI processes is a potential attack surface.
What Testing for Indirect Injection Actually Involves
Testing an AI application for indirect prompt injection requires systematically treating every data source the model reads as a potential attack vector, not just the user interface.
A structured assessment covers: injection attempts embedded in every supported document format (PDF, Word, spreadsheets, plain text); poisoned entries planted in the RAG knowledge base to test retrieval-triggered injection; crafted external content (emails, URLs, API responses) designed to manipulate agent behaviour; invisible character and encoding obfuscation to test whether sanitisation can be bypassed; and privilege boundary tests to determine what an injected instruction can actually be made to do with the agent's current permissions.
The goal is to find the boundaries of what's possible in your specific system — before an attacker maps them first.
How Kuboid Secure Layer Can Help
At Kuboid Secure Layer, indirect prompt injection testing is a specific component of our AI security assessment service — not a checkbox at the end of a traditional pentest, but a dedicated methodology applied to every surface your AI reads.
If your application processes documents, emails, web content, or any external data through an LLM, that content is an attack surface. We can help you understand exactly what's exposed and what an attacker could do with it. Get in touch here — or read more about how we approach security assessments.
Final Thought
EchoLeak didn't require the attacker to compromise a single account, bypass a single firewall, or write a single line of malicious code. It required sending one email. The AI did the rest — accessing internal files, packaging sensitive data, and routing it externally, all while appearing to help the user with a routine task.
That's the nature of indirect prompt injection. It turns the AI's most useful quality — its ability to process and act on information from many sources — into a liability. The organisations that understand this early, and design their AI systems with explicit trust boundaries and minimal permissions from the start, are the ones that won't be explaining an EchoLeak-style incident to their board in 2026.
Kuboid Secure Layer provides AI security assessments, prompt injection testing, and application penetration testing. Learn more at www.kuboid.in or explore our services.