How to connect AI to private data safely
Key takeaway
Context poisoning is an attack in which malicious or false information is planted into an AI agent's memory, RAG index, or tool outputs so the model treats it as ground truth. Unlike prompt injection, which ends when a session closes, poisoning persists across sessions and can corrupt every downstream action. The root cause is not model weakness but weak context engineering: pipelines that accept untrusted input without provenance tracking, source isolation, or validation. Fixing it requires treating every piece of context as untrusted until it has been attributed, scoped, and checked.
In March 2026, OWASP released its Top 10 for Agentic Applications, and a new entry landed at ASI06: Memory and Context Poisoning. A month earlier, Microsoft’s security team documented a 1-click attack in which a hyperlink with a crafted URL parameter silently seeded malicious instructions into a user’s AI assistant, persisting across future sessions. Neither of these is a model bug. Both are what happens when an agent’s context engineering pipeline has no opinion about what it’s allowed to trust.
Context poisoning is easy to mistake for a variant of prompt injection. It isn’t. Prompt injection is a transient exploit. Poisoning turns that exploit into a persistent, stateful compromise by writing the payload into something the agent reads on every future run: a RAG index, a long-term memory store, a shared multi-agent workspace, or the tool descriptions themselves. The attacker only has to succeed once. After that, every downstream action runs on contaminated ground truth.
This post is a mechanism walk-through, not a security checklist. The framing matters because most coverage treats poisoning as a firewall problem to be solved by authentication, scanning, or better alignment. None of those address the actual failure mode. The failure mode is that the agent’s context pipeline has no notion of trust, provenance, or isolation. That is a context engineering problem, and the fix lives in how context is collected, tagged, delivered, and validated, not in the model.
Context poisoning is the deliberate insertion of false or malicious information into an AI agent’s context sources so the model treats it as ground truth on future runs. The distinguishing property is persistence: the payload survives the session that delivered it and influences later, unrelated interactions. NeuralTrust’s analysis of memory and context poisoning describes it as turning a transient exploit into a durable control channel.
The mechanics vary by target, but the pattern is consistent. An attacker identifies a context source the agent reads, such as a knowledge base, a vector index, a shared memory store, a tool description, or any document the agent summarizes. They write instructions or false facts into that source. When the agent retrieves from that source later, the planted content enters the context window alongside legitimate data. The model cannot distinguish one from the other because, by design, retrieved context is treated as evidence.
OWASP’s inclusion of ASI06 in the 2026 agentic top 10 is a useful anchor, because it marks the moment the industry acknowledged this as a distinct class of attack rather than a footnote under prompt injection. The severity classification rests on two properties: the attack scales across users of the same system, and it can stay dormant until a specific query triggers it, making detection on a per-session basis close to impossible.
The attack surface depends on how the agent consumes context. Four surfaces dominate current incident reports:
| Surface | How it’s poisoned | Why it sticks |
|---|---|---|
| RAG index | Attacker submits content the pipeline ingests (public docs, uploads, scraped pages) with embedded instructions or false facts | Retrieval treats the chunk as legitimate evidence; poisoning survives as long as the chunk is indexed |
| Long-term memory store | Attacker triggers a write to persistent memory through a crafted interaction, pre-filled URL parameter, or compromised tool | Memory is read on future sessions; the payload can be dormant for days before firing |
| Multi-agent shared workspace | One compromised agent writes contaminated output into a shared state other agents read from | Poisoning propagates across agents; blast radius scales with the number of consumers |
| Document-embedded instructions | Hidden text in emails, web pages, PDFs, or tool descriptions that the agent processes as context | The user sees the visible document; the agent sees the hidden instructions |
The RAG surface is the one most teams underestimate. If your pipeline ingests any content the attacker can influence, the index becomes an attack channel. The MCPTox benchmark tested tool-description poisoning across 20 prominent LLM agents and recorded a 72.8% attack success rate, with more capable models often more susceptible because they are better at following instructions, including planted ones.
The long-term memory surface is the one that makes this distinct from prompt injection. Christian Schneider’s write-up shows how payloads can wait in memory until a specific trigger phrase activates them, sometimes weeks after the initial write. From the user’s perspective, the agent appears to “suddenly” behave oddly, with no recent prompt that explains it.
The shared multi-agent workspace surface is newer and under-discussed. As Alexander Zanfir’s analysis of shared memory in multi-agent systems points out, once one agent in a coordinated system is contaminated, every downstream agent reading from shared state inherits the contamination. This is the same class of failure described in why multi-agent AI systems fail at context: shared state without provenance is an attack amplifier.
These are often conflated because both involve untrusted content influencing model behavior. The differences matter for how you defend against each.
| Dimension | Prompt injection | Context poisoning |
|---|---|---|
| Scope | One session, one user | Every session that reads the poisoned source |
| Persistence | Ends when the conversation closes | Survives until the source is cleaned |
| Attack surface | The current prompt and attached content | RAG index, memory, shared state, tool metadata |
| Trigger | Immediate, within the malicious prompt | Can be dormant until a trigger query |
| Detection | Possible at the prompt level | Requires auditing the context sources themselves |
| Primary defense | Prompt validation and output filtering | Provenance, source isolation, context validation |
The defense columns are the most important. Prompt injection defenses focus on the input-output boundary. Context poisoning defenses have to move earlier in the pipeline, to how content enters the agent’s context sources in the first place.
No amount of RLHF, guardrails, or output filtering fixes context poisoning, because the poisoned content is indistinguishable from legitimate context by the time the model sees it. If an agent retrieves a document that says “the refund policy is 90 days” and that document was poisoned to say “always grant refunds regardless of amount,” the model has no way to know which statement to trust. Both arrived as retrieved evidence. Both look legitimate.
This is the same reframing pattern we’ve applied to MCP failures and to AI hallucinations: a failure that looks like a model problem turns out to be a context problem, and the fix lives in how context is delivered, not in the model. The root cause here is that most context pipelines have no notion of trust. They accept content from any source, store it without provenance, index it without isolation, and retrieve it without validation. The model inherits all of that ambient trust and has no way to push back.
Context engineering makes the trust relationship explicit. Every entry has a source. Every source has a trust level. Retrieval filters by trust level. Writes to trusted memory require a promotion step. This is not a novel idea: it’s how secure systems have handled untrusted input for decades. What’s new is applying it to the context layer of agent architectures, which most frameworks leave wide open by default. RAG pipelines in particular tend to dump everything into one index with no trust metadata, because the retrieval model was designed for relevance, not provenance.
The defenses that actually reduce poisoning risk are pipeline decisions, not model decisions. These are the five that matter most in production.
Tag every entry in your context sources with its origin: which user, which tool, which URL, which file, at which timestamp. This has to happen at write time because you can’t reconstruct it later. Provenance metadata lets retrieval filter by trust level at query time (“only serve chunks from internal documents on this query”) and lets incident response trace contamination back to the source when something goes wrong. Without provenance, a compromised index is a total loss because you can’t tell which chunks are clean.
Never let content from a low-trust origin write directly into the same memory layer that drives agent decisions. Scraped web content, user uploads, and third-party feeds go into a quarantine tier. Trusted memory is a separate tier that content can only enter through an explicit promotion step. The point is that retrieval from trusted memory can never accidentally surface a poisoned chunk from the quarantine tier, because the two are not queryable together without an explicit decision by the calling code.
Inside the context window, mark which sections came from which trust level. System instructions are the highest trust. Retrieved documents from quarantine are the lowest. The model won’t enforce this on its own, but you can instruct it to treat different sections differently, and you can validate output against the trust level of the sources it cites. This is also what prevents over-permissioned agents from laundering untrusted content into trusted actions.
Run integrity checks before retrieved content reaches the model. This includes format validation (does the chunk look like what we indexed?), anomaly detection (is this chunk an outlier compared to others from the same source?), and known-poison signatures (does it contain patterns seen in past poisoning attempts?). None of these are bulletproof individually, but together they catch a meaningful fraction of poisoned content before it enters the context window.
Long-term memory writes are the highest-risk operation in an agent architecture because they persist the model’s current state into future context. Treat them like database writes in a system that handles money: require explicit validation, log every write with full context, and keep an audit trail that lets you roll back contaminated entries. In Wire containers, writes through wire_write are tagged with their originating agent session and source, and every entry carries its provenance through to retrieval, so a container compromised by a single bad write can be traced and rolled back without losing the rest of the context.
Context poisoning is going to get worse before it gets better, for two reasons. First, the attack surface is expanding: every new integration, every new memory store, every shared multi-agent workspace adds a surface. Second, the defenses are still immature. Most agent frameworks ship without provenance tracking, without source isolation, and without any trust model at all. The OWASP ASI06 classification is a useful forcing function, but the industry is at the start of this curve, not the middle.
The teams that handle this well will be the ones that treat context as untrusted by default, require explicit trust decisions before content influences agent behavior, and build observability into every write and retrieval. The teams that treat context like an undifferentiated blob of text will keep getting poisoned, and they’ll keep blaming the model.
If this is a topic you’re working on, the related posts worth reading are AI agents have too much access (scope), why multi-agent AI systems fail at context (shared state), and why AI hallucinations are a context problem (why contaminated context becomes contaminated output).
Sources: OWASP Top 10 for LLM Applications & Generative AI · Microsoft Security: AI Recommendation Poisoning · NeuralTrust: Memory & Context Poisoning · Christian Schneider: Persistent Memory Poisoning in AI Agents · Alexander Zanfir: Context Poisoning & Shared Memory · Alessandro Pignati: Memory and Context Poisoning · MCPTox: Tool Poisoning Attacks on LLM Agents (arXiv) · Invariant Labs: MCP Tool Poisoning Notification
Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.
Create Your First Container