Why AI Agent Memory Keeps Failing
Key takeaway
Agent drift is the gradual behavioral deviation of an AI agent from its original goal, role, or correct operation over long-running tasks. It is caused by six distinct mechanisms: goal drift, context drift, role drift, tool-use drift, hallucination cascades, and plan decay. Most agent drift has nothing to do with model capability, which is why bigger context windows rarely fix it. The durable fixes are context engineering patterns: re-anchoring, role pinning, output compression, provenance tagging, and externalizing state.
You ship an AI agent. It works beautifully for the first ten steps. By step thirty, it is calling the same tool in a loop, forgetting earlier decisions, or quietly pursuing a goal you never gave it. This is not a hallucination. It is not a capability gap. It is agent drift, and it has six distinct causes.
Agent drift is the gradual behavioral deviation of an AI agent from its original goal, role, or correct operation over long-running tasks. The term gets used loosely, often as a synonym for context drift, but those are different things. Context drift is one mechanism. Agent drift is the umbrella.
This matters because the usual fix (a bigger model or a bigger context window) solves almost none of them. Drift is a context engineering problem, not a capability one. Below is the field guide: six mechanisms, how to spot them, and what actually reduces them.
Most production agent failures trace to one of six distinct mechanisms, and most production systems mitigate two or three while silently suffering the rest. The table below groups the six by root cause.
| Mechanism | Trigger | Symptom | Primary mitigation |
|---|---|---|---|
| Goal drift | Competing sub-goals accumulate | Agent pursues a side quest | Re-anchoring via goal restatement |
| Context drift | History dilutes attention | Forgets early decisions | Compression, output offloading |
| Role drift | Persona erosion over turns | Breaks character, invents facts | System-prompt refresh, role pinning |
| Tool-use drift | Wrong tool reinforced | Over-calls one tool, misuses another | Tool output summarization, per-tool budgets |
| Hallucination cascade | Bad output re-enters context | Errors become “ground truth” | Output validation, provenance tagging |
| Plan decay | Stale plan, no replanning | Follows old plan in new state | Periodic replanning checkpoints |
Drift rarely shows up as a crash. It shows up as confident, fluent, wrong behavior that accumulates. Anthropic’s Project Vend experiment put Claude in charge of a small vending-machine business. Over several weeks, the agent invented a human employee, sold tungsten cubes at a loss, and eventually insisted to customers that it was a real person in a blue blazer. Nothing in the model changed. What changed was the accumulated context the agent was operating on.
METR’s long-horizon benchmark “Measuring AI ability to complete long tasks” quantifies a related signal. Frontier models roughly double the duration of tasks they can reliably complete every seven months. Past that horizon, drift takes over.
Zylos Research attributes 65% of enterprise AI failures to context drift and memory loss during multi-step reasoning, not to raw context exhaustion. The failures happen long before the token limit.
Goal drift is when an agent gradually substitutes its own sub-goals for the one you gave it. It happens because every intermediate observation and tool output is itself a small suggestion about what to do next, and those suggestions compete with the original instruction for attention.
A coding agent asked to “fix the flaky test” might, three steps in, be refactoring the test harness. The refactor looked reasonable at each step, and none of the steps violated the instructions. But the agent is no longer fixing the flaky test; it is tidying up, which is a different job. The original goal is still somewhere in context; it is just no longer the most salient thing there.
The mitigation is re-anchoring: at periodic checkpoints, the agent restates the original objective in its own words and asks whether the current action advances it. This sounds trivial, but it works because it forces the goal back to the top of the attention distribution. Systems that pair re-anchoring with a visible scratchpad (“what am I doing and why”) drift noticeably less.
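A minimal sketch of the re-anchoring checkpoint. All names here are illustrative: `call_model` stands in for whatever LLM client you use, and the checkpoint interval is arbitrary.

```python
REANCHOR_EVERY = 5  # check every N steps; tune for your task length

def reanchor_prompt(original_goal: str, proposed_action: str) -> str:
    """Build a checkpoint prompt that forces the goal back to the top of attention."""
    return (
        f"Original objective: {original_goal}\n"
        f"Proposed next action: {proposed_action}\n"
        "Restate the objective in your own words, then answer: "
        "does the proposed action advance it? If not, what would?"
    )

def maybe_reanchor(step: int, original_goal: str, proposed_action: str, call_model):
    """Every REANCHOR_EVERY steps, ask the model to re-anchor; otherwise do nothing."""
    if step % REANCHOR_EVERY != 0:
        return None
    return call_model(reanchor_prompt(original_goal, proposed_action))
```

The result of the checkpoint call can double as the visible scratchpad entry (“what am I doing and why”) mentioned above.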
Context drift happens inside the window. As tool outputs, observations, and messages pile up, the model’s attention gets spread thinner across more tokens, and information that mattered early in the task becomes functionally invisible. Chroma’s context rot research across 18 frontier models shows accuracy dropping from 95% on short inputs to 60-70% on long ones, even when the underlying task does not change.
Stanford’s “lost in the middle” paper shows the characteristic U-curve: information at the start and end of the window is recalled well, information in the middle is not. Combine that with a long-running agent and the problem is obvious: the original goal and the latest action get attention; everything in between is a coin flip.
This is the best-studied drift mechanism and the one most production systems actually mitigate, usually via context compression. But compression is not the same as re-anchoring, and neither fixes the next four mechanisms.
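One common compression shape, sketched below, keeps the head and tail of the history intact (the two regions the U-curve says are recalled well) and replaces the middle with a summary. The `summarize` parameter is a placeholder for an LLM summarization call; the cutoffs are arbitrary.

```python
def compress_history(messages, keep_head=2, keep_tail=6, summarize=None):
    """Replace the middle of a long message history with one summary message.

    Keeps the first `keep_head` and last `keep_tail` messages verbatim,
    since start-of-window and end-of-window content is recalled best.
    """
    if len(messages) <= keep_head + keep_tail:
        return messages
    head = messages[:keep_head]
    middle = messages[keep_head:len(messages) - keep_tail]
    tail = messages[len(messages) - keep_tail:]
    summary = summarize(middle) if summarize else f"[{len(middle)} earlier messages elided]"
    return head + [{"role": "system", "content": summary}] + tail
```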
Role drift is when the agent’s persona or operating boundaries erode across turns. System prompts lose influence over long runs because the system prompt’s relative weight shrinks as conversation accumulates around it. The agent started as “a careful compliance-focused assistant” and is now doing whatever the conversation seems to demand.
Project Vend is the extreme case. The agent eventually claimed to be a human. But softer versions show up everywhere. Support agents start giving legal advice. Sales agents promise features that do not exist. Research agents fabricate citations because citing something is what a researcher does.
The fix is role pinning: periodically re-injecting the system prompt, or a condensed version of it, near the tail of the context, so its effective attention weight stays high. Some teams do this every N turns; others trigger it when the agent’s behavior deviates from a classifier’s expectation. Either works. Relying on the initial system prompt alone does not.
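A sketch of the every-N-turns variant. The message shape and the interval are assumptions, not any framework’s API.

```python
def pin_role(messages, system_prompt, every_n=8):
    """Re-inject the system prompt near the tail of context every N user turns.

    Returns a new message list; the original is not mutated.
    """
    user_turns = sum(1 for m in messages if m["role"] == "user")
    if user_turns > 0 and user_turns % every_n == 0:
        return messages + [{"role": "system", "content": system_prompt}]
    return messages
```

The classifier-triggered variant swaps the modulo check for a drift score, but the injection step is the same.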
Tool-use drift happens when an agent starts preferring the wrong tool for a task and the preference self-reinforces. Every time a tool returns, its output enters the context. Frequently-called tools therefore have frequently-visible outputs, which nudges the agent toward calling them again. Infrequently-called tools get buried.
The practical outcome: an agent that should alternate between search, read, and write ends up calling search twenty times and never writing. Or it overfits to the one tool whose output format it understands best and ignores a better-suited one.
Mitigations are mechanical. Summarize tool outputs before they go back into context so each call takes up less attention. Track per-tool call budgets. Make the tool registry itself part of periodic re-anchoring so the agent re-reads what each tool is for. One-job-per-tool design reduces the attention cost of any single tool call and makes drift easier to detect when it happens.
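The budget-and-truncate half of those mitigations can be sketched as a small wrapper. The budget numbers and truncation length are illustrative defaults, not recommendations.

```python
class ToolBudget:
    """Track per-tool call budgets and shrink outputs before they re-enter context."""

    def __init__(self, budgets, max_output_chars=500):
        self.budgets = dict(budgets)  # e.g. {"search": 10, "read": 20, "write": 20}
        self.calls = {name: 0 for name in self.budgets}
        self.max_output_chars = max_output_chars

    def allow(self, tool: str) -> bool:
        """False once a tool has exhausted its budget (or was never registered)."""
        return self.calls.get(tool, 0) < self.budgets.get(tool, 0)

    def record(self, tool: str, output: str) -> str:
        """Count the call and truncate its output so it costs less attention."""
        self.calls[tool] = self.calls.get(tool, 0) + 1
        if len(output) > self.max_output_chars:
            return output[: self.max_output_chars] + " ...[truncated]"
        return output
```

In a real system the truncation step would be an LLM summarization call rather than a character cut, but the shape is the same: the raw tool output never enters context verbatim.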
A hallucination cascade is a hallucination that re-enters the context as if it were a verified observation and then gets cited later as fact. It is the most dangerous drift mechanism because the agent’s confidence grows rather than shrinks as the error propagates.
This is what context poisoning looks like when the agent is the source of the poison. Step 5’s guess about a file path is referenced in step 12 as “the path we established earlier.” Step 12 is now a load-bearing fact that later steps will build on.
The only robust fix is provenance tracking: every statement in context carries a tag indicating whether it is a system instruction, a verified tool output, or an agent-generated inference. Agents that can see that distinction are much less likely to treat their own guesses as ground truth. The epistemic provenance pattern is the generalization of this idea.
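A minimal sketch of what tagged context entries can look like. The three-way split and the rendering format are assumptions; the point is only that the model sees the tag alongside the content.

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    INSTRUCTION = "instruction"   # system or user instruction
    OBSERVATION = "observation"   # verified tool output
    INFERENCE = "inference"       # agent-generated guess

@dataclass
class ContextEntry:
    content: str
    provenance: Provenance

def render(entries):
    """Render context so the model can see what is verified and what is guessed."""
    return "\n".join(f"[{e.provenance.value}] {e.content}" for e in entries)
```

Under this scheme, step 5’s guessed file path enters context as `[inference]`, so step 12 has no textual basis for calling it “the path we established earlier.”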
Plan decay is when the agent’s plan is still in context, still being followed, but no longer correct for the current state of the world. The first three steps of the plan were valid. Step 4 rendered steps 5-10 obsolete, but the plan says to do them anyway.
This is different from goal drift. The goal is still right. The plan is wrong. Agents without explicit replanning checkpoints will follow a wrong plan confidently, because each step is individually valid according to the plan.
The fix is to treat the plan as mutable state that must be re-evaluated at checkpoints, not as an instruction list to execute. Most agent frameworks today treat the plan as a flat block in the context. More robust systems separate the plan from the execution log and revise it on schedule, which makes plan decay visible rather than silent.
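A sketch of a loop that treats the plan as mutable state. `execute` and `replan` are placeholders for model calls; the checkpoint interval is arbitrary.

```python
def run_with_replanning(goal, plan, execute, replan, checkpoint_every=3):
    """Execute plan steps, re-deriving the remaining plan at checkpoints.

    `replan(goal, done, remaining)` returns a fresh remaining plan, which may
    shrink, reorder, or replace the old steps. This makes plan decay visible:
    the obsolete steps are dropped at a checkpoint instead of executed.
    """
    plan = list(plan)  # do not mutate the caller's list
    done = []
    step_count = 0
    while plan:
        step = plan.pop(0)
        done.append(execute(step))
        step_count += 1
        if step_count % checkpoint_every == 0:
            plan = replan(goal, done, plan)
    return done
```

Keeping `done` (the execution log) separate from `plan` is the separation the paragraph above describes: the log is append-only fact, the plan is revisable intent.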
Drift is easier to prevent than to diagnose after the fact, but a few signals are diagnostic. Track tool-call distribution over a run: a flat distribution collapsing into a spike means tool-use drift. Compare the agent’s current stated objective against the original instruction at each checkpoint: divergence signals goal drift. Classify every model output as instruction, tool result, or inference and flag when inferences exceed a threshold per window: that is hallucination cascade risk.
Most teams instrument none of this and then blame the model when drift shows up. The instrumentation is cheap and the signal is strong.
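The first of those signals, tool-call distribution collapsing into a spike, is a few lines to detect. The threshold is an assumption; tune it against your own tool mix.

```python
from collections import Counter

def tool_spike(calls, threshold=0.6):
    """Return the dominating tool name if one tool exceeds `threshold` of all
    calls in the window, else None. A spike is a tool-use drift signal."""
    if not calls:
        return None
    tool, n = Counter(calls).most_common(1)[0]
    return tool if n / len(calls) >= threshold else None
```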
The pattern across all six mechanisms is the same: something in the context is drifting, and the fix is to make that thing observable, compressible, or externalized.
Externalizing state is the most underused lever. Anthropic’s Managed Agents treats session state as a first-class resource the runtime owns rather than something the model has to carry in its head. The same pattern works without Anthropic-specific tooling: any durable store the agent can write to and read from turns long-running context into short-running context. The agent only loads what it needs for the current step; everything else lives outside the window and cannot drift.
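The pattern needs nothing exotic. Here is a sketch using sqlite as the durable store; it illustrates the write/read-back shape, not any particular product’s API.

```python
import json
import sqlite3

class ExternalState:
    """A minimal durable key-value store for agent state.

    The agent writes intermediate results here instead of carrying them in
    context, and reads back only what the current step needs.
    """

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)"
        )

    def write(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO state VALUES (?, ?)", (key, json.dumps(value))
        )
        self.db.commit()

    def read(self, key, default=None):
        row = self.db.execute(
            "SELECT value FROM state WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default
```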
This is what context containers are for. A Wire container is a durable store scoped per task, per session, or per agent. The agent writes intermediate state to the container via wire_write and queries it back on demand via wire_search, rather than accumulating everything in the prompt. Because the container lives outside the window, it does not dilute attention, does not age, and does not compete with the current instruction. The surface area for drift shrinks.
The other reliable levers are smaller. Re-anchor the goal periodically. Pin the role. Compress tool outputs. Tag provenance on every statement in context. Replan on a schedule rather than once at the start. None of these are novel. What is novel is recognizing they solve different drift mechanisms, and that most agents today mitigate one or two and assume the rest will fix themselves.
They do not. Long-running agents drift in six ways, and if you only instrument for one of them, the other five are what is actually breaking your system.
Sources: Project Vend (Anthropic) · METR long-horizon benchmark · Chroma context rot research · Lost in the middle (Liu et al.) · Zylos Research on enterprise agent failures · Anthropic Managed Agents
Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.