Handoff degradation: why multi-agent pipelines lose the plot
Key takeaway
Every multi-agent handoff is a lossy compression event: information that exists in Agent A's context is partially lost, distorted, or silently fabricated when it reaches Agent B. Five types of context degrade predictably at handoff boundaries: causal reasoning, implicit constraints, uncertainty signals, temporal ordering, and negative space. Degradation compounds across hops, which is why the first agent in a pipeline is reliable and the fifth is not. The fix is not bigger context windows but structured handoff contracts that explicitly preserve the signal each downstream agent needs.
You run a multi-agent pipeline. Agent 1 does research and hands off to Agent 2, which synthesizes and passes to Agent 3, which writes. Agent 1 is reliable. Agent 3 is not. You swap in a better model for Agent 3. It is still not reliable.
The problem is not Agent 3. It is what Agent 3 received.
Every handoff in a multi-agent system is a compression event. Agent A summarizes its work and passes it forward. Agent B receives that summary and treats it as ground truth. But Agent A’s summary was optimized for what Agent A thought mattered, not for what Agent B actually needs. Signal that existed in Agent A’s full context gets compressed away at the boundary, and Agent B has no way to know what it lost.
This is distinct from the structural failures that cause 86.7% of multi-agent runs to fail (UC Berkeley, 1,600+ traces), and distinct from agent drift within a single session. Those are context management problems. Handoff degradation is a compression and translation problem. It happens even when every individual agent is working correctly.
Context fidelity decreases at every agent handoff, not because agents are careless, but because compression is inherently lossy and the compressor has the wrong objective function. When Agent A writes a summary for Agent B, it answers an implicit question: “What do I want the next step to know?” That is not the same question as “What does the next agent need to complete its task correctly?”
This is the telephone game problem with technical structure. Each agent is not being imprecise; it is being precise about the wrong things. The receiving agent cannot ask for clarification because it does not know what was there to be lost.
Anthropic’s multi-agent research system addresses this directly by requiring a fixed output schema: sub-agents return a structured object containing a summary, key findings, a confidence score, and an explicit list of what should be verified. Without that schema, compression defaults to narrative prose, which reliably drops exactly the five types of context described below.
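As a concrete sketch, such a contract can be as small as one typed object the sub-agent must return. The field names below are illustrative assumptions, not Anthropic’s published format:

```python
from dataclasses import dataclass

@dataclass
class SubAgentResult:
    """A fixed handoff schema a sub-agent must return.

    Illustrative sketch only; the field names are assumptions,
    not Anthropic's published format.
    """
    summary: str                    # what was done, in a few sentences
    key_findings: list[str]         # the findings the parent should act on
    confidence: float               # 0.0-1.0, the sub-agent's own estimate
    verify_before_using: list[str]  # claims the parent must check independently
```

Because every field is required, a hedge or a verification list cannot be silently dropped the way it would be in free prose.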
Handoff degradation is not random. The same categories of information compress away predictably, and knowing them lets you engineer against each one specifically.
| Context type | What gets lost | Why it degrades | Downstream effect |
|---|---|---|---|
| Causal chains | Why decisions were made | Summaries report outcomes, not reasoning | Agent B reverses decisions without knowing why they were made |
| Implicit constraints | Conditions that go unstated | Writer assumes reader shares the same context | Agent B violates constraints it never knew existed |
| Uncertainty signals | Confidence levels on findings | Prose flattens hedged claims into assertions | Agent B treats a guess as a verified fact |
| Temporal ordering | Sequence of attempts and failures | Summaries report the final state, not the order that produced it | Agent B repeats work in the wrong order |
| Negative space | What was tried and rejected | Not reporting failures is the default | Agent B tries the same approaches that already failed |
Start with causal chains. Agent A decides not to query a particular data source because earlier probes returned unreliable results. The decision enters the summary as “data source X was not used.” Agent B, seeing that X exists and is relevant, queries it. The query returns unreliable data. Agent B uses it.
The reasoning disappears at the boundary. Summaries report outcomes, not the chain that produced them. This is the most common root cause of downstream agents reversing work: not because they are careless, but because the reasoning that produced the earlier decision was never handed over.
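One way to keep the chain attached is to hand decisions over as structured records, with the rationale riding along with the outcome. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    what: str         # the outcome a prose summary would report
    rationale: str    # the reasoning that produced it
    reversible: bool  # whether a downstream agent may safely revisit it

# The flattened summary line "data source X was not used" becomes:
decision = Decision(
    what="data source X was not used",
    rationale="queries against X returned unreliable data",
    reversible=False,
)
```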
Every agent builds up contextual constraints as it works: scope limits, discovered edge cases, stakeholder preferences, assumptions that have been validated or invalidated. These are often never written down explicitly because, within Agent A’s session, they are obvious from context.
“We are not changing the pricing structure” makes sense in Agent A’s context because it saw the message from the user three steps back. It may not make the summary because Agent A did not think it needed to be said. Agent B has no access to that message, so it changes the pricing structure.
Uncertainty is compressed away faster than any other context type. “I think this is probably X, but you should verify” becomes “X” in a summary. The hedge is invisible. By the time the assertion reaches Agent 3, it has been cited twice as fact.
This is context poisoning generated internally rather than injected from outside. The agent is the source of its own corrupted ground truth. The epistemic provenance pattern exists specifically to address this: every statement in context carries a tag indicating whether it is a verified tool output or an agent inference.
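A minimal sketch of that tagging, assuming a plain enum rather than any particular framework’s types:

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    TOOL_OUTPUT = "tool_output"  # verified result of a tool call
    INFERENCE = "inference"      # the agent's own reasoning or guess
    INHERITED = "inherited"      # claim received from an upstream agent

@dataclass
class Claim:
    text: str
    provenance: Provenance

# "I think this is probably X" keeps its hedge instead of flattening to "X":
claim = Claim(text="the root cause is probably X", provenance=Provenance.INFERENCE)
```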
The sequence matters. Agent A tried approach X, it failed, then tried Y, which partially worked, then modified Y to produce Z. The summary says “approach Z was used.” Agent B, building on Z without understanding that X was tried and rejected, applies X as a refinement. X breaks Z.
Dead ends and iteration paths are load-bearing information that almost never survives a handoff in standard unstructured summaries.
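Preserving the sequence can be as cheap as handing over an ordered attempt log instead of a final-state summary. A sketch, reusing the X/Y/Z example above:

```python
# Each entry records what was tried, in order, and how it ended.
attempts = [
    ("X", "failed"),
    ("Y", "partially worked"),
    ("Z", "succeeded: Y modified after its partial failure"),
]
# A downstream agent can now see that Z exists *because* X failed,
# before it tries to "refine" Z with X.
```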
Negative space is the most expensive loss. Agent A tried six approaches before finding one that worked. Those six failures are not in the summary. Agent B, facing a similar sub-problem, tries the same six. The multi-agent pipeline does not just fail to benefit from prior work; it actively repeats it.
JetBrains research presented at NeurIPS 2025 found that context management techniques reduce costs by roughly 50% without significant accuracy loss. Much of that gain comes from not re-generating work that was already done. Without explicit negative-space logging at handoffs, that gain is unavailable to any downstream agent.
A single handoff loses some signal. Each subsequent handoff loses signal from the already-degraded version. By the fourth or fifth agent in a pipeline, the context has been compressed three or four times, each time optimized for a different agent’s perspective: causal chains are distorted, uncertainty is flattened, and none of the dead ends were ever logged.
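The arithmetic is unforgiving. If each handoff preserved, say, 80% of the task-relevant signal (an illustrative number, not a measured one), four hops leave well under half:

```python
retention = 1.0
for hop in range(1, 5):
    retention *= 0.80  # assumed per-handoff signal retention
    print(f"after handoff {hop}: {retention:.0%} of original signal")
# after handoff 1: 80%
# after handoff 2: 64%
# after handoff 3: 51%
# after handoff 4: 41%
```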
This is why the first agent in a pipeline is reliable and the fifth is not, even when the pipeline is working correctly at each individual step. The failure is at the interface, not inside any agent.
Context rot describes how accuracy degrades as a single context window fills with noise. Handoff degradation is the inter-agent version: each hop adds noise and removes signal. By the time context reaches Agent 5, it is longer than it needs to be and less informative than it should be.
The instinct is to add more agents to check each other’s work. This makes the problem worse. More hops mean more compression events, and each one degrades the context that subsequent checking agents rely on.
The teams building production multi-agent systems have converged on structuring handoffs explicitly rather than relying on agents to summarize naturally.
Anthropic’s multi-agent research system gives sub-agents a fixed output format: summary, key findings, confidence score, and a “verify before using” list. Sub-agents use 10,000 to 50,000 tokens internally but return 1,000 to 2,000 token structured outputs. The system outperformed single-agent Claude Opus 4 by 90.2% on internal research evaluations. The schema is not a style preference; it is what forces the information in all five degradation categories to survive compression.
Google’s Agent Development Kit treats the handoff as a re-casting event rather than a copy. When Agent A hands off to Agent B, context is not passed wholesale; it is re-compiled into a scoped view of what B specifically needs. B reaches for additional information explicitly via tools rather than inheriting A’s full history. Their design principle is direct: share memory by communicating, not by sharing the memory store itself.
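In code, the re-casting step is a compile function that builds B’s view from A’s state rather than copying it. A generic sketch, not ADK’s actual API:

```python
def compile_scoped_context(full_state: dict, fields_b_needs: list[str]) -> dict:
    """Build Agent B's view from Agent A's state instead of copying it.

    Generic sketch, not ADK's actual API. Anything outside the scope
    stays behind; B fetches it explicitly through tools if needed.
    """
    return {key: full_state[key] for key in fields_b_needs if key in full_state}
```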
LangChain’s Deep Agents framework addresses the cost side of the same problem. Sub-agents do exploratory work independently and return only their final output to the parent. The parent never sees the 20 tool calls that produced the result, preventing context explosion while preserving the useful signal from sub-agent work.
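Stripped to its core, the pattern is an isolation boundary: the sub-agent’s working trace never enters the parent’s context. A toy sketch of the idea, not LangChain’s actual API:

```python
def run_subagent(task: str) -> str:
    """Do exploratory work locally and return only the final output.

    Sketch of the isolation pattern, not LangChain's actual API.
    The tool-call trace stays local and is discarded on return.
    """
    trace: list[str] = []
    for step in range(20):  # stand-in for 20 exploratory tool calls
        trace.append(f"tool call {step} for {task!r}")
    return f"final output for {task!r}"  # the parent never sees `trace`
```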
The common thread across all three frameworks is that handoffs are first-class engineering concerns, not default behavior. Applied to the five degradation categories, this produces a set of practical patterns.
Use structured handoff schemas. Define a required output format for each agent-to-agent interface. At minimum: what was done, why (the reasoning, not just the decision), what was tried and did not work, what assumptions were made, and what confidence level applies to each finding. Unstructured prose summaries default to compressing away everything in this list.
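Put together, a contract covering all five degradation categories fits in one object, with validation at the interface so a missing field fails loudly. The field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Handoff:
    # One required field per degradation category, so none can vanish silently.
    decisions: dict[str, str]        # decision -> reasoning (causal chains)
    constraints: list[str]           # scope limits and validated assumptions
    findings: dict[str, float]       # finding -> confidence (uncertainty)
    attempts: list[tuple[str, str]]  # (approach, outcome), in order (temporal)
    rejected: list[str]              # dead ends (negative space)

def validate(handoff: Handoff) -> None:
    """Refuse a handoff that compressed away a required category."""
    for name in ("decisions", "constraints", "findings", "attempts", "rejected"):
        if not getattr(handoff, name):
            raise ValueError(f"handoff missing required context: {name}")
```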
Thread provenance. Every fact that travels across a handoff boundary should carry a tag: verified tool output, agent inference, or inherited claim. Agents that can see this distinction are much less likely to treat a guess as a fact by the time it has passed through two or three handoffs. This is the epistemic provenance pattern applied at the system level.
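Enforcement at the receiving side is one pass over the incoming claims. A sketch, with hypothetical keys:

```python
def ingest(claims: list[dict]) -> list[dict]:
    """Flag upstream inferences so they cannot silently become facts.

    Sketch with hypothetical keys: a claim regains verified status only
    by re-running the tool, never by surviving another handoff.
    """
    return [
        {**claim, "needs_verification": claim.get("provenance") != "tool_output"}
        for claim in claims
    ]
```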
Log negative space explicitly. Require agents to include a “rejected approaches” field in every handoff output. This is the highest-value, lowest-cost structural change you can make to a multi-agent pipeline. It eliminates the repeated dead-end problem and surfaces information about the solution space that would otherwise be gone permanently.
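Downstream, the field pays for itself in a single check before planning (hypothetical names):

```python
def pick_next_approach(candidates: list[str], rejected_upstream: list[str]) -> str:
    """Skip approaches an upstream agent already tried and rejected."""
    for approach in candidates:
        if approach not in rejected_upstream:
            return approach
    raise RuntimeError("every candidate was already rejected upstream")
```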
Write for the receiver, not the reporter. When writing a handoff summary, the question to answer is “What does the next agent need to complete its specific task?” not “What did I do?” These produce different summaries. Agents given explicit instructions to write for their downstream consumer rather than reporting their own activity produce substantially more useful handoffs.
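The difference fits in one line of the summarizer’s instructions. An illustrative pair of prompts, not canonical wording:

```python
# Reporter-oriented: produces an activity log.
REPORTER_PROMPT = "Summarize the work you did in this session."

# Receiver-oriented: produces a handoff.
RECEIVER_PROMPT = (
    "The next agent will write the final report from your research. "
    "Include only what it needs to do that correctly: each decision and why, "
    "active constraints, confidence per finding, and what you tried that failed."
)
```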
Use external context stores for longer pipelines. For pipelines with more than three or four agents, cumulative handoff compression becomes severe enough that structured summaries alone are insufficient. Externalizing intermediate state to a queryable store lets downstream agents retrieve the specific context they need rather than receiving a pre-selected summary. This is what context containers are designed for: scoped, queryable context that agents access via MCP tools rather than receiving wholesale in the prompt. Each agent queries exactly what it needs, and nothing it does not, which keeps handoff surfaces small and degradation bounded.
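From the downstream agent’s side, the interaction looks like targeted queries rather than an inherited transcript. A sketch with a hypothetical tool name (MCP defines how tools are exposed, not these specific ones):

```python
def gather_context(query_container) -> dict:
    """Pull only the slices Agent B needs from an external context store.

    `query_container` stands in for an MCP-exposed query tool; the name
    and interface are assumptions for illustration.
    """
    return {
        "constraints": query_container("constraints on pricing changes"),
        "rejected": query_container("approaches already tried and rejected"),
    }
```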
The instinct when a multi-agent pipeline fails is to improve the failing agent. Add a better model, refine its prompt, adjust its temperature. Sometimes this works. More often, the agent is fine and the context it received is not.
Treating the handoff boundary as a first-class engineering concern changes the diagnosis. When Agent 5 fails, the question is not “what is wrong with Agent 5?” but “what arrived at Agent 5, and how far had it drifted from what Agent 1 knew?” The answer is almost always further than expected.
Multi-agent systems fail at context because most are designed as if context is automatically preserved at every handoff. It is not. Every boundary is a compression event, and every compression event is an opportunity for the five degradation categories to strip signal from the pipeline. Engineering against that is not an optimization. It is the system.
Sources: UC Berkeley MAST (arXiv:2503.13657) · Anthropic multi-agent research system · Google ADK context documentation · LangChain Deep Agents · JetBrains/NeurIPS 2025 context management
Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.
Create Your First Container