
Sub-agent context isolation: the fix for context rot

Jitpal Kocher · June 5, 2026 · 8 min read

Key takeaway

Sub-agent context isolation gives each agent its own context window scoped to a single task, so the orchestrator's main thread never fills with the intermediate tool calls and scratch work that cause context rot. Anthropic's multi-agent research system used this to beat single-agent Claude Opus 4 by 90.2%, and AOrchestra reports a 16.28% relative gain from spawning context-isolated executors on demand. The catch is that isolation can strand shared context: Cognition argues sub-agents acting without each other's traces make conflicting decisions, so the boundary has to be designed, not assumed.

Anthropic’s multi-agent research system beat single-agent Claude Opus 4 by 90.2% on its internal research evaluations. The architectural choice doing most of that work is not smarter agents. It is that each sub-agent runs in its own context window, spends tens of thousands of tokens exploring, and returns only a 1,000 to 2,000 token summary to the parent. The parent never sees the scratch work.

That pattern has a name: sub-agent context isolation. It is the deliberate decision to give each unit of work a fresh, scoped context instead of letting one thread accumulate everything. Done well, it is the most direct fix we have for context rot. Done naively, it trades one failure mode for another. This post covers what isolation actually buys you, the two ways to implement it, and the specific case where it backfires.

What sub-agent context isolation is

Sub-agent context isolation means each agent gets a context window scoped to a single task, separate from the orchestrator’s main thread and from its siblings. Anthropic’s context engineering guide lists “isolate” as one of four core operations, alongside write, select, and compress, and describes it as giving each sub-agent its own window scoped to its specific task. The orchestrator delegates, the sub-agent works in private, and only a compressed result crosses back.

The point is not parallelism for speed. The point is that a context window stays clean when it only ever holds what one task needs. A research sub-agent can read twenty documents and run a dozen tool calls without any of that landing in the orchestrator’s window or in a sibling agent’s reasoning. Isolation turns the handoff into a compression boundary rather than a place to dump state, which is the same conclusion the teams behind production multi-agent systems keep arriving at independently. OpenAI has since baked the pattern into the model itself, with GPT-5.6’s Ultra mode giving each subagent its own window directly in the weights.
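The delegate, work in private, return a summary loop can be sketched in a few lines. This is a toy model rather than any framework's API: `ContextWindow`, `run_sub_agent`, and `orchestrate` are hypothetical names, word counts stand in for tokens, and the "summary" is a hard-coded string where a real system would ask the model to compress its findings.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """Toy stand-in for one agent's context: an append-only list of messages."""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def size(self) -> int:
        # Crude proxy for token count: whitespace-separated words.
        return sum(len(m.split()) for m in self.messages)


def run_sub_agent(task: str, documents: list[str]) -> str:
    """Work in a private window and hand back only a short summary."""
    window = ContextWindow()  # fresh context, discarded when the function returns
    window.add(f"task: {task}")
    for doc in documents:
        window.add(f"read: {doc}")  # scratch work stays in the private window
    # Assumption: a real sub-agent would ask the model to summarize here.
    return f"{task}: reviewed {len(documents)} documents"


def orchestrate(tasks: dict[str, list[str]]) -> ContextWindow:
    """Delegate each task and keep only the plan plus the returned summaries."""
    main = ContextWindow()
    main.add("plan: " + ", ".join(tasks))
    for task, documents in tasks.items():
        main.add(run_sub_agent(task, documents))  # only the summary crosses back
    return main
```

However many documents each sub-agent reads, the orchestrator's window grows by one short message per task.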

Why one shared thread rots

A single growing thread degrades because attention dilutes as the window fills, and most of what fills it is noise the model no longer needs. Every tool call, every intermediate observation, every abandoned branch stays in context and gets rebilled on the next step. By step twenty, the model is reasoning over a transcript where the signal it needs is buried under thousands of tokens of its own history.

This is context rot: accuracy falls as input length grows, even when the underlying task does not get harder. In a single-agent loop it shows up as the agent forgetting an early instruction or contradicting a decision it made ten steps ago. In a long-running agent it compounds into agent drift, where the agent keeps acting on goals and state that have since been superseded. The shared thread is the mechanism that lets stale context survive long enough to do damage.

Isolation attacks the root cause. If a sub-agent’s exploration never enters the main thread, it cannot rot the main thread. The orchestrator’s window holds the plan and a handful of distilled results, not the full trace of every sub-task. That is why isolation outperforms compression alone: compression shrinks the noise after the fact, while isolation keeps the noise out of the window that matters.
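A back-of-the-envelope model makes the rebilling cost concrete. The sketch below assumes every step adds a fixed number of tokens and that each step re-reads the entire history of its own window; the function names and all the numbers are illustrative, not measurements from any system. It also holds the total amount of work fixed, which real multi-agent systems rarely do, since sub-agents tend to explore more than a single agent would.

```python
def billed_single_thread(steps: int, tokens_per_step: int) -> int:
    """Input tokens billed when every step re-reads the full history so far."""
    return sum(step * tokens_per_step for step in range(1, steps + 1))


def billed_isolated(sub_agents: int, steps_each: int,
                    tokens_per_step: int, summary_tokens: int) -> int:
    """Same work split across sub-agents; the parent only re-reads summaries."""
    per_sub_agent = billed_single_thread(steps_each, tokens_per_step)
    parent = billed_single_thread(sub_agents, summary_tokens)
    return sub_agents * per_sub_agent + parent


# 20 steps of 1,000 tokens in one thread versus 4 sub-agents of 5 steps each,
# every sub-agent returning a 200-token summary.
print(billed_single_thread(20, 1_000))    # 210000
print(billed_isolated(4, 5, 1_000, 200))  # 62000
```

Under these assumptions the single thread's bill grows with the square of the step count, while the isolated version keeps each quadratic term small because no window ever holds more than five steps of history.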

Isolated threads versus static roles

There are two ways to build sub-agents, and they make different trade-offs between specialization and adaptability. Static-role sub-agents are fixed specialists you define in advance: a planner, a coder, a reviewer, each with a hand-written prompt and tool set. Context-isolated sub-agents are spawned per task, each with a context, tools, and model chosen for that specific step. AOrchestra, an orchestration framework from early 2026, frames an agent as a compositional tuple of instruction, context, tools, and model, which lets a central orchestrator “spawn specialized executors for each task on demand” rather than maintaining a fixed roster. Across GAIA, SWE-Bench, and Terminal-Bench, that on-demand approach reported a 16.28% relative improvement over the strongest baseline.
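The tuple framing maps naturally onto a small data structure. The following is a minimal sketch of the idea, not AOrchestra's actual API: `AgentSpec`, `spawn_executor`, `TOOLBOX`, and the model name are all hypothetical, and the tools are stubs.

```python
from dataclasses import dataclass
from typing import Callable

Tool = Callable[[str], str]

# Hypothetical tool registry; real tools would call a search API, a shell, etc.
TOOLBOX: dict[str, Tool] = {
    "search": lambda query: f"results for {query}",
    "run_tests": lambda path: f"ran tests under {path}",
}


@dataclass
class AgentSpec:
    """An agent described as (instruction, context, tools, model)."""
    instruction: str
    context: tuple[str, ...]
    tools: dict[str, Tool]
    model: str


def spawn_executor(task: str, tool_names: list[str], context_keys: list[str],
                   notes: dict[str, str], model: str = "small-model") -> AgentSpec:
    """Build a per-task executor that sees only the notes and tools it was given."""
    return AgentSpec(
        instruction=f"Complete the task and reply with a short summary: {task}",
        context=tuple(notes[key] for key in context_keys),
        tools={name: TOOLBOX[name] for name in tool_names},
        model=model,
    )


notes = {"repo_layout": "tests live in ./tests", "style_guide": "use snake_case"}
executor = spawn_executor("find flaky tests", ["run_tests"], ["repo_layout"], notes)
```

A static-role setup would define a handful of `AgentSpec` values once and reuse them; the on-demand version builds a new one per step, so the style guide never reaches an executor that only needs the repository layout.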

The table below compares the common approaches to handling context across agents.

Approach	How context flows	Strength	Failure mode
Single growing thread	Everything accumulates in one window	Full shared context, simple to build	Context rot, quadratic token cost
Static-role sub-agents	Each fixed role gets scoped context	Predictable, specialized behavior	Inflexible, heavy human engineering
Context-isolated sub-agents	Per-task window, compressed result returned	Adapts to the task, keeps the parent small	Sub-agents lose sight of each other
Linear single thread (Cognition)	One agent, full trace, no parallel branches	No conflicting decisions	Inherits all of context rot’s limits

The recent literature treats isolation as a first-class property rather than an implementation detail. A March 2026 paper, “Context Engineering: From Prompts to Corporate Multi-Agent Architecture,” lists isolation as one of five context quality criteria, alongside relevance, sufficiency, economy, and provenance, and frames context as the agent’s operating system. The framing matters: if isolation is a quality dimension you measure, you design for it, instead of discovering its absence when a run fails.

The catch: isolation can strand shared context

Isolation breaks down when sub-agents need each other’s decisions and only receive each other’s messages. Cognition, the team behind Devin, made this the centerpiece of its essay “Don’t Build Multi-Agents.” Their argument is that every action an agent takes carries an implicit decision, and when sub-agents act in parallel without seeing each other’s full traces, those decisions conflict. Their example is a builder agent producing a game background in one visual style while a second sub-agent builds a character asset in a clashing style, because neither agent ever saw the other’s choices.

This is the real tension. The same boundary that keeps the orchestrator’s window clean also hides each sub-agent’s reasoning from its peers. Pass too much across the boundary and you reintroduce the rot you were trying to prevent. Pass too little and sub-agents make locally sensible, globally incoherent decisions. Cognition’s conclusion is to prefer a single linear thread and share full context rather than fragment it, which avoids conflicting decisions at the cost of inheriting every limitation of the shared thread.

Both camps are right about different workloads. Anthropic’s research task is read-heavy and decomposes cleanly: sub-agents gather independent facts that rarely conflict, so isolation is almost pure upside. Cognition’s coding task is write-heavy and tightly coupled: sub-agents are making interdependent design decisions, so isolation strands exactly the context they need. The lesson is not which side wins. It is that isolation helps in proportion to how independent the sub-tasks actually are.

Making the boundary structural

The way to get isolation’s upside without stranding shared context is to make the boundary an explicit artifact, not a prompt instruction, and to control exactly what crosses it. Three patterns do this in production.

Return compressed results, not traces. The handoff should carry conclusions and the few facts the parent needs, while the intermediate reasoning stays in the sub-agent’s window. Anthropic’s 90.2% improvement comes largely from this: tens of thousands of tokens of exploration distilled into a short summary.
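One way to keep a handoff honest is to give it a shape and a budget. The sketch below assumes the sub-agent has already ranked its facts by importance; `Handoff`, `build_handoff`, and the word-count budget are hypothetical, and counting words is only a rough proxy for counting tokens.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """What crosses the boundary: a conclusion and a few supporting facts."""
    conclusion: str
    key_facts: list[str]

    def render(self) -> str:
        return "\n".join([self.conclusion, *(f"- {fact}" for fact in self.key_facts)])


def word_count(text: str) -> int:
    return len(text.split())


def build_handoff(conclusion: str, ranked_facts: list[str],
                  budget_words: int = 60) -> Handoff:
    """Keep the highest-ranked facts that fit; everything else stays behind."""
    remaining = budget_words - word_count(conclusion)
    if remaining < 0:
        raise ValueError("conclusion alone exceeds the handoff budget")
    kept: list[str] = []
    for fact in ranked_facts:
        cost = word_count(fact)
        if cost > remaining:
            break  # stop at the first fact that does not fit, preserving rank order
        kept.append(fact)
        remaining -= cost
    return Handoff(conclusion, kept)
```

Enforcing the budget in code rather than in the prompt means an over-eager sub-agent cannot push its whole trace into the parent by accident.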

Offload detail to storage the parent references by name. When a sub-agent produces a large artifact, write it to external storage and pass a pointer rather than the content. This is context offloading applied at the agent boundary, and it keeps the main thread small while preserving access to the full result.
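A pointer-based handoff might look like the following. It is a sketch under simple assumptions: the store is a local temporary directory, `ArtifactStore` and `sub_agent_report` are hypothetical names, and the report content is placeholder text. A production system would more likely use object storage or a database.

```python
import hashlib
import tempfile
from pathlib import Path


class ArtifactStore:
    """Minimal file-backed store that hands out names instead of content."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, content: str) -> str:
        # A content hash in the key keeps different versions from colliding.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        key = f"{name}-{digest}.txt"
        (self.root / key).write_text(content, encoding="utf-8")
        return key

    def get(self, key: str) -> str:
        return (self.root / key).read_text(encoding="utf-8")


def sub_agent_report(store: ArtifactStore) -> dict[str, str]:
    """Write the large artifact to storage and return a summary plus a pointer."""
    full_report = "\n".join(f"finding {i}: placeholder detail" for i in range(1, 501))
    key = store.put("research-report", full_report)
    return {"summary": "500 findings collected", "artifact": key}


store = ArtifactStore(Path(tempfile.mkdtemp()))
handoff = sub_agent_report(store)
```

The parent's context holds only the summary and the artifact name; it calls `store.get(handoff["artifact"])` if and when it needs the detail.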

Give each sub-agent a scoped data boundary, not a hand-maintained filter. Instead of the orchestrator slicing one shared store and trusting each sub-agent to read only its slice, give each sub-agent its own isolated context container and let it pull only what it is scoped to through that container’s own tools. With Wire, each container is a separately permissioned environment with its own MCP server, so the isolation boundary is the container itself rather than a filter the orchestrator has to enforce by hand. That removes a whole class of context bleed, because a sub-agent physically cannot read into a sibling’s container.
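The difference between a filter and a boundary can be shown with a toy container. The class below is purely illustrative and is not Wire's API: `ScopedContainer` and `run_scoped_agent` are hypothetical, and an in-process Python object only models the idea. Real isolation needs a separately permissioned environment, since code in the same process could still reach the underlying data.

```python
class ScopedContainer:
    """Holds one sub-agent's documents and exposes them only through its own tools."""

    def __init__(self, name: str, documents: dict[str, str]) -> None:
        self._name = name
        self._documents = dict(documents)  # private copy, not a view of a shared store

    def list_documents(self) -> list[str]:
        return sorted(self._documents)

    def read(self, doc_id: str) -> str:
        if doc_id not in self._documents:
            raise PermissionError(f"{doc_id!r} is not in container {self._name!r}")
        return self._documents[doc_id]


def run_scoped_agent(container: ScopedContainer, doc_id: str) -> str:
    """A sub-agent that can only reach data through the container it was handed."""
    return f"{container._name}: {container.read(doc_id)}"


billing = ScopedContainer("billing", {"invoice-1": "total due: 40"})
support = ScopedContainer("support", {"ticket-9": "login fails"})
```

The billing sub-agent is handed `billing` and nothing else, so a request for `ticket-9` fails at the boundary instead of relying on the orchestrator to have filtered correctly.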

These are the same context engineering techniques that work for single agents, applied at the seams between agents. Scope what each agent sees, compress what crosses boundaries, and offload what does not need to be inline.

When to isolate and when not to

Reach for sub-agent context isolation when a task decomposes into scoped sub-tasks that each fit in a smaller, cleaner context than the whole job would, and when those sub-tasks are genuinely independent. Research, retrieval, broad search, and parallel verification fit this shape well, which is why isolation shows the largest gains there.

Keep a single thread when the work is tightly coupled, when sub-agents would be making interdependent design decisions, or when the task has one objective over one data scope with no real specialization. In those cases the coordination cost and the risk of conflicting decisions outweigh what you save on the context window. Isolation is a context engineering tool, not a default. The question to ask before splitting a task is whether each piece can succeed knowing only its own slice. When the answer is yes, give it its own window. When the answer is no, keep it in the room.
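The "can each piece succeed knowing only its own slice" question can be approximated mechanically if sub-tasks declare what they read and what they decide. This is one possible heuristic rather than an established method: `SubTask`, `coupled_pairs`, and `should_isolate` are hypothetical, and the declared read and write sets are whatever the planner chooses to list.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class SubTask:
    name: str
    reads: set[str] = field(default_factory=set)   # inputs the task consumes
    writes: set[str] = field(default_factory=set)  # decisions or artifacts it produces


def coupled_pairs(tasks: list[SubTask]) -> list[tuple[str, str]]:
    """Pairs where one task decides something the other reads or also decides."""
    pairs = []
    for a, b in combinations(tasks, 2):
        if a.writes & (b.reads | b.writes) or b.writes & a.reads:
            pairs.append((a.name, b.name))
    return pairs


def should_isolate(tasks: list[SubTask]) -> bool:
    """Isolate only when no sub-task depends on another's decisions."""
    return not coupled_pairs(tasks)


research = [SubTask("pricing", reads={"vendor_sites"}),
            SubTask("reviews", reads={"forums"})]
game = [SubTask("background", writes={"art_style"}),
        SubTask("character", writes={"art_style"})]

print(should_isolate(research))  # True
print(coupled_pairs(game))       # [('background', 'character')]
```

The research tasks share nothing they decide, so they can each get their own window. The two game tasks both set the art style, which is the kind of shared design the Cognition example warns about.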

Sources: Anthropic: How We Built Our Multi-Agent Research System · Anthropic: Effective Context Engineering for AI Agents · AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration (arXiv:2602.03786) · Context Engineering: From Prompts to Corporate Multi-Agent Architecture (arXiv:2603.09619) · Cognition: Don’t Build Multi-Agents · UC Berkeley MAST: Why Do Multi-Agent LLM Systems Fail? (arXiv:2503.13657)

Frequently asked questions

When should I use isolated sub-agents instead of a single agent?

Use isolated sub-agents when a task splits into scoped sub-tasks that each fit in a smaller, cleaner context than the whole job would. If the work is one objective over one data scope with no real specialization, a single well-engineered agent usually wins because it avoids coordination cost and the risk of sub-agents making conflicting decisions.

Does sub-agent context isolation reduce token costs?

It can, because the orchestrator never pays to rebill thousands of tokens of intermediate tool calls on every step. Anthropic's research system has each sub-agent spend tens of thousands of tokens exploring but return only a 1,000 to 2,000 token summary, so the parent's window stays small. Naive isolation that pipes full sub-agent histories back to the parent gives up most of that saving.

What's the difference between context-isolated sub-agents and static-role sub-agents?

Static-role sub-agents are fixed specialists defined ahead of time, which is predictable but inflexible and labor-intensive to maintain. Context-isolated sub-agents are spawned per task with a scoped context, tools, and model, trading hand-tuned specialization for adaptability. AOrchestra treats an agent as a tuple of instruction, context, tools, and model so it can create isolated executors on demand.

How do isolated sub-agents share findings without re-bloating the parent context?

They return a compressed result, not their raw trace. The handoff is a summary of conclusions and the few facts the parent needs to proceed, while the intermediate reasoning stays in the sub-agent's own window. Writing detailed output to external storage the parent can reference by name, rather than pasting it inline, keeps the main thread small.

When does sub-agent isolation backfire?

When sub-agents need each other's implicit decisions and only get isolated messages. Cognition's example is one sub-agent building a game background in one visual style while another builds a mismatched asset, because neither saw the other's choices. Isolation helps when sub-tasks are genuinely independent and hurts when they share a design that no single agent owns.
