MCP Tasks: long-running work as context offloading

Jitpal Kocher · June 16, 2026 · 9 min read

Key takeaway

MCP Tasks are an extension in the 2026-07-28 Model Context Protocol specification that lets a server answer a tools/call with a durable task handle instead of a blocking result. The agent keeps a short taskId in its context window and polls tasks/get for status and the final result, so the interim output of a long-running operation never fills the window. That makes MCP Tasks a protocol-level form of context offloading: the work is relocated to the server and pulled back only when it completes.

A long-running tool call is one of the fastest ways to ruin an AI agent’s context window. The agent asks a server to run a CI pipeline, kick off a batch job, or wait on a human approval, and the call blocks. While it blocks, the connection stays open and every line of interim output, every progress update, and every retry streams back into the window. By the time the operation finishes, the agent is carrying thousands of tokens of exhaust that contributed nothing to its reasoning.

MCP Tasks are the protocol’s answer. Introduced in the Model Context Protocol 2026-07-28 specification release candidate, published May 21, 2026, Tasks let a server respond to a tools/call with a durable handle instead of a blocking result. The agent holds a short taskId, the work runs somewhere else, and the result comes back only when it is done. Read through a context engineering lens, MCP Tasks are context offloading built into the protocol.

What MCP Tasks are

MCP Tasks are an extension that lets a server answer a tools/call with an asynchronous task handle instead of a final result. In the 2026-07-28 release candidate, Tasks graduated from an experimental part of the core protocol to an official, versioned extension. A server that accepts a task returns a result tagged resultType: "task" along with a taskId, and the client drives the rest of the lifecycle through three methods: tasks/get to poll status and retrieve the result, tasks/update to send new input, and tasks/cancel to stop the work. The task object carries a status, any in-progress requests back to the client, and a final result or error once it reaches a terminal state.
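To make those message shapes concrete, here is a minimal Python sketch of the JSON-RPC 2.0 envelopes involved. The method names, resultType, taskId, and status come from the description above; the run_ci tool, the task-123 identifier, the parameter layout, and the input payload for tasks/update are illustrative assumptions, not the specification's schema.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope, the framing MCP messages use."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


# 1. The client calls a tool as usual ("run_ci" is a hypothetical tool name).
call = jsonrpc_request(1, "tools/call", {"name": "run_ci", "arguments": {"branch": "main"}})

# 2. The server answers with a task handle instead of a final result.
#    Where "status" sits inside the result is an assumption.
task_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"resultType": "task", "taskId": "task-123", "status": "working"},
}


def is_task(response: dict) -> bool:
    """True when the server chose to materialize the call as a task."""
    return response.get("result", {}).get("resultType") == "task"


task_id = task_response["result"]["taskId"]

# 3. The client drives the rest of the lifecycle with the three task methods.
poll = jsonrpc_request(2, "tasks/get", {"taskId": task_id})
update = jsonrpc_request(3, "tasks/update", {"taskId": task_id, "input": {"approved": True}})
cancel = jsonrpc_request(4, "tasks/cancel", {"taskId": task_id})

print(json.dumps(poll))
```

The only piece of this exchange the agent has to keep in its window is the taskId string.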

Task creation is server-directed. The client advertises that it supports the extension in its per-request capabilities, and the server decides, per call, whether a given operation is worth materializing as a task. A fast lookup still returns a normal result inline. A ten-minute batch job returns a handle. The agent does not have to know in advance which calls are slow, because the server tells it at response time.
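One way a server might make that per-call decision is sketched below. The duration estimate, the five-second threshold, the in-memory TASKS registry, and the handle_tools_call name are all hypothetical; the article only establishes that the server chooses and that the client must have advertised support.

```python
import uuid
from typing import Callable

SLOW_THRESHOLD_SECONDS = 5.0  # assumed cutoff; a real server would tune this

# Hypothetical registry of accepted tasks, keyed by taskId.
TASKS: dict[str, dict] = {}


def handle_tools_call(
    tool: Callable[[], str],
    estimated_seconds: float,
    client_supports_tasks: bool,
) -> dict:
    """Return an inline result for fast work, a task handle for slow work."""
    if client_supports_tasks and estimated_seconds > SLOW_THRESHOLD_SECONDS:
        task_id = str(uuid.uuid4())
        # Record the work so a background worker can pick it up later.
        TASKS[task_id] = {"status": "working", "run": tool}
        return {"resultType": "task", "taskId": task_id}
    # Fast call, or a client without the extension: run it and answer inline.
    return {"content": [{"type": "text", "text": tool()}]}


fast = handle_tools_call(lambda: "42", estimated_seconds=0.1, client_supports_tasks=True)
slow = handle_tools_call(lambda: "done", estimated_seconds=600, client_supports_tasks=True)
legacy = handle_tools_call(lambda: "done", estimated_seconds=600, client_supports_tasks=False)
```

Note the fallback in the last call: a client that never advertised the extension still gets a blocking result, so the server stays compatible with it.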

Why a long-running tool call bloats the context window

A blocking tool call forces every byte of interim output into the agent’s context window. The agent cannot move on until the call returns, so it sits on an open connection while logs, progress lines, and partial results accumulate in the conversation history. None of that is signal for the agent’s actual task, but all of it consumes the same finite context window the agent needs for reasoning.

That dilution has a measured cost. Chroma’s study across 18 frontier models found accuracy dropping from 95% to 60-70% as input length grew, even on simple tasks, and context rot compounds as low-signal tokens crowd out the high-signal ones. The interim output of a long operation is almost entirely low-signal: it matters to the operation, not to the agent driving it.

Before Tasks, teams worked around this in two unsatisfying ways. They let the call block and absorbed the bloat, or they hand-rolled a polling pattern in which one tool starts a job, returns a job ID, and a second tool checks on it. The hand-rolled version solved the blocking problem but reinvented the wheel on every server, with no shared shape for status, results, or cancellation. Both approaches historically leaned on sticky sessions, because the same server instance had to be the one holding the in-flight job.

MCP Tasks are context offloading at the protocol layer

MCP Tasks relocate a long-running operation’s interim state out of the window and replace it with a handle, which is the exact definition of context offloading. Context offloading is the practice of keeping an agent’s working window small by moving state to a destination outside it and pulling it back only when a step needs it. With a task, the destination is the server, the reference is the taskId, and the state pulled back is the final result rather than the whole noisy process that produced it.

That makes Tasks a fourth member of a family that already has three documented context offloading patterns: structured note-taking, sub-agent delegation, and just-in-time retrieval. Those three move durable facts, exploration work, and reference data off the window, respectively. Tasks move something the others never addressed: the live state of an operation that is still running.

Offloading pattern	What it moves off the window	Where it goes
Structured note-taking	Durable facts and goals	A file the agent re-reads
Sub-agent delegation	Exploration and search work	A sub-agent’s own window
Just-in-time retrieval	Reference data	A store queried on demand
MCP Tasks	An in-flight operation’s interim state	The server, behind a task handle

The reason this belongs in the protocol rather than in each agent’s harness is consistency. When offloading is a convention everyone invents differently, the agent has to learn each server’s polling dialect, and a missed status field becomes a stuck task. When it is a standard extension, every task speaks the same lifecycle, and the agent’s logic for “start it, hold the handle, fetch when done” works against any compliant server.

How the task lifecycle keeps the window small

The task lifecycle is designed so the agent never holds more than a handle while work is in flight. The agent issues a tools/call, the server responds with a taskId instead of blocking, and the agent’s window now carries a few tokens of reference rather than a stream of output. The agent then polls tasks/get, which returns a status: working, input_required, completed, failed, or cancelled. For terminal states, the same response carries the final result or the error.
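A client-side polling loop for this lifecycle might look like the sketch below. The status values are the ones listed above; the wait_for_task helper, the fake_tasks_get stand-in for a real server, and the shape of the response dictionary are assumptions made so the example runs on its own.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_task(
    tasks_get: Callable[[str], dict],
    task_id: str,
    interval: float = 1.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task is terminal or needs input, then return it."""
    for _ in range(max_polls):
        task = tasks_get(task_id)
        if task["status"] in TERMINAL_STATES or task["status"] == "input_required":
            return task
        sleep(interval)  # status is "working": wait, keep nothing in context
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")


# Fake server for the demo: two "working" polls, then a terminal result.
_responses = iter([
    {"status": "working"},
    {"status": "working"},
    {"status": "completed", "result": "CI passed: 212 tests"},
])


def fake_tasks_get(task_id: str) -> dict:
    return next(_responses)


# Only the handle and the final result ever enter the agent's history.
history = ["tools/call returned handle task-123"]
final = wait_for_task(fake_tasks_get, "task-123", sleep=lambda _seconds: None)
history.append(f"task-123 {final['status']}: {final['result']}")
print(history)
```

The intermediate "working" responses are read and discarded inside the loop, so they never reach the conversation history.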

The input_required status is the part that matters most for human-in-the-loop work. A step that needs an approval or a missing parameter can pause without the agent camping on an open connection and without the pending request sitting in the window as dead weight. The agent supplies the input through tasks/update and the task resumes. If priorities change, tasks/cancel stops the work cleanly. Throughout, the only thing the agent has to keep in context is the taskId. The progress, the retries, and the intermediate artifacts stay on the server until, and unless, the agent asks for the result.
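The pause, resume, and cancel paths can be sketched against an in-memory stand-in for a server. FakeTaskServer, drive_task, the request text, and the approved flag are all hypothetical; only the status names and the roles of tasks/get, tasks/update, and tasks/cancel come from the article.

```python
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}


class FakeTaskServer:
    """In-memory stand-in for a server that implements the Tasks extension."""

    def __init__(self) -> None:
        self.tasks: dict[str, dict] = {}

    def create_approval_task(self, task_id: str) -> None:
        self.tasks[task_id] = {"status": "input_required", "request": "Approve deploy to prod?"}

    def get(self, task_id: str) -> dict:  # plays the role of tasks/get
        return dict(self.tasks[task_id])

    def update(self, task_id: str, value: dict) -> None:  # plays tasks/update
        if self.tasks[task_id]["status"] != "input_required":
            raise ValueError("task is not waiting for input")
        if value.get("approved"):
            self.tasks[task_id] = {"status": "completed", "result": "deployed"}
        else:
            self.tasks[task_id] = {"status": "failed", "error": "approval denied"}

    def cancel(self, task_id: str) -> None:  # plays tasks/cancel
        if self.tasks[task_id]["status"] not in TERMINAL_STATES:
            self.tasks[task_id] = {"status": "cancelled"}


def drive_task(
    server: FakeTaskServer,
    task_id: str,
    provide_input: Callable[[str], dict],
    max_steps: int = 10,
) -> dict:
    """Agent loop with an explicit branch for input_required."""
    for _ in range(max_steps):
        task = server.get(task_id)
        if task["status"] in TERMINAL_STATES:
            return task
        if task["status"] == "input_required":
            server.update(task_id, provide_input(task["request"]))
        # On "working", a real loop would sleep before polling again.
    server.cancel(task_id)  # give up cleanly rather than leave the task dangling
    return server.get(task_id)


server = FakeTaskServer()
server.create_approval_task("task-1")
outcome = drive_task(server, "task-1", lambda request: {"approved": True})

server.create_approval_task("task-2")
server.cancel("task-2")
cancelled = server.get("task-2")
```

The branch that matters is the input_required one: without it, the loop would spin until max_steps and the paused task would never resume.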

Why the stateless core makes Tasks reliable

A task is only a safe place to offload work if the agent can still reach it later, and the 2026-07-28 spec’s other headline change is what guarantees that. The release candidate made the protocol stateless: it dropped the initialize handshake and the Mcp-Session-Id header, moved protocol version and client info into _meta fields, and let any server instance handle any request. A task handle is no longer bound to one connection or one node.

In practice that means an agent can start a task, lose its connection, reconnect through a plain round-robin load balancer, and still call tasks/get on whatever instance answers. For offloading, this closes the failure that makes the whole pattern risky: a destination you cannot retrieve from is worse than no offloading at all. The stateless core, which the 2026 MCP roadmap framed as a context-delivery decision rather than a plumbing one, is what turns a task from a fragile in-memory job into a durable handle the agent can trust.
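The reconnect scenario can be simulated in a few lines. In this sketch two server instances sit behind a round-robin balancer and share one task store; SharedTaskStore stands in for whatever database or queue a real deployment would use, and the class names, node names, and the started_by and served_by fields are illustrative assumptions.

```python
import itertools
import uuid


class SharedTaskStore:
    """Stand-in for a durable store that every server instance can reach."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def put(self, task_id: str, task: dict) -> None:
        self._rows[task_id] = task

    def get(self, task_id: str) -> dict:
        return self._rows[task_id]


class ServerInstance:
    """One node behind the balancer; it keeps no per-connection state."""

    def __init__(self, name: str, store: SharedTaskStore) -> None:
        self.name = name
        self.store = store

    def handle(self, method: str, params: dict) -> dict:
        if method == "tools/call":
            task_id = str(uuid.uuid4())
            self.store.put(task_id, {"status": "working", "started_by": self.name})
            return {"resultType": "task", "taskId": task_id}
        if method == "tasks/get":
            task = self.store.get(params["taskId"])
            return {**task, "served_by": self.name}
        raise ValueError(f"unsupported method: {method}")


store = SharedTaskStore()
balancer = itertools.cycle([ServerInstance("node-a", store), ServerInstance("node-b", store)])

# The first request lands on node-a; the next one, after a "reconnect", on node-b.
handle = next(balancer).handle("tools/call", {"name": "run_batch"})
status = next(balancer).handle("tasks/get", {"taskId": handle["taskId"]})
print(status["started_by"], "->", status["served_by"])  # node-a -> node-b
```

The lookup works on node-b only because the task lives in the shared store rather than in node-a's memory, which is the design constraint stateless routing places on the server.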

What MCP Tasks don’t fix

Tasks move the process off the window, not the payload, and that distinction is where the pattern can disappoint. Four costs are worth planning around before turning it on:

Polling latency. tasks/get is a round trip, and an agent that polls aggressively trades context savings for wall-clock time and call volume. The right cadence depends on how long the work actually takes, and a tight loop on a ten-minute job is just waste in a different column.
A large result still bloats the window. Offloading the interim state does nothing if the completed task returns a 50,000-token blob. The result lands in the window like any tool output. Servers still have to return tight, structured results, and agents still have to budget for what comes back.
input_required needs a real path back. A task that pauses for input and never gets it stalls. If the agent’s loop does not have a branch for supplying that input, the offloaded work becomes offloaded-and-forgotten, the failure mode every offloading pattern shares.
Server durability is now load-bearing. Because the agent trusts the server to hold the work, a server that loses a task on restart breaks the contract. Stateless routing makes tasks reachable across instances, but the task state itself still has to survive somewhere durable.
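The first two costs lend themselves to small guards in the agent's harness. The sketch below shows an exponential backoff schedule for polling and a rough clamp on result size; the starting delay, the 60-second cap, the four-characters-per-token estimate, and both function names are assumptions to tune, not anything the extension prescribes.

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary


def backoff_delays(
    initial: float = 2.0,
    factor: float = 2.0,
    cap: float = 60.0,
    total: float = 600.0,
) -> list[float]:
    """Poll delays that grow geometrically up to `cap` until `total` seconds are covered."""
    delays: list[float] = []
    elapsed = 0.0
    delay = initial
    while elapsed < total:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, cap)
    return delays


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def clamp_result(text: str, max_tokens: int) -> str:
    """Keep a task result inside a token budget before it enters the window."""
    if estimate_tokens(text) <= max_tokens:
        return text
    keep = max_tokens * CHARS_PER_TOKEN
    return text[:keep] + "\n[truncated: fetch the full artifact by reference]"


schedule = backoff_delays()           # covers a ten-minute job
fixed_polls = int(600 / 2.0)          # polling every 2 seconds instead
print(len(schedule), "polls with backoff vs", fixed_polls, "at a fixed 2s interval")
```

With these defaults the backoff schedule covers a ten-minute job in 14 polls, against 300 at a fixed two-second interval. The clamp is a last resort on the agent side; the better fix is still a server that returns a tight, structured result.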

These are the same trade-offs that make offloading an engineering decision rather than a default. A short operation that fits in the window with room to spare does not need a task, and long-horizon agents that already lose the plot will not be saved by one more handle if the rest of their context is unmanaged.

Where MCP Tasks fit in your context strategy

MCP Tasks join the offloading toolkit; they do not replace it. Note-taking still handles durable goals, sub-agents still handle exploration, and just-in-time retrieval still handles reference data. Tasks handle the in-flight operation, and the four compose because they target non-overlapping sources of window bloat. The discipline that ties them together is treating the window as a context budget with a fixed ceiling and deciding deliberately what earns a place inside it.

This hold-a-reference model is already how context platforms keep agents lean on the data side. Each Wire container runs its own remote MCP server, and an agent connected to it holds only a container reference and calls wire_search for the entries a step needs, so millions of entries never enter the window. Tasks extend that same discipline from data to operations: a handle in the window, the bulk somewhere the agent can reach it. Both slot into the broader toolkit of context engineering techniques.

The takeaway is that the protocol is starting to absorb context engineering patterns that used to live in application code. A long-running tool call no longer has to choose between blocking the agent and bloating its window. It can hand back a handle, run somewhere durable, and return only the result that matters. The agent that works best is still the one carrying exactly what the current step needs, with everything else one fetch away.

Sources: MCP 2026-07-28 Specification Release Candidate · Model Context Protocol: Tasks Extension · Anthropic: Effective Context Engineering for AI Agents · Chroma: Context Rot

Frequently asked questions

How are MCP Tasks different from a normal tool call?

A normal tool call blocks until it returns a result, holding the connection open and streaming any interim output into the agent's context window. An MCP task returns a handle immediately, runs the work server-side, and lets the agent retrieve the result later with tasks/get. The difference is where the operation's interim state lives: in the window, or behind a handle.

When should an MCP server return a task instead of a result?

Return a task when the operation takes long enough that blocking would be wasteful or fragile, such as CI runs, batch jobs, or steps that wait on human approval. Task creation is server-directed, so the server decides per request. Fast, deterministic calls should still return a normal result.

Do MCP Tasks reduce an agent's context usage?

Yes, for the interim state of long-running work. The agent holds a short taskId instead of the streamed progress, logs, and retries a blocking call would deposit in the window. Tasks do not shrink the final result, though: a completed task's payload lands in the window like any other tool output, so the server still has to keep it tight.

How does an agent retrieve the result of an MCP task?

The agent polls tasks/get with the taskId. The response carries the current status, one of working, input_required, completed, failed, or cancelled, and for terminal states it includes the final result or error. The agent can also call tasks/cancel to stop the work or tasks/update to send required input.

What happens to an MCP task if the agent disconnects?

Because the 2026-07-28 spec made the protocol stateless, a task handle is not tied to one connection or one server instance. An agent can drop the connection and later call tasks/get on any instance to retrieve the result, provided the server keeps the task state in storage that survives restarts. This is what makes a task a reliable place to offload work to.
