Progressive tool loading: how MCP agents stopped paying for tools they never call
Key takeaway
Progressive tool loading defers MCP tool definitions until an agent actually needs them, instead of dumping every connected tool into the system prompt at session start. Anthropic's code-execution-with-MCP pattern reports a reduction from roughly 150,000 input tokens to 2,000 for the same task, a 98.7% drop. As agents connect to more servers in 2026, preloaded tool catalogs are the largest single source of wasted context, and progressive disclosure has become the default in production MCP design.
The headline number from Anthropic’s April 2026 work on MCP is hard to ignore. A reference agent task that needed roughly 150,000 input tokens with all tools preloaded into context dropped to about 2,000 tokens when tool definitions were loaded only when used. That is a 98.7% reduction on the same task, on the same model, with the same tools available.
That gap is the story of MCP in 2026. The protocol succeeded so completely that connecting an agent to more than a handful of servers became the dominant context cost. By April 2026 there were over 10,000 enterprise MCP servers and more than 97 million SDK downloads across providers, and the average agent now lives inside a tool catalog large enough to crowd out the user’s actual question. Progressive tool loading, sometimes called progressive disclosure, is how production teams are responding.
This post is about what progressive tool loading actually is, why preloaded catalogs broke down, and the design choices that separate the implementations that work from the ones that just shuffle the same tokens around.
Progressive tool loading defers the body of an MCP tool definition until the agent needs it, instead of placing every connected tool’s full schema into the system prompt at session start. The agent still discovers what tools exist, but the verbose schemas, parameter descriptions, and example payloads load on demand. Anthropic’s code-execution-with-MCP write-up frames this as the default pattern for new servers, and parallel implementations from Klavis, Speakeasy, and the broader MCP community have converged on the same shape.
Concretely, a session starts with a small index. Tool name, one-line purpose, server of origin. When the model decides to call a tool, the runtime fetches that tool’s full schema, places it in context for the call, and lets it fall out afterward. Multi-step tasks accumulate only the schemas the trajectory actually touched, not the union of every connected server.
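In code, the index can be as small as one record per tool. A minimal sketch, with hypothetical names that are illustrative rather than part of the MCP spec:

```python
from dataclasses import dataclass

# Hypothetical shape for an index entry; the field names are assumptions.
@dataclass
class ToolIndexEntry:
    server: str    # server of origin
    name: str      # the name the model emits when it calls the tool
    purpose: str   # one line, action-shaped

# What the model sees on every turn: tens of tokens per tool.
index = [
    ToolIndexEntry("docs", "search_entries",
                   "Search container entries semantically; return ranked matches with provenance"),
    ToolIndexEntry("docs", "get_entry",
                   "Fetch a single entry by id with metadata and source URL"),
]

def render_index(entries: list[ToolIndexEntry]) -> str:
    # Full schemas stay server-side until a call needs them.
    return "\n".join(f"{e.server}.{e.name}: {e.purpose}" for e in entries)

print(render_index(index))
```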
The technique is a context engineering move, not a protocol change. It works on the current MCP spec because MCP already separates discovery from invocation. What changed in 2026 is the assumption that every client should serialize the discovery results into one big system prompt.
Preloaded catalogs broke down because tool surface scaled faster than context windows did. By Q1 2026 a typical agent in production was connecting to between five and twenty MCP servers, each exposing five to fifty tools, each with a JSON Schema that is rarely under 300 tokens and often over 1,000. The arithmetic is unforgiving: ten servers with twenty tools at 500 tokens each is 100,000 tokens before the user has typed a word.
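The same arithmetic as a four-line sanity check, assuming roughly 20 tokens per index entry (in line with the index sizing discussed below):

```python
# Back-of-envelope catalog cost using the round numbers from the paragraph above.
servers, tools_per_server, tokens_per_schema = 10, 20, 500
preloaded = servers * tools_per_server * tokens_per_schema  # full schemas in context
index_only = servers * tools_per_server * 20                # ~20 tokens per index entry
print(preloaded, index_only)  # 100000 4000 -- before the user has typed a word
```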
Three failure modes show up at that scale.
The first is straightforward cost. Long inputs get charged on every turn, and tool catalogs are present on every turn, so the marginal cost of an unused tool is paid forever. Prompt caching helps but does not eliminate the bill, especially when catalogs change across sessions.
The second is context rot. The longer the input, the worse models attend to mid-context content, and tool catalogs typically sit in exactly that mid-context band. Stanford’s lost-in-the-middle work has been replicated across more than a dozen models since 2023, and the implication for tool design is the same one the agent drift literature describes: relevant tools buried under dozens of irrelevant ones get used less reliably than the same tools presented in isolation.
The third is security blast radius. The April 2026 MCP security audits reported that 43% of public MCP servers had at least one vulnerability and that 5.5% already shipped with poisoned descriptions in the wild. Preloading tool definitions means every poisoned description enters context every session, which is the worst possible substrate for prompt-injection mitigation. A pattern that loads descriptions only on use shrinks that surface significantly. It does not solve tool poisoning, but it stops amplifying it.
Progressive tool loading is usually implemented as a thin runtime layer between the agent and its MCP servers. The runtime keeps two views of the tool surface.
The first view is the index. A short list of (server, tool, one-line description) tuples, kept in context for the entire session, that the model uses to decide what to call. Indexes typically run 10 to 30 tokens per tool, an order of magnitude smaller than full schemas.
The second view is the lazy detail. Full schemas, parameter descriptions, example inputs, and any server-supplied annotations live in the runtime, not in context. When the model emits a call to a tool whose schema it has not seen, the runtime intercepts the call, fetches the schema, validates the call against it, and either executes or returns the schema for a corrected retry. After execution, the schema can either fall out of context or stay for the rest of the session depending on policy.
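A minimal sketch of that intercept loop. The `fetch_schema` and `execute` methods stand in for whatever discovery and invocation calls your MCP client exposes; they are assumptions, not real SDK names:

```python
def validate(args: dict, schema: dict) -> list[str]:
    # Toy check: required parameters present. A real runtime would run a
    # full JSON Schema validator here.
    return [f"missing required parameter: {p}"
            for p in schema.get("required", []) if p not in args]

class ProgressiveRuntime:
    def __init__(self, client, index):
        self.client = client
        self.index = index  # (server, tool, one-line description), always in context
        self.loaded = {}    # schemas fetched so far; eviction is a policy choice

    def handle_call(self, tool: str, args: dict):
        if tool not in self.loaded:
            # First use: fetch the full schema on demand.
            self.loaded[tool] = self.client.fetch_schema(tool)
        schema = self.loaded[tool]
        errors = validate(args, schema)
        if errors:
            # Hand the schema back so the model can retry with a corrected call.
            return {"error": errors, "schema": schema}
        return self.client.execute(tool, args)
```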
Four policies dominate the implementation space.
| Policy | What stays in context | When it fits |
|---|---|---|
| Strict lazy | Only the index. Schemas load and unload per call. | High tool counts, short tasks, cost-sensitive workloads |
| Sticky lazy | Index plus schemas of any tool used so far this session. | Multi-step tasks where the same tool is called repeatedly |
| Bounded sticky | Index plus an LRU of the N most recently used schemas. | The default for general-purpose agents |
| Code-mediated | Index only, with tools invoked from generated code rather than direct calls. | Highest token efficiency, requires sandbox |
Code-mediated invocation is the variant Anthropic benchmarked at the 98.7% reduction. The agent writes code that imports tools from a typed namespace, the sandbox executes that code, and only the index and the code execution result enter context. It is the most efficient because the entire intermediate trajectory (parameter selection, error handling, sub-tool calls) happens outside the model’s context window. It is also the most invasive to deploy.
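The shape of the agent-generated script, with the generated wrappers stubbed out so the sketch is self-contained. In a real deployment these functions would be typed wrappers the sandbox proxies to MCP servers; the names here are illustrative:

```python
# Stubs standing in for generated typed wrappers.
def search_entries(query: str, limit: int = 5) -> list[dict]:
    return [{"id": "e1", "score": 0.92, "summary": "Refunds within 30 days."}]

def get_entry(entry_id: str) -> dict:
    return {"id": entry_id, "summary": "Refunds within 30 days."}

# --- agent-generated script: intermediate steps never touch context ---
matches = search_entries(query="refund policy")
best = max(matches, key=lambda m: m["score"])
entry = get_entry(entry_id=best["id"])
print(entry["summary"])  # only this output returns to the model's context
```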
Bounded sticky is the variant most teams reach for first because it requires no execution sandbox and behaves close to the preloaded version on small surfaces. The tradeoff is less aggressive savings.
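The bounded sticky policy itself is a few lines around an LRU. A sketch, with the capacity default taken from the recommendation at the end of this post:

```python
from collections import OrderedDict

class BoundedStickySchemas:
    """LRU of in-context schemas for the bounded sticky policy (sketch)."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.schemas: OrderedDict[str, dict] = OrderedDict()

    def touch(self, tool: str, fetch) -> dict:
        if tool in self.schemas:
            self.schemas.move_to_end(tool)        # recently used, keep it resident
        else:
            self.schemas[tool] = fetch(tool)      # load on first use
            if len(self.schemas) > self.capacity:
                self.schemas.popitem(last=False)  # evict least recently used
        return self.schemas[tool]
```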
If your server is going to be consumed under a progressive loading runtime, two design decisions matter more than they used to.
Tool descriptions become the index. The one-line description in your tool registration is now the only thing the agent sees by default; it does the work that the full schema used to do. Vague descriptions cost calls because the agent picks the wrong tool, then has to recover. Specific, action-shaped descriptions (“search across container entries semantically and return relevance-ranked matches with provenance”) outperform generic ones (“query the container”) by a wide margin, and the gap widens under progressive loading because the schema is no longer there to disambiguate.
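Side by side at registration time, with `register_tool` standing in for whatever registration call your server framework actually uses:

```python
def register_tool(name: str, description: str) -> None:
    """Stand-in for your MCP server framework's registration call."""
    print(f"{name}: {description}")

# Too generic: under progressive loading there is no schema in context
# to disambiguate what "query" means.
register_tool("query_container", description="query the container")

# Action verb, scope, output shape -- the whole index budget.
register_tool(
    "search_entries",
    description=("search across container entries semantically and return "
                 "relevance-ranked matches with provenance"),
)
```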
Tool surfaces should be narrower per tool, not wider. The one-job-per-tool pattern we benchmarked in April produced a 24% reduction in total calls and a 7% lift in correctness on the same dataset. Under progressive loading those numbers compound, because each wrong-mode retry on an overloaded tool now also pays the cost of fetching that tool’s schema, not just the cost of the call. Mode parameters on tools were always a soft anti-pattern; progressive loading turns them into a measurable one.
Schemas should declare what is optional aggressively. When a schema does load, the model spends attention on every required parameter. Required parameters that are usually defaulted, deprecated fields kept for backward compatibility, and verbose enum lists are all attention drains that progressive loading cannot save you from once the schema is in context. The cleanest schemas in the wild after April 2026 read like API surfaces with strong defaults, not like exhaustive specifications.
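The difference, sketched as two input schemas for the same hypothetical tool (plain dicts in JSON Schema style):

```python
# Attention drain: everything required, a deprecated field still present.
exhaustive = {
    "type": "object",
    "required": ["query", "limit", "offset", "format", "legacy_mode"],
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
        "offset": {"type": "integer"},
        "format": {"enum": ["json", "xml", "csv", "tsv", "yaml"]},
        "legacy_mode": {"type": "boolean"},  # kept for backward compatibility
    },
}

# Strong defaults: one required field, everything else optional and defaulted.
defaulted = {
    "type": "object",
    "required": ["query"],
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "default": 10},
    },
}
```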
This is also a moment to revisit the MCP 2026 roadmap. Progressive discovery and composable tool execution sit alongside stateless transport and server discovery as priorities, and the infrastructure those features unlock is what makes runtime-side progressive loading cheap. The pattern will work better on servers that opt into the discovery primitives than on servers that emulate them.
Progressive loading is not free, and the cases where it loses are worth naming clearly.
Single-server, low-tool-count agents pay for the bookkeeping without recovering enough context. A coding assistant connected to one MCP server with eight tools is best served by the preloaded catalog. The break-even is somewhere between fifteen and thirty tools depending on schema size; below that, the index plus runtime adds latency without buying back enough tokens.
Tasks with extreme tool churn can defeat sticky policies. An agent that calls a different tool on every turn forces a fetch on every turn, and the fetch latency starts to matter. Strict lazy is correct here; bounded sticky thrashes.
Latency-sensitive workloads pay an extra round trip on first use of any tool. For interactive agents this is usually invisible, but for sub-second SLAs it can be the difference between hitting and missing budget. The mitigation is server-side support for batched schema fetches, which the 2026 spec work is moving toward.
And progressive loading does not fix tool design. A poorly described tool is still a poorly described tool when its description is loaded lazily. The pattern reduces the cost of badly designed surfaces; it does not redeem them. Teams that ship progressive loading on top of overloaded tools sometimes report disappointing numbers, and the diagnosis is almost always upstream of the runtime.
If you run agents in production against more than a handful of MCP servers, three changes pay back quickly.
Audit the tool-catalog cost on a typical session. Count input tokens before the first user message. If it is over 20,000, you are in the band where progressive loading is worth implementing.
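A rough version of that audit over an exported session transcript. The 4-characters-per-token ratio is a crude heuristic; swap in your model's tokenizer for real numbers:

```python
def catalog_tokens(messages: list[dict]) -> int:
    # Count everything that precedes the first user message.
    pre_user = []
    for m in messages:
        if m["role"] == "user":
            break
        pre_user.append(m["content"])
    return sum(len(c) for c in pre_user) // 4  # heuristic, not a tokenizer

session = [
    {"role": "system", "content": "..."},  # system prompt plus tool catalog
    {"role": "user", "content": "the actual question"},
]
if catalog_tokens(session) > 20_000:
    print("in the band where progressive loading pays off")
```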
Rewrite your one-line tool descriptions assuming they are the only thing the agent will see. Action verb, scope, output shape. That is the entire budget.
Pick a sticky policy before you pick a runtime. Bounded sticky with an LRU of five to ten schemas is the right default for most agents; strict lazy is right when tool churn is high; code-mediated is right when you already have a sandbox. Choosing the policy first keeps the runtime decision from leaking into your agent code.
The pattern is small. The savings are not.
Sources: Code execution with MCP (Anthropic) · MCP’s 2026 Roadmap · State of Context Engineering 2026 (Aurimas Griciūnas) · Agentic Context Engineering (arXiv:2510.04618, ICLR 2026) · CIS MCP Security Guide (Cequence) · Context engineering as the missing layer in agentic AI (SiliconANGLE) · Progressive Disclosure MCP benchmark (Matthew Kruczek) · Lost in the Middle (Liu et al., Stanford)