When MCP integrations break, the instinct is to blame the protocol. Authentication is missing, tool definitions are wrong, the server is misconfigured. Fix the config, fix the problem.

A new research paper challenges that framing. In March 2026, a team from Chongqing University published the first systematic study of MCP failures: Real Faults in Model Context Protocol Software, analyzing 3,282 bug-related issues from 385 MCP server repositories. They identified 407 MCP-specific faults and surveyed 41 practitioners building MCP systems in production.

The data shows a pattern that has less to do with the protocol and more to do with context delivery. The most common MCP failures are the same failures that show up everywhere in AI systems when context is poorly designed: wrong information reaching the model, too much information overwhelming it, or information in the wrong format for the model to use.

MCP handles the connection layer. The context flowing through it still needs to be designed.

What 3,282 bug reports actually show

The researchers categorized the 407 MCP-specific faults into five groups:

Server/tool configuration (133 issues, 31.74%): Problems with how tools are defined, what they return, and how they integrate with the host
Server/host configuration (120 issues, 28.64%): Connection and coordination failures between servers and clients
Server setting (115 issues, 27.45%): Setup problems preventing servers from operating
Documentation (29 issues, 6.92%): Missing or incorrect guidance
General programming (22 issues, 5.25%): Standard software bugs unrelated to MCP

The top three categories account for 88% of all faults. And when you look at what actually goes wrong within those categories, a theme emerges: the failures are mostly about what data gets returned, how it’s formatted, and whether the model can use it.

Among tool configuration issues, 63 of 133 fall under “tool call and execution.” Not authentication failures. Not discovery problems. Execution failures: the tool runs, returns something, and the model can’t work with it. This is a context engineering problem.

When practitioners were surveyed about which faults they encounter most often, tool response handling led the list at 66.67%. Two-thirds of production MCP users regularly hit situations where tool outputs don’t reach the model in a usable form.

The context engineering gaps inside MCP

Context engineering is the discipline of designing what information reaches an AI model and how. MCP standardizes the transport. It doesn’t define what good context looks like or how to produce it.

Tool outputs expand the context window

Every MCP tool call adds tokens to the model’s context window. Most MCP server implementations return whatever the underlying API returns: full JSON payloads, paginated datasets, verbose log formats. The model receives all of it.

The same context rot and attention dilution that degrades performance in long prompts applies here. Research from Chroma shows model accuracy drops from 95% to 60-70% as context length grows, even on simple tasks. A single MCP tool call returning a 50KB API response can push a conversation past the point where the model can effectively reason.

One analysis of MCP cost patterns estimated approximately $1 per request when tool outputs reach a megabyte. That cost reflects something more important than the invoice: a megabyte of context injected per query, most of which is likely irrelevant to the actual question.

Connecting an MCP tool directly to a raw API endpoint without shaping its output is the same mistake as stuffing an entire document library into a prompt and hoping the model finds what it needs. The protocol handles the connection. Someone still has to design what comes through it.

Tool descriptions are context too

MCP tools expose themselves through names and descriptions. When an agent has access to many tools, it selects which ones to call based entirely on those descriptions. The descriptions are context engineering artifacts: they shape what the model understands about its capabilities and when to use each one.

The research found that tool discovery and registration is rated the most critical fault category by practitioners, with 31.58% marking it as critical despite a lower encounter rate. The perception matches the impact: if the model can’t correctly identify which tool to call, nothing downstream works correctly. But the tools themselves aren’t broken. The context describing them is.

The Tau-Bench benchmark makes this concrete. Even the most capable reasoning models complete only 16% of complex multi-step tool tasks successfully. These aren’t protocol failures. They’re failures of the model to maintain coherent context across a sequence of tool calls. The state accumulates, the context window fills, and the model loses track of where it is in the task.

External context needs a perimeter

When an MCP server retrieves content from outside your system and passes it to the model, that content enters the context window without a defined security boundary. Instructions embedded in third-party data arrive alongside your own system prompts, and the model has no reliable way to distinguish between them.

AI Agents Have Too Much Access covers this in depth. The core principle applies directly to MCP: anything your agents retrieve from external sources should be treated as untrusted context until it crosses into your security perimeter. Defining that perimeter, and what gets through it, is a context engineering decision, not something the protocol handles for you.

What doesn’t work

Adding authentication

Auth fixes the “who can call this tool” problem. It doesn’t fix what the tool returns or how the model uses it. The most common MCP failures in production, tool response handling at 66.67%, occur in authenticated sessions. A properly authenticated tool can still return malformed, oversized, or unstructured data that degrades model performance.

Patching tool definitions one at a time

Most teams treat MCP configuration as a one-time setup problem. Define the tools, write the descriptions, deploy the server. But tool outputs change when upstream APIs change. Descriptions that worked for one model may fail with another because different LLMs have different sensitivities to how tools are described.

Tool configuration is an ongoing context engineering practice, not a deployment artifact.

Using bigger context windows

Larger context windows let you fit more tool outputs without hitting hard limits. They don’t fix what happens to model performance when the context is full of irrelevant data. The lost-in-the-middle effect, where information in the middle of the context window is substantially less attended to than information at the edges, means a longer context window can make certain retrieval failures worse, not better.

What actually works: designing context, not just piping data

The teams that get MCP working reliably in production share a common approach: they treat every tool response as a context design problem.

Return less, but better

A tool that returns a structured summary of the three most relevant records is more useful than a tool that returns a full API payload the model has to parse. Before building an MCP tool, ask: what is the minimum information this tool needs to return for the model to take the right next action?

This is the same principle that makes selective RAG outperform naive RAG. Retrieving 3 highly relevant chunks outperforms retrieving 20 and hoping the model finds the signal. The same logic applies to MCP tool outputs.

Pre-process before the model sees it

The most effective context engineering happens before the model ever sees the data. Extract entities, summarize sections, build structured representations at indexing time. At query time, the MCP tool returns pre-processed, organized context rather than raw source material.

Tools like Wire take this approach with context containers: everything that flows through the container, whether files added at setup or entries written by agents at runtime, is continuously structured and optimized so tool calls return focused, AI-ready context rather than raw documents.

Write tool descriptions like they’re instructions

Tool descriptions are read by the model, not by humans. Write them to answer: when should I call this tool, what exactly will it return, and in what situations should I prefer it over other available tools. Ambiguous descriptions create the same uncertainty as ambiguous prompts.

A practical checklist

If you’re building or debugging MCP integrations:

Audit your tool outputs. Log what your tools actually return in production. Are responses consistently sized and structured, or do they vary by orders of magnitude?
Check response sizes. If any tool returns more than 10-20KB regularly, consider whether the output needs to be summarized or filtered before reaching the model.
Rewrite tool descriptions as model instructions. Read each description from the perspective of a model choosing between tools. Is it unambiguous? Does it explain what the tool returns?
Treat multi-step tool use as a context budget. If a task requires five tool calls, each adding to the context window, the fifth call is operating in a much noisier environment than the first. Design accordingly.
Define a perimeter for external context. Any tool that retrieves third-party data is bringing untrusted content into your context window. Decide what crosses the boundary and what gets filtered before it reaches the model.

The Model Context Protocol solves a real problem: it standardizes how AI tools connect to external data sources. But it solves the connection problem, not the context problem. What flows through the connection still needs to be designed, structured, and sized for the model that will consume it. That’s context engineering, and MCP doesn’t do it for you.

Why MCP failures are a context engineering problem