What Anthropic's context engineering guides leave out

Jitpal Kocher · 11 min read

Key takeaway

Anthropic's three 2025 engineering posts on context, tools, and code execution with MCP read like a connected trilogy. Each one prescribes how to use context, structure tools, and run MCP servers efficiently, and each one assumes a substrate that holds the context, exposes the tools, and owns the data. The substrate layer determines whether the recommendations work in production. Context engineering is incomplete until you name where the context lives.

Anthropic published three engineering posts in 2025 that read like a connected trilogy on context engineering. “Effective context engineering for AI agents” sets the framework. “Writing effective tools for AI agents” applies it to tool design. “Code execution with MCP” extends it to how agents call those tools at scale. Read individually they look like distinct topics. Read together they describe a coherent stance on how agents should consume context.

They also share one absence. Each post prescribes how to use, structure, and call into a layer the agent does not own. None of them names that layer, much less prescribes how to build it. The substrate where context lives, where tool definitions resolve, where the data your agent retrieves actually sits, is treated as if it were already there. For many production agent teams, it is not.

This piece reads the three guides side by side, surfaces the substrate assumptions inside each, and names what is left for the reader to build.

The three guides at a glance

The three posts span the agent stack, but each one starts mid-stack. Below is the load-bearing recommendation from each guide and the substrate assumption it leaves implicit.

| Guide | Load-bearing recommendation | Assumed substrate |
| --- | --- | --- |
| Effective context engineering | “Just in time” runtime retrieval using lightweight identifiers (file paths, queries, links) | A storage layer the identifiers resolve to, with addressing, permissions, and consistency already solved |
| Writing tools for agents | Consolidate functionality, return only high-signal information, replace low-level identifiers with semantic language | A data layer you own end to end, with schemas, indexes, and a stable contract between tool and storage |
| Code execution with MCP | Organize MCP servers as a file tree; agents read tool definitions on demand and process data in code before returning to the model | A populated MCP fleet, multiple data sources behind it, and a governance model for tool composition |

The pattern repeats. Each post answers a “how should the agent behave” question, and each one assumes the answer to “where does the data live” is already known.

Effective context engineering assumes a substrate with addressable identifiers

The first guide treats context as a finite resource and prescribes runtime curation. The post argues for “the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome,” and recommends “just in time” retrieval over preloading. Concretely, the guide tells agents to “maintain lightweight identifiers (file paths, stored queries, web links, etc.)” and load data into context only when needed.

The recommendation is correct. The recommendation is also incomplete. A file path is a pointer; it has to resolve to something. A stored query is a recipe; the corpus has to exist. A web link is a request URL; the endpoint has to honor the contract the agent expects. Every identifier the post recommends is a handle on a system the post does not describe.
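To make the dependency concrete, here is a minimal sketch of what "resolving" a lightweight identifier involves. The `Identifier` and `Substrate` names, the scheme strings, and the in-memory `files` dictionary are all hypothetical and do not come from Anthropic's post; they only illustrate that a handle is useful exactly when something is registered behind it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Identifier:
    """A lightweight handle the agent keeps in context instead of the data."""
    scheme: str   # e.g. "file" or "query" (hypothetical scheme names)
    target: str   # a path, a stored-query name, and so on


class Substrate:
    """Hypothetical storage layer that identifiers resolve against."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Callable[[str], str]] = {}

    def register(self, scheme: str, resolver: Callable[[str], str]) -> None:
        self._resolvers[scheme] = resolver

    def resolve(self, ident: Identifier) -> str:
        # The handle is only useful if something stands behind its scheme.
        if ident.scheme not in self._resolvers:
            raise LookupError(f"no resolver for scheme {ident.scheme!r}")
        return self._resolvers[ident.scheme](ident.target)


# Toy backing store standing in for a real file system or database.
files = {"notes/plan.md": "Q3 migration plan"}
substrate = Substrate()
substrate.register("file", lambda path: files[path])

handle = Identifier("file", "notes/plan.md")
loaded = substrate.resolve(handle)   # data enters context only at this point
```

An identifier with an unregistered scheme, such as a stored query nobody has wired up, raises `LookupError` here. In a real system the equivalent failure is an agent holding pointers into a layer that was never built.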

The same is true for the structured note-taking pattern the guide proposes for long-horizon work. “Persisted to memory outside of the context window” is the prescription. Persisted where, in what schema, with what retention, recoverable by which next agent, isolated from which other tenant, is the unsolved part. Anthropic shipped a memory tool on the Claude Developer Platform as a file-based system in 2025, which closes the gap for users of that platform. For everyone else, the substrate decision sits with the team.
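As a rough illustration of how many decisions hide inside "persisted", the sketch below writes a note to disk with an explicit schema. The `Note` fields, the tenant-per-directory layout, and the `persist` and `recall` helpers are assumptions made for this example; they are not the design of Anthropic's memory tool.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Note:
    tenant: str          # which tenant the note belongs to
    author_agent: str    # which agent wrote it
    body: str
    retain_days: int     # retention the substrate would be expected to enforce


def persist(root: Path, key: str, note: Note) -> Path:
    # Notes are partitioned by tenant directory, so a later reader has to
    # name the tenant it is reading for.
    path = root / note.tenant / f"{key}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(note)))
    return path


def recall(root: Path, tenant: str, key: str) -> Note:
    return Note(**json.loads((root / tenant / f"{key}.json").read_text()))


root = Path(tempfile.mkdtemp())
persist(root, "progress", Note("acme", "agent-a", "step 3 of 5 done", 30))
restored = recall(root, "acme", "progress")   # e.g. read by a later agent
```

Every field in `Note` answers one of the questions above. Leaving a field out does not remove the question; it only defers the answer to whoever debugs the agent later.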

The deeper read is that context engineering as a discipline is downstream of substrate decisions. If your storage layer cannot deliver typed, addressable, provenance-bearing results, the curation strategies the guide describes have nowhere stable to apply. The agent gets clever about a layer that is not stable enough to be clever about.

Writing tools for agents assumes you own the data layer

The second guide is the most prescriptive of the three. It argues against “wrapping existing APIs” and for tools that “consolidate functionality, handling potentially multiple discrete operations.” It recommends that tools “return only high signal information,” “replace low-level technical identifiers with semantic language,” and apply pagination, filtering, and truncation with sensible defaults.

These are good recommendations. They are also recommendations a tool author can only follow if they own the system the tool calls into. Consolidating functionality requires control of the backend. Returning only high-signal fields requires knowing which fields the agent will actually use, which requires having designed both sides of the contract. Replacing identifiers with semantic language requires a schema that carries semantics, not just rows.
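A small sketch shows why these recommendations presuppose backend control. The tables, field names, and the `customer_orders` function below are hypothetical; the point is that the tool can only join, rename, and truncate because its author can see every table involved.

```python
from typing import Dict, List

# Hypothetical backend tables the tool author controls.
CUSTOMERS = {"c_8f3a": {"name": "Dana Lee"}}
ORDERS = [
    {"id": "o_01", "customer": "c_8f3a", "sku": "sku_17", "status": 2, "etag": "x1"},
    {"id": "o_02", "customer": "c_8f3a", "sku": "sku_22", "status": 1, "etag": "x2"},
    {"id": "o_03", "customer": "c_8f3a", "sku": "sku_17", "status": 2, "etag": "x3"},
]
PRODUCTS = {"sku_17": "Standing desk", "sku_22": "Monitor arm"}
STATUS = {1: "processing", 2: "shipped"}


def customer_orders(customer_id: str, limit: int = 2) -> Dict[str, object]:
    """One call standing in for separate order, customer, and product lookups."""
    matches = [o for o in ORDERS if o["customer"] == customer_id]
    rows: List[Dict[str, str]] = [
        # Semantic names instead of raw ids; internal fields such as etag dropped.
        {"product": PRODUCTS[o["sku"]], "status": STATUS[o["status"]]}
        for o in matches[:limit]
    ]
    return {
        "customer": CUSTOMERS[customer_id]["name"],
        "orders": rows,
        "truncated": len(matches) > limit,  # tells the agent more rows exist
    }


result = customer_orders("c_8f3a")
```

If `PRODUCTS` or `STATUS` lived behind a vendor API with no bulk lookup, the same function would either fan out into many calls or hand raw identifiers back to the agent.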

The post does not discuss where the data originates, how schemas are defined or evolved, who owns governance and permissions, or whether tools assume real-time or cached views. The closest the piece gets to infrastructure is naming a tool descriptor pattern, and the data the descriptor refers to is treated as a given. When teams try to apply the guidance to systems they do not own, they run into the gap fast. A read-only tool against a vendor API cannot consolidate functionality. A tool against a stale cache cannot return real-time signals. A tool over a shared database cannot be redesigned to remove ambiguous identifiers without coordination the post does not address.

The unstated prerequisite is that the team building the tool owns, or can negotiate ownership of, the data layer underneath. That is a substrate property, not a tool property. Production teams that have not solved it can spend their first weeks following the guide, only to discover they were optimizing the wrong layer.

Code execution with MCP assumes the MCP fleet already exists

The third guide is the most architectural. It argues that as the number of MCP-connected tools grows, two problems compound: tool definitions consume too many tokens, and intermediate results flow through the model repeatedly. The fix is to let agents write code that calls MCP servers, processes results locally, and returns only summaries to the model.
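The token argument is easy to see in miniature. In the sketch below, `fetch_rows` is a stand-in for an MCP tool call (the name and the generated rows are hypothetical); the thousand rows stay in the execution environment and only a one-line summary would go back to the model.

```python
from typing import Dict, List


def fetch_rows() -> List[Dict[str, object]]:
    """Stand-in for an MCP tool call; the name and rows are hypothetical."""
    return [{"id": i, "status": "pending" if i % 4 == 0 else "done"}
            for i in range(1000)]


def summarize_pending(rows: List[Dict[str, object]]) -> str:
    # Filtering happens in code, so the full result never enters the context.
    pending = [r["id"] for r in rows if r["status"] == "pending"]
    return f"{len(pending)} of {len(rows)} rows pending; first ids: {pending[:3]}"


summary = summarize_pending(fetch_rows())
# summary == "250 of 1000 rows pending; first ids: [0, 4, 8]"
```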

The recommended layout is concrete. Each MCP server becomes a directory on a virtual filesystem. Each tool becomes a file. Agents navigate the tree, read tool definitions on demand, and compose calls across multiple servers in a single code block. The piece notes that “models are great at navigating filesystems” and turns that observation into a discovery pattern.
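The discovery pattern can be sketched with an ordinary directory tree. The server names, tool names, and `.md` definition files below are hypothetical and chosen for brevity; they are not the layout or file format used in Anthropic's post.

```python
import tempfile
from pathlib import Path
from typing import List

# Build a toy tree: one directory per server, one file per tool.
root = Path(tempfile.mkdtemp()) / "servers"
definitions = {
    "crm/get_account.md": "get_account(account_id) -> account summary",
    "crm/list_contacts.md": "list_contacts(account_id) -> contact names",
    "docs/search.md": "search(query) -> matching document titles",
}
for rel, text in definitions.items():
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


def list_servers(tree: Path) -> List[str]:
    return sorted(p.name for p in tree.iterdir() if p.is_dir())


def list_tools(tree: Path, server: str) -> List[str]:
    return sorted(p.stem for p in (tree / server).iterdir())


def read_definition(tree: Path, server: str, tool: str) -> str:
    # Only this one definition is loaded, not every tool on every server.
    return (tree / server / f"{tool}.md").read_text()


servers = list_servers(root)                        # ["crm", "docs"]
crm_tools = list_tools(root, "crm")                 # ["get_account", "list_contacts"]
definition = read_definition(root, "docs", "search")
```

The navigation code is trivial. Everything interesting happened in the setup loop, which is the part a real team has to replace with running servers and maintained definitions.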

The pattern depends on substrate that the post never specifies. The agent walks a tree because the tree exists, which means somebody has stood up the MCP servers and connected them to real data sources. The agent reads tool definitions on demand because the definitions are written, versioned, and discoverable from a known root. The agent composes calls across servers because the servers expose compatible response shapes, or because the team has done the work to normalize them.

The recommendation also implies a governance answer the piece does not give: which servers are in the tree, who decides, how PII is handled before responses reach the model, how credentials are propagated, and how access is revoked when a teammate leaves. The post mentions automatic tokenization for sensitive data at the MCP client boundary, but tokenization is a feature of the substrate, not of the code execution pattern. The substrate either supports it or does not.
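For readers unfamiliar with the idea, tokenization at a boundary can be sketched in a few lines. The email regex, the `[EMAIL_n]` placeholder format, and the in-memory `vault` are assumptions for illustration only; a production system would need a far more careful detector and a durable, access-controlled mapping.

```python
import re
from typing import Dict

# Deliberately simple pattern; real PII detection is much harder than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def tokenize(text: str, vault: Dict[str, str]) -> str:
    """Replace each email with a stable placeholder and remember the mapping."""
    def swap(match: re.Match) -> str:
        value = match.group(0)
        for token, known in vault.items():
            if known == value:
                return token          # same value always gets the same token
        token = f"[EMAIL_{len(vault) + 1}]"
        vault[token] = value
        return token
    return EMAIL.sub(swap, text)


def detokenize(text: str, vault: Dict[str, str]) -> str:
    for token, value in vault.items():
        text = text.replace(token, value)
    return text


vault: Dict[str, str] = {}
original = "Contact dana@example.com or dana@example.com"
seen_by_model = tokenize(original, vault)          # placeholders only
sent_to_tool = detokenize(seen_by_model, vault)    # real values restored
```

The `vault` is the substrate in this picture: something has to own it, persist it, and decide who may reverse the mapping.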

MCP as a protocol is now stewarded by the Agentic AI Foundation under the Linux Foundation, and it has matured to the point where prescriptive guidance about it makes sense. The substrate that hosts MCP servers, populates them with data, and governs their lifecycle is the unsolved layer the guide skips. The MCP 2026 roadmap analysis walks through where the protocol itself is heading, which is parallel to the substrate question, not a replacement for it.

What “substrate” actually means in agent context

Three properties separate a real substrate from a placeholder. Each one is load-bearing on its own. Together they are what makes Anthropic’s recommendations transferable from one team to another.

Durability across harness switches is the first. The substrate has to outlive the CLI or agent that was open when the context was created. If your context layer only works inside one harness’s prompt, runtime, or memory feature, it is not a substrate; it is a session. Teams running Claude Code today and a different agent next quarter need their accumulated entries, schemas, and tool definitions to survive that change. The companion piece on substrates and harnesses makes this argument from the harness side. Here, the same property is what makes Anthropic’s tool and context recommendations portable at all.

Provenance attached to results is the second. When a tool returns a chunk of text, the agent has to know where it came from, when it was last updated, what it links to, and what authority it carries. A score and a content blob are insufficient; the agent reconstructs the rest from the text, and a different agent will reconstruct it differently. Provenance as a context engineering primitive covers the shape of this metadata; the substrate is what attaches it.
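One way to picture the difference is as a result type. The `Provenance` and `Result` classes and their field names below are hypothetical, chosen to mirror the four questions in the paragraph above; the sample results are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class Provenance:
    source_id: str                 # where the text came from
    updated_at: datetime           # when it was last updated
    links: Tuple[str, ...] = ()    # what it links to
    authority: str = "unverified"  # what authority it carries


@dataclass(frozen=True)
class Result:
    content: str
    score: float
    provenance: Provenance


def freshest(results: List[Result]) -> Result:
    # With provenance attached, "most recent" is a lookup, not a guess from text.
    return max(results, key=lambda r: r.provenance.updated_at)


results = [
    Result("Refunds take 5 days.", 0.91,
           Provenance("handbook/refunds", datetime(2024, 1, 10, tzinfo=timezone.utc))),
    Result("Refunds take 3 days.", 0.88,
           Provenance("handbook/refunds-v2", datetime(2025, 3, 2, tzinfo=timezone.utc),
                      links=("handbook/refunds",), authority="policy-owner")),
]
best = freshest(results)
top_scored = max(results, key=lambda r: r.score)
```

Ranking by score alone picks the older statement in this toy data, while the provenance-aware choice picks the newer one from the more authoritative source.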

Tenant isolation is the third. Multi-tenant agent systems leak through their substrates, not their prompts. If two customers’ contexts can co-mingle inside a shared retrieval index, no amount of careful context engineering at the agent layer recovers the boundary. The substrate decision is whether isolation is a deployment property (one substrate instance per tenant) or a query property (one shared index with filters). The guides do not name the choice; production agents are shaped by it.
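The two options can be contrasted in a short sketch. Both classes below are hypothetical in-memory stand-ins for a retrieval index; they return the same answers, and differ only in where the tenant boundary is enforced.

```python
from typing import Dict, List


class SharedIndex:
    """Isolation as a query property: one index, a filter on every read."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, str]] = []

    def add(self, tenant: str, text: str) -> None:
        self._entries.append({"tenant": tenant, "text": text})

    def search(self, tenant: str, term: str) -> List[str]:
        # Omitting this tenant check on any code path leaks across customers.
        return [e["text"] for e in self._entries
                if e["tenant"] == tenant and term in e["text"]]


class PerTenantIndexes:
    """Isolation as a deployment property: one index per tenant."""

    def __init__(self) -> None:
        self._indexes: Dict[str, List[str]] = {}

    def add(self, tenant: str, text: str) -> None:
        self._indexes.setdefault(tenant, []).append(text)

    def search(self, tenant: str, term: str) -> List[str]:
        # Another tenant's entries are not reachable from this list at all.
        return [t for t in self._indexes.get(tenant, []) if term in t]


shared, split = SharedIndex(), PerTenantIndexes()
for index in (shared, split):
    index.add("acme", "acme renewal terms")
    index.add("globex", "globex renewal terms")

shared_hits = shared.search("acme", "renewal")
split_hits = split.search("acme", "renewal")
```

With the shared index, correctness depends on every query remembering the filter. With per-tenant indexes, the cost moves to provisioning and operating one index per customer.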

A context layer that has durability, provenance, and isolation supports the recommendations in all three Anthropic guides cleanly. A context layer without them produces agents that look right in a demo and fall apart in the second month of production use.

What’s still on you to build

Read the three guides as a checklist of what you still have to decide.

Where the context lives. File on disk, vector index, document database, structured store, MCP server backed by one of the above. The decision frames every subsequent one. Anthropic’s guidance applies to any of them, which is also why none of them are named.

Who owns the schema. The tools the agent calls have to know what shape the data takes. If your schema lives in a vendor API you do not control, you can wrap and translate, but you cannot redesign. Teams that hit the ceiling of the tool-writing guide usually hit it here.

Who controls access. Substrate-layer permissions are what make tools safe to expose; harness-layer permissions limit which agent uses which tool. The two are not interchangeable. Code execution with MCP only works if the substrate enforces access before data reaches the model.

Who attaches provenance. Identifiers, positions, timestamps, and typed edges have to land in tool responses somewhere. That somewhere is the substrate’s responsibility, not the tool’s, not the agent’s.

These are the questions Anthropic’s guides do not answer for you. They are also the questions that determine whether the guides, applied carefully, produce a production agent or a clever proof of concept.

Anthropic teaches the agent layer well; the substrate layer is yours

The three guides are the best published material on context engineering, tool design, and MCP-driven agent execution. Practitioners should read them, internalize them, and apply them. They should also understand which questions the guides do not answer.

The substrate is the place where context, schemas, provenance, and access live across agent sessions and harness changes. Without one, the guides describe a finished house with no foundation drawn. With one, the guides become a precise specification for what the agent does on top of it. The order of operations is: decide the substrate first, apply the agent-layer guidance second. Teams that do it the other way usually rebuild the agent layer once they realize that the substrate constraints they worked around were constraints they should have set themselves.

In Wire, the substrate is the context container: a per-organization remote MCP server with structured entries, provenance on every retrieval, and tenant isolation at the container boundary, which is what makes Anthropic’s just-in-time retrieval, tool consolidation, and code-execution recommendations land as they read on the page.


Sources: Effective context engineering for AI agents · Writing effective tools for AI agents · Code execution with MCP · Scaling Managed Agents · Linux Foundation: Formation of the Agentic AI Foundation

Frequently asked questions

Where does Anthropic's memory tool actually store data?
Anthropic's memory tool on the Claude Developer Platform is a file-based system that persists structured entries outside the model's context window. It runs inside Anthropic's managed environment, so storage, retention, and recovery are handled by the platform rather than the team using it. Teams running agents outside that environment have to provide the equivalent storage layer themselves.

Is code execution with MCP a replacement for direct tool calls?
No, it's a composition pattern on top of them. Direct tool calls still happen at the MCP server boundary, but the agent writes a code block that invokes multiple tools, processes results locally, and returns only summaries to the model. The benefit is fewer tokens spent on tool definitions and intermediate results; the underlying tools still need to exist.

Why do Anthropic's context engineering recommendations assume a retrieval system already exists?
The guides focus on how an agent should consume context, not on how that context is produced. Recommendations like 'use lightweight identifiers' or 'load data into context at runtime' presume that the identifiers resolve to a real storage layer with addressing, permissions, and durability already solved. The retrieval system is the prerequisite the guides build on.

How do I tell whether my context substrate is production-ready?
Look for three properties: durability across harness switches (context survives moving between agents or CLIs), provenance on every retrieval result (source identity, position, timestamps, typed edges), and tenant isolation at the storage boundary (not just at query time). A substrate missing any of these will leak under load.

What's the difference between an MCP server and a context substrate?
An MCP server is a runtime surface that exposes tools and returns results; a substrate is the underlying layer that holds the data those tools read from and write to. One MCP server can front many substrate shapes, and one substrate can be exposed through multiple MCP servers. The distinction matters because tool design assumes a substrate exists with stable schemas, ownership, and access controls.
