
Five criteria of good context for AI agents

Jitpal Kocher · June 9, 2026 · 8 min read

Key takeaway

Good context for AI agents meets five design criteria: relevance, sufficiency, isolation, economy, and provenance. The framework comes from a March 2026 paper that treats context as the agent's operating system and arranges context engineering as one rung in a four-discipline maturity model. The criteria are cumulative: economy and provenance only pay off once relevance, sufficiency, and isolation already hold. Most agent failures trace to a missing criterion rather than a weak model, which is why larger context windows rarely fix them.

Good context for AI agents meets five design criteria: relevance, sufficiency, isolation, economy, and provenance. The framework comes from Context Engineering: From Prompts to Corporate Multi-Agent Architecture, a March 2026 paper by Vera V. Vishnyakova that argues prompt wording stops being the bottleneck once an AI system becomes an autonomous agent rather than a stateless chatbot. What matters then is the entire informational environment the agent reasons inside, and these five criteria are how you judge whether that environment is well built.

The criteria are useful because they turn a vague goal (“give the agent good context”) into five separable questions you can design against and check one at a time. They also explain a pattern most teams hit in production: an agent fails, the model gets blamed, a bigger model or a longer window gets swapped in, and nothing improves, because the actual defect was a missing criterion the model can do nothing about.

Context engineering is now a named discipline with quality criteria

Context engineering has graduated from a loose practice into a discipline with its own quality bar. The paper places it as the second rung of a four-discipline maturity model: prompt engineering, then context engineering, then intent engineering, then specification engineering, with each rung building cumulatively on the one below. Its central claim is that “context as the agent’s operating system” is the right mental model: the context window is not a prompt but the runtime environment in which every decision is made, and whoever controls that environment controls the agent’s behavior.

That reframing matters because it changes what counts as a fix. If context is just a prompt, you improve it by writing better instructions. If context is an operating system, you improve it by engineering what gets loaded, when, in what structure, and with what provenance, which is exactly what moving beyond prompt engineering means in practice. The five criteria are the spec sheet for that environment.

The five criteria at a glance

Each criterion answers a distinct question, and a context window can pass one while failing another. The table below summarizes what each one governs and the failure mode you see when it is missing.

Criterion	The question it answers	Failure mode when missing
Relevance	Is this information appropriate to the decision at hand?	The model attends to off-topic content and drifts
Sufficiency	Is there enough information to complete the task?	The model guesses or hallucinates to fill gaps
Isolation	Is conflicting or extraneous data kept out?	Cross-talk and contradictions corrupt reasoning
Economy	Is the information structured efficiently?	Token bloat, slower inference, higher cost
Provenance	Can each fact be traced to a source?	The agent can’t weigh or verify what it was given

Read together, the criteria are an ordered set, not a flat checklist. Relevance, sufficiency, and isolation determine whether the context is correct at all. Economy and provenance determine whether a correct context is also efficient and trustworthy. Optimizing economy before relevance holds is how teams end up with a beautifully compressed window full of the wrong things.

Relevance: information appropriate to the decision

Relevance is the first criterion because irrelevant context is not neutral; it is harmful. Transformer attention is finite and gets diluted across whatever you load, so every off-topic token competes with the tokens that actually matter. This is the mechanism behind context rot: accuracy degrades not when the window fills but when the share of relevant content drops, often well before any size limit is reached.

Relevance is also the criterion most damaged by the instinct to “add more just in case.” A larger context window tempts teams to include marginally related documents, which lowers the relevance ratio and makes the agent worse. Designing for relevance means retrieving narrowly and deliberately for the current step, not preloading everything the agent might conceivably need.
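A minimal sketch of per-step retrieval and of the “relevance ratio” idea, assuming each candidate chunk already carries a relevance score for the current step (how that score is produced, whether by embeddings or a reranker, is out of scope). `Chunk`, `select_relevant`, and `relevance_ratio` are hypothetical names, and the `k` and `threshold` values are illustrative, not recommendations from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float  # hypothetical relevance score for the current step, 0.0 to 1.0
    tokens: int   # token count from whatever tokenizer your model uses


def select_relevant(chunks: list[Chunk], k: int = 3, threshold: float = 0.5) -> list[Chunk]:
    """Retrieve narrowly: at most k chunks that clear the threshold, best first."""
    eligible = [c for c in chunks if c.score >= threshold]
    return sorted(eligible, key=lambda c: c.score, reverse=True)[:k]


def relevance_ratio(window: list[Chunk], threshold: float = 0.5) -> float:
    """Share of the window's tokens that come from chunks above the threshold."""
    total = sum(c.tokens for c in window)
    if total == 0:
        return 0.0
    return sum(c.tokens for c in window if c.score >= threshold) / total


candidates = [
    Chunk("Refund policy: 30 days from delivery.", 0.92, 40),
    Chunk("Refunds on digital goods are final.", 0.81, 35),
    Chunk("Company holiday party schedule.", 0.12, 120),
    Chunk("Office wifi instructions.", 0.05, 60),
]

narrow = select_relevant(candidates)
print(relevance_ratio(candidates), relevance_ratio(narrow))
```

Loading all four candidates “just in case” gives a relevance ratio of about 0.29 (75 of 255 tokens); the narrow selection keeps only the two refund chunks and scores 1.0.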

Sufficiency: enough information to finish the task

Sufficiency is the mirror image of relevance: the context must contain everything the task genuinely requires, or the model fills the gap by guessing. Under-provisioned context is one of the most common roots of hallucination, because a model asked a question it lacks the grounding to answer will still answer. The fix is not a bigger model but the missing fact placed in front of it.

Sufficiency and relevance pull in opposite directions, and resolving that tension is the core of context design. Too little and the agent invents; too much and relevance collapses. The target is the smallest set of context that is still complete for the task, which is why sufficiency can only be judged per task, not as a global setting.
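One way to make “smallest set that is still complete” concrete is a set-cover view: the task declares the facts it needs, each source declares the facts it can supply, and you pick sources until nothing required is missing. The sketch below uses a greedy cover, which keeps the set small but is not guaranteed to be minimal; the fact keys, `Source`, and `minimal_sufficient` are hypothetical and assume you can label sources with the facts they contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    covers: frozenset[str]  # hypothetical: fact keys this source can supply


def minimal_sufficient(required: set[str], sources: list[Source]) -> tuple[list[Source], set[str]]:
    """Greedy set cover: repeatedly take the source that closes the most remaining gaps.

    Returns the chosen sources and any required facts that no source can supply.
    """
    missing = set(required)
    chosen: list[Source] = []
    remaining = list(sources)
    while missing:
        best = max(remaining, key=lambda s: len(s.covers & missing), default=None)
        if best is None or not (best.covers & missing):
            break  # nothing left can close the gap
        chosen.append(best)
        missing -= best.covers
        remaining.remove(best)
    return chosen, missing


sources = [
    Source("order_record", frozenset({"order_date", "sku", "price", "category"})),
    Source("refund_policy", frozenset({"refund_window"})),
    Source("product_catalog", frozenset({"sku", "price", "category"})),
    Source("shipping_faq", frozenset({"carrier"})),
]

chosen, gaps = minimal_sufficient({"order_date", "refund_window", "category", "customer_tier"}, sources)
```

Here `chosen` is `order_record` and `refund_policy` (the catalog is redundant once the order record is loaded), and `gaps` is `{"customer_tier"}`. A non-empty `gaps` is the signal to fetch more context or escalate, rather than letting the model guess.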

Isolation: keeping conflicting data out

Isolation is the criterion that separates a clean working context from a polluted one, and it is the criterion that breaks first in multi-agent systems. When agents share a context channel, one agent’s intermediate output becomes another’s input, contradictions accumulate, and no single agent has the authority to reconcile them. Sub-agent context isolation, where each agent operates on a scoped slice rather than a shared pool, is the structural answer.
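A minimal sketch of sub-agent context isolation, assuming every stored entry carries a scope label and every sub-agent keeps its intermediate output in a private scratchpad instead of writing it back to a shared pool. `Entry`, `SubAgent`, and the scope names are hypothetical; this illustrates scoping in general, not how any particular product implements it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    scope: str  # hypothetical label, e.g. "billing" or "legal"
    text: str


@dataclass
class SubAgent:
    name: str
    scopes: frozenset[str]
    scratch: list[str] = field(default_factory=list)  # private intermediate output

    def context(self, store: list[Entry]) -> list[str]:
        """Build this agent's window from its scoped slice plus its own notes only."""
        return [e.text for e in store if e.scope in self.scopes] + self.scratch


store = [
    Entry("billing", "Invoice 1042 is overdue."),
    Entry("legal", "Contract renews on 2026-07-01."),
    Entry("billing", "Customer prefers monthly billing."),
]

billing = SubAgent("billing", frozenset({"billing"}))
legal = SubAgent("legal", frozenset({"legal"}))
billing.scratch.append("Draft: send reminder for invoice 1042.")
```

The billing agent's draft never reaches the legal agent's window, because the legal agent's context is built only from `legal` entries and its own scratchpad.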

An agent connected to a Wire container sees one container’s structured entries rather than the union of every source it has ever touched, so the isolation criterion holds by construction rather than by prompt discipline. That is the difference between hoping an agent ignores irrelevant state and guaranteeing the state was never in its window to begin with.

Isolation also covers temporal conflicts: a stale fact and its updated replacement sitting in the same window is an isolation failure, and it produces the kind of contradiction that looks like a reasoning bug but is really a context bug.
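A minimal sketch of resolving temporal conflicts before they reach the window, assuming each fact has a stable key that identifies “the same fact” and a date for when its source asserted it. In practice deciding that two statements are the same fact may need entity resolution; `Fact` and `latest_only` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    key: str     # hypothetical identity for "the same fact", e.g. "refund_window"
    value: str
    as_of: date  # when the source asserted this value


def latest_only(facts: list[Fact]) -> list[Fact]:
    """Keep only the most recent fact per key so stale and fresh versions never share a window.

    On equal dates the first fact seen is kept.
    """
    newest: dict[str, Fact] = {}
    for f in facts:
        current = newest.get(f.key)
        if current is None or f.as_of > current.as_of:
            newest[f.key] = f
    return list(newest.values())


facts = [
    Fact("refund_window", "14 days", date(2025, 3, 1)),
    Fact("support_hours", "9-17 CET", date(2026, 1, 10)),
    Fact("refund_window", "30 days", date(2026, 5, 20)),
]
window = latest_only(facts)
```

After filtering, `window` holds one refund window (30 days) and the support hours, so the model never sees both refund values side by side.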

Economy: structuring information efficiently

Economy is about information density, not deletion: the same facts expressed in fewer tokens. It earns its place as a criterion because token bloat has compounding costs (slower inference, higher bills, and a lower relevance ratio) that all arrive at once. The lever is structure, which is why structured context consistently outperforms raw text dumps for the same underlying information.

Economy is best enforced at ingestion rather than query time. Processing documents into compact, pre-structured entries once, up front, means the agent reads dense context on every later query instead of re-parsing raw files each time. This is the same logic behind allocating a token budget per context source and behind context compression: spend tokens where they buy the most decision value, and stop spending them on format and redundancy.
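A minimal sketch of a per-source token budget, assuming token counts were computed once at ingestion and items arrive already sorted by priority. `Item`, `pack_with_budgets`, the source names, and the budget numbers are all hypothetical and only illustrate the allocation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    source: str
    text: str
    tokens: int  # assumed precomputed at ingestion with your tokenizer


def pack_with_budgets(items: list[Item], budgets: dict[str, int]) -> list[Item]:
    """Admit items in priority order while each source stays within its own budget.

    A source with no budget entry gets 0 tokens. An item that does not fit is
    skipped, so a smaller, lower-priority item from the same source may still fit.
    """
    spent: dict[str, int] = {}
    packed: list[Item] = []
    for item in items:
        used = spent.get(item.source, 0)
        if used + item.tokens <= budgets.get(item.source, 0):
            packed.append(item)
            spent[item.source] = used + item.tokens
    return packed


items = [
    Item("crm", "Account tier: enterprise", 6),
    Item("tickets", "Ticket 88: login loop after SSO change", 12),
    Item("tickets", "Ticket 71: password reset email delayed", 11),
    Item("crm", "Account notes (full history export)", 900),
    Item("docs", "SSO setup guide, section 3", 40),
]

packed = pack_with_budgets(items, {"crm": 50, "tickets": 20, "docs": 60})
```

The 900-token CRM export and the second ticket are left out, so no single source can crowd the others out of the window, which is the point of budgeting per source rather than for the window as a whole.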

Provenance: tracing every fact to a source

Provenance is the criterion that makes the other four auditable, because an agent that cannot trace where a fact came from cannot weigh it, verify it, or decide whether it is still true. The paper lists provenance as a first-class quality criterion, not an afterthought, and that placement matters: provenance is structural metadata the agent uses at decision time, not a compliance log bolted on afterward. We have argued the same point at length in “Provenance is a context engineering primitive.”

In practice, epistemic provenance means tagging each piece of context with its source, position, recency, and relationships so the agent can reason about reliability the way a careful human would. Without it, every fact in the window has equal apparent authority, which is exactly how a single bad input becomes treated as ground truth.
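A minimal sketch of an entry that carries the four provenance fields named above (source, position, recency, relationships) and renders them inline next to the fact. Modeling relationships as a tuple of related fact ids, and the bracketed inline format, are simplifying assumptions; `ProvenancedFact` and `render` are hypothetical names, and some systems expose provenance as structured fields the agent queries instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvenancedFact:
    text: str
    source: str                 # document or system the fact came from
    position: str               # where in that source, e.g. "p. 2" or "row 112"
    as_of: date                 # recency: when the source asserted it
    related: tuple[str, ...] = ()  # hypothetical ids of facts it supports or contradicts

    def render(self) -> str:
        """Serialize the fact with its provenance inline so the model sees both together."""
        rel = f"; related: {', '.join(self.related)}" if self.related else ""
        return (
            f"{self.text} [source: {self.source}, {self.position}; "
            f"as of {self.as_of.isoformat()}{rel}]"
        )


fact = ProvenancedFact(
    "Refund window is 30 days.", "refund-policy-v3.pdf", "p. 2", date(2026, 5, 20), ("fact-17",)
)
print(fact.render())
```

Because every rendered fact names its origin and date, the model has something to weigh when two facts disagree, instead of treating both with equal authority.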

The criteria are cumulative, not a checklist

The five criteria gain their power from being applied in order, the same way the paper’s maturity model stacks disciplines rather than listing them. Relevance and sufficiency define whether the context is correct, isolation protects that correctness from contamination, and only then do economy and provenance make a correct context efficient and trustworthy. Skip ahead and you optimize the wrong layer: a perfectly economical context window full of irrelevant, unsourced data is worse than a verbose one that gets the basics right.

This ordering is also a debugging tool. When an agent misbehaves, walk the criteria from the top: was the right information present, was enough of it present, was conflicting data kept out, was it dense, was it sourced. The first criterion that fails is usually the real defect, and it is almost never the one teams reach for first, which is model size.
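The top-down walk can be sketched as an ordered list of checks that stops at the first failure. The snapshot fields and thresholds below are hypothetical stand-ins for whatever your retrieval logs actually record; the sketch only illustrates the ordering, not a validated diagnostic.

```python
from typing import Callable, Optional

# Hypothetical checks: each inspects a snapshot of what the agent was given
# and returns True if that criterion held. Thresholds are illustrative.
Check = Callable[[dict], bool]

CRITERIA_IN_ORDER: list[tuple[str, Check]] = [
    ("relevance", lambda s: s["relevance_ratio"] >= 0.5),
    ("sufficiency", lambda s: not s["missing_facts"]),
    ("isolation", lambda s: not s["conflicting_keys"]),
    ("economy", lambda s: s["tokens"] <= s["token_budget"]),
    ("provenance", lambda s: s["unsourced_facts"] == 0),
]


def first_failing_criterion(snapshot: dict) -> Optional[str]:
    """Walk the criteria top-down and return the first that does not hold, or None."""
    for name, check in CRITERIA_IN_ORDER:
        if not check(snapshot):
            return name
    return None


snapshot = {
    "relevance_ratio": 0.8,
    "missing_facts": {"customer_tier"},
    "conflicting_keys": set(),
    "tokens": 12_000,
    "token_budget": 8_000,
    "unsourced_facts": 3,
}
print(first_failing_criterion(snapshot))
```

This snapshot also fails economy and provenance, but the walk reports `sufficiency`: trimming tokens or adding citations would not help while a required fact is still missing.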

How to apply the five criteria

Treat the criteria as design constraints at ingestion, not as runtime patches. Most of the work happens before the agent ever runs a query: deciding what to retrieve for relevance, ensuring task-complete coverage for sufficiency, scoping per-agent context for isolation, structuring entries for economy, and tagging sources for provenance. Doing this work at query time, document by document on every call, is what makes agents slow, expensive, and brittle.

The design criteria also pair directly with measurement. Once you have built context to satisfy the five, you can score it against operational metrics, and the companion practice of measuring context quality covers correctness, completeness, faithfulness, relevance, and freshness on the context you ship. Design for the five criteria, measure against the metrics, and you close the loop that most teams leave open: they observe their agents without ever evaluating what those agents were given to work with.

Sources: Context Engineering: From Prompts to Corporate Multi-Agent Architecture (arXiv 2603.09619) · Deloitte, State of Generative AI in the Enterprise 2026

Frequently asked questions

How are context design criteria different from context quality metrics?

Design criteria describe what to build into context before it reaches the model: relevance, sufficiency, isolation, economy, and provenance. Quality metrics like correctness, completeness, and faithfulness describe how to score context after the fact. You design for the criteria, then measure against the metrics, and the two are most useful together.

Which of the five context criteria matters most for multi-agent systems?

Isolation. When several agents share a context channel, one agent's intermediate output becomes another's polluting input, and conflicting data accumulates faster than any single agent can reconcile. Sub-agent context isolation, where each agent gets a scoped slice rather than the shared pool, is the structural fix.

Does a larger context window improve any of the five criteria?

Not directly. A bigger window raises the ceiling on sufficiency, but it does nothing for relevance, isolation, economy, or provenance, and it often makes relevance worse by inviting teams to dump more marginally related data. The criteria are about how context is structured, not how much fits.

How do you enforce the economy criterion without dropping important context?

Structure context at ingestion so the agent retrieves compact, pre-processed entries instead of raw documents at query time. Economy is about information density, not deletion: the same facts in fewer tokens. Compression and progressive disclosure preserve the underlying data while shrinking the working set the model actually reads.


Ready to give your AI agents better context?

Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.
