Key takeaway
Epistemic provenance for AI agents is the typed metadata attached to each retrieved piece of context: where it came from, where it sits inside that source, when it was ingested, and how it connects to other context. It is not a trust score or a compliance audit log. It is structural grounding consumed at inference time, and it is what lets an agent plan the next retrieval, cite without fabricating, and stitch answers across sources. Most retrieval APIs return an id, a score, and a content blob, and leave the agent to reconstruct the rest, which is why multi-step reasoning breaks down.
A retrieval result that is just an id, a score, and a content blob is a floating string. The agent calling the API has to reconstruct, from the text alone, where the chunk came from, how long its source is, whether it is the first or the last paragraph, whether anything next to it would help, and how to cite it without fabricating. Most retrieval systems ship exactly this shape and wonder why their agents struggle with multi-step reasoning.
Provenance is the fix. Not provenance as a compliance feature, and not provenance as a trust verdict. Provenance as structured metadata, typed and consumed by the agent at the moment it is deciding what to do next. Context engineering treats this as part of the tool contract, not an operational afterthought. This post argues that epistemic provenance belongs inside every retrieval result the way a type signature belongs on every function, and walks through what it looks like, what jobs it does for the agent, and why trust scoring is the wrong abstraction to reach for.
Epistemic provenance is structural metadata attached to each piece of retrieved context that tells an agent where the content came from and how it connects to the rest of the corpus. The four load-bearing fields are source identity, position inside the source, ingestion time, and typed relationship edges to other entries. Everything else, including section headers, summaries, tags, and navigable counts, builds on those four. The word “epistemic” is doing real work here: this is metadata about how the agent knows what it knows, not metadata about how the data was processed internally.
Three things epistemic provenance is not. It is not a trust score. It is not an audit log. And it is not an editorial judgment about content quality. Each of those is a different artifact with a different consumer, and collapsing them into one field is where most early provenance attempts go wrong. The agent wants structure. Humans reviewing the system after the fact want audit. Compliance wants lineage. These can share a storage layer without sharing a contract.
The practical test is whether the field changes the agent’s next action. Source id and position change it: the agent cites differently. Navigable counts change it: the agent chooses traversal over re-search. Typed edges change it: the agent weighs contradictory evidence differently. A free-text “quality note” does not change it, because the agent has to re-derive its own judgment every time just to decide whether to trust the note.
Epistemic provenance does three distinct jobs for a retrieval-calling agent, and they compound. Miss one and the others still work; miss all three and the agent is flying blind between calls.
The first job is planning. When a match comes back with hasSiblings: 32 and a relationshipTypes map showing four elaborates edges and one corroborates edge, the agent can decide, without another round-trip, whether to traverse adjacent chunks, follow the relationship graph, or re-issue a search. Without that signal, it is guessing. Anthropic’s engineering team made a version of this point in their 2025 post on writing tools for AI agents: tool output shape influences agent trajectory at least as much as tool descriptions. Navigable metadata is exactly the kind of shape that changes trajectory.
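To make the planning job concrete, here is a minimal sketch of the decision an agent can make from the navigable metadata alone. The function and return labels are hypothetical illustrations, not part of any shipped contract; the field names mirror the _meta block shown later in this post.

```python
def plan_next_action(meta: dict, needs_more_context: bool) -> str:
    """Pick the next retrieval action from a result's navigable metadata."""
    nav = meta.get("wire", {}).get("navigate", {})
    siblings = nav.get("hasSiblings", 0)      # a count, not a boolean
    edges = nav.get("relationshipTypes", {})  # e.g. {"elaborates": 4}

    if edges.get("contradicts", 0):
        return "traverse_contradicts"  # reconcile disagreement before answering
    if needs_more_context and siblings > 0:
        return "traverse_siblings"     # a neighborhood exists, walk it
    if edges:
        return "traverse_edges"        # follow the typed relationship graph
    return "search_again"              # nothing navigable, re-issue a query

meta = {"wire": {"navigate": {"hasSiblings": 32,
                              "relationshipTypes": {"elaborates": 4,
                                                    "corroborates": 1}}}}
print(plan_next_action(meta, needs_more_context=True))  # traverse_siblings
```

The point is not the specific priority order, which is task-dependent, but that every branch is driven by a field the server already knows and can emit for free.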
The second job is grounding. An agent that knows “this chunk is index 15 of 33 in file example.txt, ingested 2026-04-15” can cite at the right granularity. No fabricated page numbers, no invented section titles, no confident claims about what the “rest of the document” says. In a retrieval setting, a large share of what gets labeled as hallucination is actually attribution failure, where the model states something plausible that the retrieved evidence does not actually support. Position-level provenance closes the specific loop between claim and citation.
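Grounding-level citation is mechanical once the fields exist. A sketch, using the field names from the provenance object in this post (the helper itself is hypothetical):

```python
def cite(provenance: dict) -> str:
    """Build a citation at exactly the granularity the provenance supports,
    so the model never has to invent page numbers or section titles."""
    date = provenance["ingestedAt"][:10]  # keep the date, drop the time
    return (f"{provenance['fileName']}, "
            f"chunk {provenance['chunkIndex']} of {provenance['totalChunks']}, "
            f"ingested {date}")

p = {"fileName": "example.txt", "chunkIndex": 15,
     "totalChunks": 33, "ingestedAt": "2026-04-15T17:42:37Z"}
print(cite(p))  # example.txt, chunk 15 of 33, ingested 2026-04-15
```

Every token in that citation is copied from server-emitted structure, which is exactly what makes it impossible to fabricate.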
The third job is stitching. Cross-document questions are where single-shot retrieval collapses, because the answer is distributed and the agent has to reconcile chunks that point in different directions. Typed relationship edges, with direction, let the agent distinguish “chunk A elaborates chunk B” from “chunk A contradicts chunk B” from “chunk A supersedes chunk B.” That distinction is the difference between a synthesis that acknowledges disagreement and a synthesis that flattens it into a confident wrong answer. RAG pipelines that dump related content without type force the agent into the flattening path by default.
Here is the provenance object Wire attaches to every result from wire_search and wire_navigate, copied from the live contract:
{
  "id": "4d8a4ad4-66da-4c20-9366-21a378357582",
  "score": 0.046,
  "content": "...",
  "provenance": {
    "source": "file:example.txt",
    "sourceFileId": "BLjdInPD6UvbhcFZ",
    "chunkIndex": 15,
    "totalChunks": 33,
    "ingestedAt": "2026-04-15T17:42:37Z",
    "tags": ["chunk"],
    "fileName": "example.txt",
    "sectionHeader": "...",
    "chunkSummary": "..."
  },
  "_meta": {
    "wire": {
      "navigate": {
        "hasSiblings": 32,
        "relationshipTypes": {
          "elaborates": 4,
          "corroborates": 1
        }
      }
    }
  }
}
A few design choices deserve flagging, because they are the part most retrieval systems get wrong. hasSiblings is a count, not a boolean. The difference matters: a boolean says traversal is possible, a count tells the agent how big the neighborhood is, which it needs to calibrate how far to walk. Relationship types are counts-per-type rather than a flat list of ids, so the agent can pick the edge type that fits the question before paying to traverse. Every edge the agent does walk carries an edgeType and an edgeDirection, so it can tell “X contradicts my chunk” from “my chunk contradicts X.” That last distinction is small in prose and load-bearing in reasoning.
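The direction distinction can be sketched in a few lines. The edge shape below is an assumption modeled on the edgeType and edgeDirection fields named above, with "inbound"/"outbound" as illustrative direction values; the actual wire format may differ.

```python
def partition_contradictions(edges: list) -> tuple:
    """Separate 'X contradicts my chunk' from 'my chunk contradicts X',
    so synthesis can weigh the two differently instead of flattening them."""
    against_me, from_me = [], []
    for e in edges:
        if e["edgeType"] != "contradicts":
            continue
        if e["edgeDirection"] == "inbound":   # the other chunk contradicts ours
            against_me.append(e["id"])
        else:                                 # ours contradicts the other chunk
            from_me.append(e["id"])
    return against_me, from_me

edges = [
    {"id": "a", "edgeType": "contradicts", "edgeDirection": "inbound"},
    {"id": "b", "edgeType": "elaborates",  "edgeDirection": "outbound"},
    {"id": "c", "edgeType": "contradicts", "edgeDirection": "outbound"},
]
print(partition_contradictions(edges))  # (['a'], ['c'])
```

Without the direction field, both branches collapse into one bucket, and the agent loses the asymmetry that matters when one chunk supersedes or rebuts another.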
The common first instinct when adding provenance is to attach a trust score. Resist it. A trust score presumes the server knows whether this content is relevant and reliable for the agent’s current task, and it does not. Trust is contextual: the same chunk is authoritative for one question and contradicted by newer evidence for another. Collapsing that to a single number at index time either hardcodes the wrong priors or forces you to re-score on every query, at which point the score is no more than a cached recomputation of things the agent could have decided with typed metadata.
There is also a security argument. A trust score is itself an attack surface. If an attacker can influence how the score is computed, or can seed the index with content that scores high, they now control what the agent treats as authoritative. Context poisoning works this way: the attack does not need to rewrite the model, it just needs to push a crafted chunk past whatever score the retrieval layer is filtering on. Neutral structural metadata gives the agent the raw signals (source, recency, relationship type) and lets it make trust judgments informed by the task, which is a harder surface to poison because it is not a single number to game.
Wire’s framing on the shipped product page is explicit: the container tells the agent where things came from, not what to believe. That separation keeps the server out of the editorial business and keeps the agent in charge of trust, which is where task context lives.
We measured what this shape costs and what it earns on the same 64-question fixture we use for retrieval work. The full numbers are on the agent efficiency page. The short version: on a Gemini 3 Flash agent with up to seven retrieval turns per question, moving from a retrieval surface that returned ids, scores, and content to one that added typed provenance and a navigable neighborhood raised correctness from 4.47 to 4.78 on a 1 to 5 scale, dropped couldn’t-answer from 5 of 64 to 2 of 64, and cut average token spend 20 percent. The mechanism write-up breaks apart what came from splitting tools versus what came from provenance; the provenance contribution shows up most clearly in cross-document questions, where typed edges gave the agent a way to traverse rather than re-search.
Faithfulness sat at 5.00 in both runs, which we read as a property of the harness (the prompt forced the agent to answer only from retrieved content) rather than an absence of signal. Where provenance likely earns faithfulness wins is on fixtures that stress contradiction, recency, and attribution, which we are still building.
If you own a retrieval surface that an agent calls, four steps cover most of the value.
First, emit source and position on every result. A chunk without a position tells the agent “you have some text from somewhere.” That is enough to answer a single question badly and not enough to plan another call. Index position (integer chunk index plus total) is cheap to compute, stable across reindexing, and sufficient for most traversal patterns.
Second, emit ingestion time. Agents handle stale content much better when they can see it is stale. Without ingestion time, recency becomes an implicit assumption buried in the content, which the model will quietly ignore when it gets inconvenient.
Third, emit counts where you would be tempted to emit booleans. hasSiblings: 32 lets the agent decide whether to pull three siblings or all of them. hasSiblings: true forces a guess.
Fourth, type your edges. If you surface related entries at all, tell the agent the relationship type and direction. elaborates, corroborates, contradicts, and supersedes cover most of what a reasoning agent needs to distinguish. Flat related: [ids] is worse than no related list at all, because it invites the agent to treat every pointer the same way.
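The four steps fit in one small server-side function. This is an illustrative sketch, not Wire's implementation; the function name and parameter names are invented, and the field names follow the provenance object shown earlier.

```python
from datetime import datetime, timezone

def build_result(chunk_id, content, score, file_name, index, total,
                 sibling_count, edge_type_counts):
    """Assemble a retrieval result carrying all four provenance steps."""
    return {
        "id": chunk_id,
        "score": score,
        "content": content,
        "provenance": {
            "source": f"file:{file_name}",           # step 1: source identity
            "chunkIndex": index,                     # step 1: position
            "totalChunks": total,
            "ingestedAt": datetime.now(timezone.utc) # step 2: ingestion time
                          .strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "_meta": {
            "navigate": {
                "hasSiblings": sibling_count,           # step 3: a count
                "relationshipTypes": edge_type_counts,  # step 4: typed edges
            }
        },
    }

r = build_result("c1", "...", 0.05, "example.txt", 15, 33, 32,
                 {"elaborates": 4, "corroborates": 1})
```

Everything in this payload is known at index time or traversal time; none of it requires the server to form an opinion about the content.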
These four together change the tool contract from “retrieve text” to “retrieve text plus everything the agent needs to decide what to do with it.” That shift is what makes retrieval feel like a context pipeline instead of a keyword lookup.
Epistemic provenance is not a nice-to-have layered on top of a retrieval system; it is part of the contract. The retrieval layer that returns only ids, scores, and content is asking the agent to do work the server is in the best position to do cheaply and correctly. Treat provenance as first-class structure: typed, navigable, consumed at inference time. Leave trust to the agent. Ship the shape, not the verdict.
Sources: Wire: More tools, fewer calls, restructuring agentic retrieval · Anthropic: Writing tools for AI agents · Wire: One job per tool