Structured Context vs Raw Text for AI

JP · 6 min read

A team at ETH Zurich recently tested whether giving AI coding agents a context file about the repository they’re working in improves their performance. The result was counterintuitive: LLM-generated context files reduced task success rate by an average of 3% compared to giving the agent no context file at all. They also increased inference costs by over 20%, because agents took more steps to complete tasks while processing the extra noise.

Human-written context files fared better, improving success rate by about 4%. But the gap between “AI-generated context that hurts” and “human-written context that helps” isn’t about who wrote it. It’s about structure. The AI-generated files were verbose, generic, and full of information the agent could infer on its own. The human-written files were concise, specific, and focused on non-obvious details like custom build commands and project conventions.

The lesson generalizes far beyond coding agents. How you structure context matters more than how much context you provide.

The raw text problem

The default approach to giving AI systems context is to dump in everything that seems relevant. Paste the full document. Concatenate the conversation history. Retrieve 20 chunks from a vector database and stuff them into the prompt. a16z recently described this as the central bottleneck in enterprise AI: the gap between an organization’s messy data and the actionable context that agents actually need.

This approach fails for well-documented reasons. As context length grows, model accuracy degrades. Research from Chroma shows that even trivially simple tasks drop from 95% accuracy to 60-70% as input length increases, a phenomenon called context rot. (For a deeper look at the mechanism, see Context Rot: Why AI Performance Degrades With More Information.)

Augment Code put it well: your agent’s context is a junk drawer. Every irrelevant paragraph competes for the model’s finite attention budget, diluting the signal from the information that actually matters. The result is more hallucinations, more missed facts, and more wasted tokens.

Why structure matters

Three mechanisms explain why structured context outperforms raw text.

It reduces token waste

Raw documents contain formatting artifacts, boilerplate, repetitive headers, and filler text that consume tokens without adding information. A 50-page PDF might contain 10 pages of actual content relevant to the query. Structured representations strip the noise. Returning typed records like {ticket_id, component, status, summary} instead of the full ticket thread gives the model exactly what it needs in a fraction of the tokens.
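As a minimal sketch of this idea, here is what projecting a full ticket down to the {ticket_id, component, status, summary} shape from the paragraph above might look like. The ticket data and field names are illustrative, not from any particular tracker's API:

```python
def to_record(ticket: dict) -> dict:
    """Keep only the fields an agent needs to reason about a ticket."""
    return {
        "ticket_id": ticket["ticket_id"],
        "component": ticket["component"],
        "status": ticket["status"],
        "summary": ticket["summary"],
    }

ticket = {
    "ticket_id": "T-1042",
    "component": "billing",
    "status": "open",
    "summary": "Invoice totals off by one cent after currency conversion",
    "thread": "... " * 2000,  # stand-in for the long raw discussion thread
}

record = to_record(ticket)
# The typed record is a tiny fraction of the raw payload.
print(len(str(record)), "vs", len(str(ticket)))
```

The point isn't the projection itself; it's that the noisy 90% of the payload never reaches the model in the first place.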

It exploits attention patterns

Research on prompt formatting found that format choice alone can swing LLM accuracy by up to 40% on code translation tasks. Models aren’t format-agnostic. Clear section headers, consistent delimiters, and hierarchical organization help the model allocate attention to the right places. Anthropic’s context engineering guide recommends organizing context into distinct sections with XML tags or Markdown headers specifically because it improves model behavior.
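A sketch of the sectioning idea, assuming nothing beyond string formatting. The tag names (`instructions`, `records`, `query`) are illustrative, not a required schema:

```python
def format_context(sections: dict[str, str]) -> str:
    """Wrap each context piece in a clearly delimited XML-style section."""
    parts = []
    for name, body in sections.items():
        parts.append(f"<{name}>\n{body.strip()}\n</{name}>")
    return "\n\n".join(parts)

prompt_context = format_context({
    "instructions": "Answer using only the provided records.",
    "records": "ticket T-1042: billing, open",
    "query": "Which billing tickets are still open?",
})
print(prompt_context)
```

Consistent delimiters like these give the model unambiguous boundaries between instructions, data, and the question, which is exactly where attention tends to go astray in undifferentiated prose.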

It enables selective retrieval

Structured context is queryable. When information is organized into typed fields with metadata, you can retrieve precisely the 3-5 records relevant to the current query rather than the 20 loosely related chunks that naive RAG returns. Less context, higher relevance, better output.
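Because the records are typed, retrieval can be a precise metadata filter plus a hard cap, rather than "top 20 nearest chunks." A minimal sketch with illustrative field names:

```python
def retrieve(records, *, component=None, status=None, k=5):
    """Return at most k records matching the metadata filters."""
    hits = [
        r for r in records
        if (component is None or r["component"] == component)
        and (status is None or r["status"] == status)
    ]
    return hits[:k]

tickets = [
    {"ticket_id": "T-1", "component": "billing", "status": "open"},
    {"ticket_id": "T-2", "component": "auth", "status": "open"},
    {"ticket_id": "T-3", "component": "billing", "status": "closed"},
]

print(retrieve(tickets, component="billing", status="open"))
```

In practice you'd combine this with semantic search, but the cap and the typed filters are what keep the payload small and relevant.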

What doesn’t work

AI-generated context files

The ETH Zurich study tested this directly. Across 138 real-world Python tasks and four different coding agents (Claude 3.5 Sonnet, GPT-5.2, GPT-5.1 mini, and Qwen Code), LLM-generated AGENTS.md files consistently hurt performance. The generated files restated information the agent could infer from the codebase, adding tokens without adding knowledge.

Naively converting everything to JSON

Wrapping raw text in JSON syntax doesn’t make it structured. If your JSON object contains a single "content" field with a 5,000-word document pasted inside, you’ve added token overhead without improving the model’s ability to find relevant information. Structure means organizing information into meaningful fields that the model can reason over, not adding brackets around prose.
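The contrast is easy to see side by side. Both payloads below are valid JSON; only one is structured in the sense that matters. The field names are illustrative:

```python
import json

# Anti-pattern: prose wrapped in brackets. Still a wall of text.
bad = {"content": "Our refund policy states that ... " + "filler " * 500}

# Structured: discrete fields the model can reason over directly.
good = {
    "policy": "refunds",
    "window_days": 30,
    "exclusions": ["gift cards", "final sale"],
}

# Same syntax, very different signal per token.
print(len(json.dumps(bad)), "vs", len(json.dumps(good)))
```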

Over-structuring

There’s a point of diminishing returns. The ACE framework (ICLR 2026) found that representing context as a collection of structured, itemized bullets with metadata (unique IDs, helpfulness counters) outperformed monolithic prompts and matched top-ranked production agents using smaller open-source models. But each bullet was a small, self-contained unit: one strategy, one concept, one failure mode. The structure served retrieval and relevance, not complexity for its own sake.
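A simplified sketch of the itemized-bullet shape the ACE paper describes: each unit carries an ID and a helpfulness counter, and ranking by that counter decides what makes it into context. The bullet contents and scores here are made up:

```python
bullets = [
    {"id": "b1", "text": "Run `make ci`, not `pytest` directly.", "helpful": 7},
    {"id": "b2", "text": "Migrations live in db/migrations.", "helpful": 3},
    {"id": "b3", "text": "Prefer dataclasses over raw dicts.", "helpful": 5},
]

def top_bullets(bullets, k=2):
    """Select the k bullets that have proven most useful so far."""
    return sorted(bullets, key=lambda b: b["helpful"], reverse=True)[:k]

for b in top_bullets(bullets):
    print(b["id"], b["text"])
```

Note how little structure is actually needed: one sentence of content, an ID, a counter. The structure exists to serve selection, nothing more.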

What works

The research converges on a few principles.

Typed records over prose. When you can represent information as small, structured records with named fields, do it. A customer record with {name, plan, status, last_contact} is more useful to an agent than a paragraph describing the same information. The model can reason over fields directly instead of parsing natural language.

Metadata for relevance. The ACE framework attaches helpfulness scores to each context item, so the system can prioritize what’s been useful before. Even simple metadata like source, recency, and category helps retrieval systems select the right context for each query.

Minimal context, maximum signal. Anthropic’s guide emphasizes striving for “the minimal set of information that fully outlines your expected behavior.” The ETH Zurich researchers reached the same conclusion: limit context files to non-inferable details. If the model can figure it out from the input, don’t tell it twice.

Process at upload time, not query time. Rather than structuring context on every query, do the transformation work once when documents enter the system. Extract entities, categorize content, and build structured representations upfront. At query time, return pre-processed records instead of raw text. Tools like Wire take this approach, transforming files into structured context at upload time so agents receive clean, typed data on every query.
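The principles above can be sketched as a two-phase pipeline: pay the extraction cost once at upload, serve cheap filtered lookups at query time. `extract_entities` here is a hypothetical stand-in for whatever extraction you actually use (an NER model, an LLM pass, rules):

```python
STORE: dict[str, list[dict]] = {}

def extract_entities(text: str) -> list[dict]:
    """Stand-in extractor: one record per line, crudely categorized."""
    records = []
    for i, line in enumerate(filter(None, map(str.strip, text.splitlines()))):
        records.append({
            "id": i,
            "category": "build" if "make" in line else "general",
            "text": line,
        })
    return records

def upload(doc_id: str, text: str) -> None:
    STORE[doc_id] = extract_entities(text)  # expensive work, done once

def query(doc_id: str, category: str) -> list[dict]:
    # Query time is a cheap filter over pre-built records, not raw text.
    return [r for r in STORE[doc_id] if r["category"] == category]

upload("agents-md", "run make ci before pushing\nmigrations live in db/migrations")
print(query("agents-md", "build"))
```

The asymmetry is the point: uploads are rare and can afford slow, careful structuring; queries are frequent and should touch only pre-processed records.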

Practical checklist

If you’re building systems that deliver context to AI agents:

  1. Audit your context payload. Pull the actual text being sent to the model for real queries. How much of it is signal vs. noise?
  2. Convert documents to typed records where possible. Named fields beat paragraphs for factual content.
  3. Cap your retrieval. Return 3-5 highly relevant items, not 20 loosely related ones. Less context with higher relevance beats more context with lower relevance.
  4. Strip what’s inferable. If the model can determine something from the primary input, don’t repeat it in the context. The ETH Zurich data shows this actively hurts.
  5. Measure the difference. Run the same queries with raw text and structured context. Track accuracy, hallucination rate, and token usage. The delta is usually significant.
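For step 5, even a crude comparison is informative. The sketch below uses a whitespace token proxy; swap in your model's real tokenizer for accurate numbers. The payloads are made-up stand-ins:

```python
def count_tokens(text: str) -> int:
    """Crude proxy: whitespace-split word count, not a real tokenizer."""
    return len(text.split())

raw = "Full ticket thread discussion reply quote " * 200
structured = "ticket T-1042 | billing | open | invoice off by one cent"

savings = 1 - count_tokens(structured) / count_tokens(raw)
print(f"token savings: {savings:.0%}")
```

Run the same comparison on accuracy and hallucination rate with your real eval set; token count alone understates the benefit, since the accuracy gains come from removing distractors, not just shrinking the bill.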

The trend in context engineering is clear: the teams getting the best results from AI agents aren’t the ones with the most context. They’re the ones with the most structured context.

Ready to give your AI agents better context?

Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.

Get Started