7 context engineering techniques for production

JP · 7 min read

Most AI systems fail in production for the same reason: the model had the wrong information at the wrong time. Not a bad prompt, not a weak model. Bad context.

Context engineering is the discipline of fixing that. But the term is broad enough to be unhelpful on its own. What does it actually look like in a production system? Below are seven techniques that teams are using today, each addressing a specific failure mode.

1. What is selective retrieval?

Selective retrieval is the practice of filtering and routing which information reaches the model, rather than retrieving everything that’s semantically similar.

Basic RAG retrieves the top-k chunks closest to a query embedding and stuffs them into the prompt. This works for simple question-answer tasks. It breaks down when queries require different types of context depending on intent, when the top-k results include near-duplicates, or when irrelevant but semantically similar content dilutes the signal.

Selective retrieval adds a routing layer. Before retrieval, the system classifies the query and determines what category of information is needed. A customer support agent receiving “how do I cancel?” needs billing docs, not product release notes, even if both mention “account.” Teams implement this with metadata filters, query classifiers, or dedicated routing agents that decide what to retrieve before retrieval happens.

Anthropic’s context engineering guide recommends this pattern explicitly: retrieve based on task type, not just semantic similarity.
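A minimal sketch of the routing layer, assuming a keyword-based classifier and a keyword-overlap stand-in for vector similarity (production systems would use a trained classifier or an LLM router, plus real embeddings):

```python
# Selective retrieval sketch: classify the query first, then restrict the
# candidate pool to the matching category before ranking.
# Categories, keywords, and the scoring function are illustrative assumptions.

CATEGORY_KEYWORDS = {
    "billing": {"cancel", "refund", "invoice", "charge", "subscription"},
    "product": {"feature", "release", "integration", "api"},
}

def classify_query(query: str) -> str:
    words = set(query.lower().replace("?", "").split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def retrieve(query: str, docs: list[dict], k: int = 2) -> list[dict]:
    category = classify_query(query)
    pool = [d for d in docs if category == "general" or d["category"] == category]
    # Stand-in for vector similarity: rank by keyword overlap with the query.
    words = set(query.lower().split())
    pool.sort(key=lambda d: len(words & set(d["text"].lower().split())), reverse=True)
    return pool[:k]

docs = [
    {"category": "billing", "text": "How to cancel your account and request a refund"},
    {"category": "product", "text": "Release notes: account dashboard redesign"},
]
results = retrieve("how do I cancel?", docs)
```

The important part is the ordering: the category decision happens before any similarity ranking, so the release notes never enter the candidate pool even though they also mention "account".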

2. What is context compression?

Context compression reduces the token count of context while preserving the information the model needs.

Larger context windows haven’t solved the attention problem. Chroma’s research on context rot showed that model accuracy drops from 95% to 60-70% as input length grows, even on simple tasks. Compression addresses this directly.

Three approaches work in practice. Extractive compression pulls out key sentences or entities from longer documents. Abstractive compression uses a smaller model to summarize passages before they enter the main model’s context. Format compression converts prose into structured representations: typed records, tables, or key-value pairs that convey the same information in fewer tokens. ETH Zurich found that concise, structured context files improved agent success rates by 4%, while verbose ones hurt by 3%.

The key tradeoff: compression is lossy. Validate that compressed context still contains what the model needs for your specific use case.
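A sketch of format compression, the third approach, assuming a hand-written field extraction (not a general parser) and a crude whitespace token count in place of a real tokenizer:

```python
# Format compression sketch: render the same facts as a compact key-value
# record instead of prose, then compare rough token counts.

def rough_token_count(text: str) -> int:
    # Crude proxy: whitespace-split word count. Real systems use a tokenizer.
    return len(text.split())

prose = (
    "The customer, Dana Smith, is on the Pro plan. Her account is currently "
    "active, and she opened a ticket yesterday about a billing discrepancy "
    "on her most recent invoice."
)

# The same information as a typed record.
record = {"name": "Dana Smith", "plan": "Pro", "status": "active",
          "issue": "billing discrepancy on latest invoice"}
compressed = "; ".join(f"{k}={v}" for k, v in record.items())

savings = 1 - rough_token_count(compressed) / rough_token_count(prose)
```

Because the compression is lossy by design (the "yesterday" detail is dropped here), the validation step matters: confirm the fields you keep are the ones the model actually uses downstream.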

3. What is layered context architecture?

Layered context architecture organizes information into tiers that change at different speeds, so each layer can be managed independently.

Production systems typically need three layers. The persistent layer holds information that rarely changes: system instructions, domain knowledge, role definitions, and behavioral constraints. The session layer holds information specific to the current conversation or workflow: recent turns, user preferences, accumulated state. The task layer holds information needed for the immediate inference step: retrieved documents, tool outputs, and the current query.

LlamaIndex describes this as writing context (saving outside the window), selecting context (pulling in), compressing context (reducing tokens), and isolating context (splitting up). The layered pattern makes each of those operations cleaner because you know which tier you’re operating on.

Without layers, every change to the context risks disrupting something else. With layers, you can refresh retrieved knowledge without losing conversation history, or reset a task context without wiping session state.
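The three tiers can be sketched as a small container where each layer is managed independently; the class shape and field names here are assumptions, not a standard API:

```python
# Layered context sketch: persistent, session, and task tiers that change
# at different speeds and can be refreshed or reset independently.
from dataclasses import dataclass, field

@dataclass
class LayeredContext:
    persistent: str = ""                              # system instructions, domain knowledge
    session: list[str] = field(default_factory=list)  # conversation history, preferences
    task: list[str] = field(default_factory=list)     # retrieved docs, tool outputs

    def reset_task(self) -> None:
        # Clear per-step context without touching the session or persistent tiers.
        self.task = []

    def assemble(self, query: str) -> str:
        parts = [self.persistent, *self.session, *self.task, f"Query: {query}"]
        return "\n\n".join(p for p in parts if p)

ctx = LayeredContext(persistent="You are a support agent for Acme.")
ctx.session.append("User prefers concise answers.")
ctx.task.append("Doc: Cancellations take effect at the end of the billing cycle.")
prompt = ctx.assemble("How do I cancel?")
ctx.reset_task()  # next task starts clean; session state survives
```

The payoff is exactly the property described above: resetting the task tier cannot wipe session state, and swapping retrieved documents cannot disturb the system instructions.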

4. What is context isolation?

Context isolation restricts which information is visible to each agent or task, preventing leakage across boundaries.

In multi-agent systems, a common failure mode is context bleed: one agent’s context contaminates another’s reasoning. A planning agent that sees raw tool outputs meant for an execution agent may get confused. A code review agent with access to the full codebase context may lose focus on the specific diff it should evaluate.

Isolation means each agent or task gets a scoped context window containing only what it needs. This is both a reliability technique (less noise means better reasoning) and a security practice. As we covered in AI Agents Have Too Much Access, over-permissioned agents with access to everything are a growing risk. Context isolation is the enforcement mechanism.

In practice, teams implement this with separate system prompts per agent, scoped retrieval (each agent queries only its designated data sources), and explicit context boundaries in orchestration frameworks.
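A minimal sketch of scoped retrieval, assuming an in-memory source map and a per-agent allowlist (agent names and sources are illustrative):

```python
# Context isolation sketch: each agent carries its own system prompt and an
# allowlist of data sources. Out-of-scope reads fail loudly.

SOURCES = {
    "billing_docs": ["Refunds are issued within 5 business days."],
    "codebase": ["def charge(user): ..."],
}

class ScopedAgent:
    def __init__(self, name: str, system_prompt: str, allowed_sources: set[str]):
        self.name = name
        self.system_prompt = system_prompt
        self.allowed_sources = allowed_sources

    def fetch(self, source: str) -> list[str]:
        if source not in self.allowed_sources:
            raise PermissionError(f"{self.name} may not read {source}")
        return SOURCES[source]

support = ScopedAgent("support", "Answer billing questions.", {"billing_docs"})
docs = support.fetch("billing_docs")   # allowed
# support.fetch("codebase") raises PermissionError: out of scope
```

Raising rather than silently returning nothing is a deliberate choice: a scope violation usually indicates an orchestration bug you want surfaced, not masked.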

5. What is AI memory management?

Memory management is the practice of persisting relevant information across sessions without bloating the context window over time.

LLMs have no native memory. Every conversation starts from zero unless the system explicitly carries information forward. But naive approaches, like appending every previous interaction to the context, scale poorly. After a few sessions, the accumulated history crowds out the information that matters for the current task.

Effective memory systems are selective. They extract and store key facts, decisions, and preferences rather than raw conversation logs. At the start of each session, they load only the memory items relevant to the current query. LangChain’s research identifies memory as one of the core context engineering primitives, alongside retrieval and tool use.

The implementation pattern that works: write a summary of each session’s key outcomes to a persistent store, tag entries with metadata (topic, recency, importance), and retrieve selectively at session start using the same routing principles from technique 1.
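That pattern can be sketched with an in-memory store standing in for a database; the metadata fields (topic, importance, recency) follow the tagging scheme above, but the shapes are assumptions:

```python
# Memory management sketch: persist tagged session summaries, then load only
# the entries relevant to the current query's topic at session start.
from datetime import date

memory_store: list[dict] = []

def save_session_summary(summary: str, topic: str, importance: int) -> None:
    memory_store.append({"summary": summary, "topic": topic,
                         "importance": importance, "when": date.today().isoformat()})

def load_relevant(topic: str, limit: int = 3) -> list[str]:
    matches = [m for m in memory_store if m["topic"] == topic]
    # Prefer important, recent entries; drop the rest to keep context small.
    matches.sort(key=lambda m: (m["importance"], m["when"]), reverse=True)
    return [m["summary"] for m in matches[:limit]]

save_session_summary("User chose the annual Pro plan.", topic="billing", importance=3)
save_session_summary("User asked about dark mode.", topic="product", importance=1)
recalled = load_relevant("billing")
```

Note what is stored: one-line outcomes, not transcripts. The `limit` cap is what prevents memory from reintroducing the bloat it was meant to solve.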

6. What is structured context formatting?

Structured context formatting delivers information to models as organized, typed records rather than unformatted text.

This may be the highest-leverage technique on this list. Research on prompt formatting found that format choice alone can swing LLM accuracy by up to 40%. Structured context with named fields, consistent delimiters, and clear hierarchies lets models allocate attention to the right places instead of parsing prose.

The practical version: convert customer records to {name, plan, status, issue} rather than paragraphs. Use section headers and XML tags to delineate context boundaries. Return typed data from tools rather than natural language descriptions. We covered this in depth in Structured Context vs Raw Text for AI, but the short version is that less text with more structure consistently beats more text with less structure.
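A sketch of the conversion, assuming illustrative tag and field names; the point is the delimited, typed shape, not these particular labels:

```python
# Structured formatting sketch: render a customer record as a tagged,
# field-per-line block instead of a prose paragraph.

def format_record(record: dict, tag: str) -> str:
    fields = "\n".join(f"  {k}: {v}" for k, v in record.items())
    return f"<{tag}>\n{fields}\n</{tag}>"

customer = {"name": "Dana Smith", "plan": "Pro", "status": "active",
            "issue": "billing discrepancy"}
block = format_record(customer, "customer")
```

The tags give the model an unambiguous boundary for where the customer data starts and ends, and the named fields let it look up "plan" directly instead of inferring it from sentence structure.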

7. What is context validation?

Context validation detects stale, conflicting, or missing context before it reaches the model.

Most production failures aren’t dramatic. They’re quiet: the model confidently uses outdated pricing, contradicts itself because two retrieved documents disagree, or answers without context it should have had. Validation catches these before they reach the user.

Three checks matter. Freshness validation ensures retrieved context isn’t past its useful life, particularly important for fast-changing domains like pricing, inventory, or policy documents. Consistency validation detects contradictions between context items, flagging when two sources disagree on the same fact. Completeness validation checks whether the context contains what the model needs for the query type, routing to human review or a fallback response if critical information is missing.

Redis’s best practices guide emphasizes this as the most overlooked step: “bad data in, bad answers out.” Validation is the quality gate that makes the other six techniques reliable.
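The three checks can be sketched as a single gate that runs before the prompt is assembled; the thresholds, field names, and document shape are assumptions for illustration:

```python
# Context validation sketch: freshness (age limit), consistency (conflicting
# values for the same fact), and completeness (required fields per query type).
from datetime import date, timedelta

REQUIRED_FIELDS = {"billing": {"price", "plan"}}

def validate(docs: list[dict], query_type: str, max_age_days: int = 30) -> list[str]:
    problems = []
    cutoff = date.today() - timedelta(days=max_age_days)
    # Freshness: flag documents past their useful life.
    for d in docs:
        if d["updated"] < cutoff:
            problems.append(f"stale: {d['id']}")
    # Consistency: flag facts where sources disagree.
    facts: dict[str, set] = {}
    for d in docs:
        for k, v in d.get("facts", {}).items():
            facts.setdefault(k, set()).add(v)
    problems += [f"conflict: {k}" for k, vals in facts.items() if len(vals) > 1]
    # Completeness: every required field must appear in some document.
    present = {k for d in docs for k in d.get("facts", {})}
    problems += [f"missing: {k}" for k in REQUIRED_FIELDS.get(query_type, set()) - present]
    return problems

docs = [
    {"id": "a", "updated": date.today(), "facts": {"price": "$10/mo"}},
    {"id": "b", "updated": date(2020, 1, 1), "facts": {"price": "$12/mo"}},
]
issues = validate(docs, "billing")
```

In a production pipeline, a non-empty `issues` list would route the request to a fallback (refresh the retrieval, drop the stale source, or escalate to human review) rather than letting the model answer over bad data.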

Choosing the right techniques

Not every system needs all seven. Here’s a quick reference:

| Technique | Best for | Complexity | Impact on accuracy |
|---|---|---|---|
| Selective retrieval | Multi-domain knowledge bases | Medium | High |
| Context compression | Long documents, high token costs | Medium | Medium-high |
| Layered architecture | Multi-turn agents, workflows | High | High |
| Context isolation | Multi-agent systems | Medium | Medium |
| Memory management | Long-running user sessions | High | Medium |
| Structured formatting | Any system delivering context | Low | High |
| Context validation | Production systems with changing data | Medium | High |

If you’re starting from scratch, structured formatting and selective retrieval give the best return for the least effort. If you’re debugging production failures, add validation. If you’re building agents that operate across sessions, invest in memory and layered architecture.

Context engineering platforms like Wire handle several of these techniques automatically: structuring context at upload time, scoping retrieval per query, and validating freshness. But the principles apply regardless of tooling. The teams getting the best results from AI aren’t using better models. They’re engineering better context.

Ready to give your AI agents better context?

Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.

Get Started