Definition
What is a Context Window?
The maximum amount of text (measured in tokens) that a language model can process in a single inference call.
Think of it as working memory: everything the model can see while generating a response. Once the window is full, the application has to drop, truncate, or summarize older content. The system prompt, conversation history, retrieved documents, tool definitions, and the model's own output all compete for the same budget.
- Measured in tokens, not words (a token is roughly three-quarters of a word in English).
- Everything competes for space: system prompt, history, retrieved docs, tool outputs, model response.
- Effective capacity is usually 60-70% of the advertised number, not 100%.
- Information placement matters: content in the middle of the window gets less attention than the edges.
- A focused small context often beats a sprawling large one on the same task.
How context windows work
A context window is the total amount of text a language model can process in a single request. The unit is tokens, not words. Everything the model sees competes for this budget:
- the system prompt
- conversation history
- retrieved documents (RAG results)
- tool definitions and their returned outputs
- long-term memory snippets
- the model’s own response as it generates
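A rough way to see how these components share one budget is to tally them. The sketch below is illustrative only: it estimates tokens from word counts using the three-quarters-of-a-word rule of thumb rather than a real tokenizer, and the names `estimate_tokens` and `tally_budget` are hypothetical.

```python
import math

# Rule of thumb from this page: one token is about 0.75 English words.
# This is an assumption for illustration; real tokenizers give different counts.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count from whitespace-separated words."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def tally_budget(parts: dict[str, str], window: int, reserved_for_output: int) -> dict[str, int]:
    """Return estimated tokens per component, plus what is left of the window."""
    usage = {name: estimate_tokens(text) for name, text in parts.items()}
    # The model's own response counts against the same window, so reserve room for it.
    usage["reserved_for_output"] = reserved_for_output
    usage["remaining"] = window - sum(usage.values())
    return usage


if __name__ == "__main__":
    parts = {
        "system_prompt": "You are a careful assistant.",
        "history": "User asked about context windows. Assistant explained tokens.",
        "retrieved_docs": "A context window is measured in tokens, not words.",
    }
    print(tally_budget(parts, window=8_000, reserved_for_output=1_000))
```

A negative `remaining` value means something has to be trimmed before the request is sent.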
Once the window fills, older content must be truncated, summarized, or dropped. That’s why long agent sessions can “forget” early instructions and why tool-heavy workflows can run out of room mid-task.
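One common way to handle a full window is to keep the system prompt and drop the oldest turns first. The following is a minimal sketch of that policy, assuming token counts per turn are already known; `trim_history` is a hypothetical helper, and real systems may summarize dropped turns instead of discarding them.

```python
def trim_history(
    system_prompt_tokens: int,
    turns: list[tuple[str, int]],
    budget: int,
) -> list[tuple[str, int]]:
    """Keep the newest turns that fit once the system prompt is accounted for.

    `turns` is ordered oldest to newest; each item is (text, token_count).
    """
    remaining = budget - system_prompt_tokens
    kept: list[tuple[str, int]] = []
    for text, tokens in reversed(turns):  # walk from newest to oldest
        if tokens > remaining:
            break  # this turn and everything older is dropped
        kept.append((text, tokens))
        remaining -= tokens
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [("early instructions", 50), ("tool output", 30), ("latest question", 20)]
    print(trim_history(system_prompt_tokens=10, turns=history, budget=70))
```

With this policy the earliest turn is the first to go, which is the "forgetting" behavior described above. Pinning critical instructions in the system prompt avoids losing them.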
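Modern windows have grown dramatically. Claude Opus 4.6 offers a 1M-token window in beta. Gemini 3 Pro reaches 1-2M tokens via Vertex AI. GPT-5.2 sits at 400K. At roughly three-quarters of a word per token, a 1M-token window holds about 750,000 words of English, on the order of 1,500 pages at 500 words per page.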
Why window size isn’t the real bottleneck
Research consistently shows that models don’t use their full context window effectively.
- Models degrade before the limit. Elvex’s 2026 benchmarks found effective capacity is roughly 60-70% of advertised maximum. The drop is often sudden rather than gradual.
- Middle content gets lost. Stanford’s “lost in the middle” study showed a 15-20 percentage point accuracy drop for information placed mid-context versus at the edges.
- Simple tasks get harder with more context. Chroma’s context rot research tested 18 leading models and found accuracy dropping from 95% to 60-70% on trivial retrieval tasks purely as input length grew.
- Benchmarks overstate performance. NVIDIA’s RULER benchmark found that only about half of the models claiming 32K+ windows maintained satisfactory performance at 32K once tasks went beyond simple needle retrieval.
The shift in production thinking is from “how much fits” to “what should go in.” A focused 5,000-token context often outperforms a sprawling 100,000-token context because the model can attend to all of it.
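To make the effective-capacity point concrete, here is a small sketch that derates an advertised window before planning against it. The 60% default comes from the lower end of the range quoted above and is an assumption, not a measured constant for any particular model; `effective_budget` and `fits_effective_window` are hypothetical names.

```python
def effective_budget(advertised_tokens: int, usable_percent: int = 60) -> int:
    """Return a conservative planning budget as a percentage of the advertised window."""
    if not 0 < usable_percent <= 100:
        raise ValueError("usable_percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return advertised_tokens * usable_percent // 100


def fits_effective_window(prompt_tokens: int, advertised_tokens: int, usable_percent: int = 60) -> bool:
    """Check a prompt against the derated budget rather than the advertised maximum."""
    return prompt_tokens <= effective_budget(advertised_tokens, usable_percent)


if __name__ == "__main__":
    print(effective_budget(400_000))                 # planning budget at 60%
    print(fits_effective_window(300_000, 400_000))   # fits on paper, not in the derated budget
```

Treating the derated number as the real limit leaves headroom before the quality drop sets in.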
Common misconceptions about context windows
- “Bigger is always better.” Bigger raises the ceiling on what fits, not on what the model attends to. Cost scales with every token included, whether the model uses it or not.
- “Passing the needle-in-a-haystack test means the window works.” Needle tests are easy. Realistic multi-step reasoning over full windows is much harder and is where most models fall apart.
- “The order of information doesn’t matter.” It matters a lot. Place the most important content at the start or end of the input; don’t bury critical instructions or evidence in the middle.
- “1M tokens means I can load my whole codebase.” You can load it. The model won’t reason over it well. Selective retrieval still wins.
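The last two points, placement and selective retrieval, can be combined in one small sketch. It is a toy under stated assumptions: relevance is scored by plain keyword overlap (a stand-in for embedding similarity), tokens are estimated from word counts, and the function names are hypothetical.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 0.75 English words per token (assumption, not a tokenizer)."""
    return math.ceil(len(text.split()) / 0.75)


def select_passages(query: str, passages: list[str], token_budget: int) -> list[str]:
    """Pick the highest-overlap passages that fit within the token budget."""
    query_terms = set(query.lower().split())
    scored = []
    for passage in passages:
        overlap = len(query_terms & set(passage.lower().split()))
        if overlap > 0:  # skip passages with no shared terms at all
            scored.append((overlap, passage))
    # Stable sort: passages with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen: list[str] = []
    used = 0
    for _, passage in scored:
        cost = estimate_tokens(passage)
        if used + cost <= token_budget:
            chosen.append(passage)
            used += cost
    return chosen


def order_for_edges(ranked: list[str]) -> list[str]:
    """Put the best-ranked items at the start and end, the weakest in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


if __name__ == "__main__":
    docs = [
        "the context window has a limit",
        "bananas are yellow",
        "window cleaning tips",
    ]
    picked = select_passages("context window limit", docs, token_budget=12)
    print(order_for_edges(picked))
```

Given a ranked list `["a", "b", "c", "d", "e"]`, `order_for_edges` returns `["a", "c", "e", "d", "b"]`: the top item opens the context, the runner-up closes it, and the weakest sits in the middle.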
Context windows and Wire
Wire is designed around the reality that window size is a constraint, not a solution. Files uploaded to a container are chunked, embedded, and exposed through wire_search, so agents pull only the relevant passages into their context rather than loading whole documents. wire_explore returns structured summaries that keep tool outputs compact. The goal is to keep your agent’s window populated with the information that matters for the current step, not with every file you’ve ever given it.