Context budgets: how to allocate tokens for AI agents
Every AI model advertises a number: 200K tokens, 1 million tokens, 2 million tokens. These numbers keep climbing, and marketing teams treat each jump like a breakthrough. But what does “context window” actually mean, and why does the number on the box matter less than you think?
A context window is the total amount of text a language model can process in a single request. Think of it as working memory: everything the model can “see” while generating a response. Once the window is full, older content gets pushed out or truncated.
The unit of measurement is tokens, not words. A token is roughly three-quarters of a word in English, so 1,000 tokens is about 750 words. A 200K token window holds roughly 150,000 words, or about 500 pages of text.
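The tokens-to-words arithmetic above can be sketched with the common four-characters-per-token heuristic. This is a rough approximation for English text, not an exact count; real tokenizers (such as OpenAI's tiktoken) split text differently per model.

```python
# Back-of-envelope token arithmetic. Both ratios are approximations
# for English prose; a real tokenizer gives exact, model-specific counts.

def estimate_tokens(text: str) -> int:
    """Estimate tokens using the ~4 characters-per-token heuristic."""
    return max(1, len(text) // 4)

def words_for_tokens(tokens: int) -> int:
    """Invert the 'a token is ~3/4 of a word' rule of thumb."""
    return int(tokens * 0.75)

print(f"{words_for_tokens(200_000):,} words")  # a 200K window ~ 150,000 words
```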
Everything competes for space in this window: your system prompt, the conversation history, any documents retrieved by RAG, tool definitions, and the model’s own output. The “context window size” you see advertised is the total budget for all of it.
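One way to make that competition concrete is to treat the window as a literal budget and check that every component fits before sending a request. All numbers below are illustrative, not drawn from any particular model or product:

```python
# Illustrative token budget for a single request. Every component listed
# here competes for the same fixed window, including the model's output.

CONTEXT_WINDOW = 200_000

budget = {
    "system_prompt": 2_000,
    "tool_definitions": 5_000,
    "conversation_history": 60_000,
    "retrieved_documents": 80_000,
    "reserved_for_output": 8_000,
}

used = sum(budget.values())
assert used <= CONTEXT_WINDOW, "over budget: trim history or retrieval"
print(f"{used:,} of {CONTEXT_WINDOW:,} tokens allocated "
      f"({CONTEXT_WINDOW - used:,} headroom)")
```

Reserving space for the model's output up front is the step most easily forgotten: a request that fills the entire window leaves no room for a response.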
Context windows have grown dramatically over the past two years:
| Model | Context window | Notes |
|---|---|---|
| Claude Opus 4.6 | 1M tokens | Beta; 200K standard |
| Gemini 3 Pro | 1-2M tokens | 2M via Vertex AI |
| GPT-5.2 | 400K tokens | Up from 128K in GPT-4 |
These are large numbers. By the same arithmetic as above, a 1 million token window holds roughly 750,000 words, or about 2,500 pages of text. In theory, you could load an entire codebase, a full legal discovery set, or years of customer support tickets into a single prompt.
In practice, that’s rarely a good idea.
Research consistently shows that models don’t use their full context window effectively. The gap between advertised capacity and effective capacity is substantial.
Elvex’s 2026 benchmarks found that effective capacity is roughly 60-70% of the advertised maximum. A model with a 200K token window typically becomes unreliable around 130K tokens. The drop is often sudden rather than gradual: performance holds steady, then falls off a cliff.
Stanford researchers documented the “lost in the middle” effect, showing that models handle information at the beginning and end of their context far better than information in the middle. Their experiments found a 15-20 percentage point accuracy drop for middle-positioned content. Where you place information in the window matters as much as whether it fits.
Chroma’s research on context rot tested 18 leading models on trivially simple tasks (basic retrieval, text replication, fact extraction) and found accuracy dropped from 95% to 60-70% as input length increased. The tasks didn’t get harder. The only variable was how much text surrounded the answer. More context meant worse results on the same question.
NVIDIA’s RULER benchmark tested models on tasks beyond simple needle-in-a-haystack retrieval and found that most models claiming 32K+ token windows couldn’t effectively handle even 32K tokens on realistic tasks. Only a handful maintained acceptable performance, and even the best (GPT-4 at the time) showed a 15-point degradation at 128K. Passing a needle-in-a-haystack test, where a model finds one isolated fact in padding text, does not mean a model can reason across a full window of real information.
If bigger windows don’t automatically mean better results, what does? The shift in thinking is from “how much fits” to “what should go in.” This is the core of context engineering: designing systems that deliver the right information at the right time.
**Selective retrieval.** Instead of loading everything into the window, retrieve only what’s relevant to the current query. This is what RAG systems do when implemented well. A focused 5,000-token context often outperforms a sprawling 100,000-token context because the model can attend to all of it effectively.
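A minimal sketch of the idea: score candidate chunks against the query and greedily pack only the best ones into a fixed token budget. Production systems score with embeddings; plain word overlap keeps this sketch self-contained, and the documents are invented for illustration.

```python
import re

def words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def select_chunks(query: str, chunks: list[str], budget_tokens: int) -> list[str]:
    """Greedily pack the highest-overlap chunks into a token budget."""
    ranked = sorted(chunks, key=lambda c: len(words(query) & words(c)),
                    reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk) // 4  # rough token estimate
        if words(query) & words(chunk) and used + cost <= budget_tokens:
            picked.append(chunk)
            used += cost
    return picked

docs = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping times vary by region and carrier.",
]
# With a tight budget, only the chunk that actually answers the query survives:
print(select_chunks("what is the refund policy", docs, budget_tokens=20))
```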
**Structured context.** Raw text dumps are harder for models to navigate than organized, structured information. JSON, XML, databases with queryable fields, or purpose-built context containers give models a clearer signal-to-noise ratio than walls of unformatted prose.
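To see the difference, compare the same invented customer record as an unstructured dump and as JSON with explicit fields the model can anchor on:

```python
import json

# The same facts, twice. All data here is made up for illustration.
raw_dump = (
    "Acme Corp, founded 2019, about 200 employees, HQ in Austin. "
    "Enterprise plan, renews March 2026. Contact Dana Lee, prefers email."
)

structured = {
    "customer": "Acme Corp",
    "founded": 2019,
    "employees": 200,
    "headquarters": "Austin",
    "plan": "Enterprise",
    "renewal": "2026-03",
    "contact": {"name": "Dana Lee", "preferred_channel": "email"},
}

# Explicit keys make each fact individually addressable instead of buried in prose.
print(json.dumps(structured, indent=2))
```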
**Strategic placement.** Given the lost-in-the-middle effect, put the most critical information at the beginning or end of the context. This is a free optimization that costs nothing to implement and can meaningfully improve output quality.
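In practice this can be as simple as how you assemble the prompt string: state the critical instruction first, put bulk reference material in the middle, and restate the instruction at the end. A sketch:

```python
def assemble(critical: str, bulk: list[str]) -> str:
    """Place the critical instruction at both edges of the context."""
    parts = [critical]                     # beginning: strongest attention
    parts.extend(bulk)                     # middle: supporting material
    parts.append(f"Reminder: {critical}")  # end: restate the key instruction
    return "\n\n".join(parts)

prompt = assemble(
    "Answer using only the provided documents.",
    ["<document 1 text>", "<document 2 text>"],
)
print(prompt)
```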
**External memory.** Not everything needs to live in the context window. Preferences, reference documents, and historical data can live in external systems that the model queries on demand. The window stays focused on the current task. (For more on why this matters, see Why does ChatGPT forget everything?)
The context window is a constraint worth understanding, but it’s one piece of a larger puzzle. The more interesting question isn’t “how many tokens can this model hold?” It’s “how do I make sure every token counts?”
Wire transforms your documents into structured, AI-optimized context containers. Upload files, get MCP tools instantly.
Create Your First Container