You would think that giving an AI more information would lead to better results. More context means more knowledge to draw from, right?
The reality is counterintuitive. Research from Chroma shows that AI models drop from 95% to 60-70% accuracy as input length increases, even when the task remains trivially simple. This phenomenon has a name: context rot.
Even with Gemini’s 2 million token window or Llama 4’s unprecedented 10 million token capacity, more isn’t always better.
What Is Context Rot?
Context rot describes the systematic degradation of AI performance as input context length increases. The key insight is that this happens even when the underlying task doesn’t get harder.
Think of it like attention stretched thin. Transformer models let every token “attend” to every other token, but there’s a limited attention budget. As the context grows, that attention gets diluted across more and more tokens.
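The dilution is easy to see with a toy calculation. This is not a real transformer, just the softmax math that attention is built on: one standout token competes with a growing amount of filler for a single head’s attention, and the scores and ranges below are arbitrary assumptions chosen only to make the effect visible.

```python
import math
import random

def softmax(scores):
    """Standard softmax: turn raw attention scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)

# Toy stand-in for one attention head: a single "relevant" token scores higher
# than the surrounding filler, but softmax still spreads weight across every
# token in the context, so the relevant token's share shrinks as context grows.
RELEVANT_SCORE = 3.0                 # assumed score for the token we care about
FILLER_SCORE_RANGE = (0.0, 1.0)      # assumed score range for everything else

for context_len in (10, 100, 1_000, 10_000):
    scores = [random.uniform(*FILLER_SCORE_RANGE) for _ in range(context_len - 1)]
    scores.append(RELEVANT_SCORE)
    weights = softmax(scores)
    print(f"{context_len:>6} tokens -> relevant token gets {weights[-1]:.2%} of the attention")
```

With 10 tokens the standout token captures a little over half the attention; by 10,000 tokens its share has collapsed to a fraction of a percent, even though nothing about the token itself changed.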
Chroma’s research team tested 18 models that were leading at the time of the study, including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3. They found:
- Near-perfect performance on short inputs: Models achieved 95%+ accuracy on simple tasks with small contexts
- Significant degradation on longer inputs: The same tasks dropped to 60-70% accuracy as context length increased
- Universal across architectures: This wasn’t specific to one model family; all tested models exhibited the pattern
The tasks themselves were simple: basic retrieval, text replication, fact extraction. The only variable was input length.
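If you want to probe this on your own stack, the experiment has a simple shape: hold a trivial retrieval task constant and sweep the amount of filler around it. The sketch below is hedged accordingly: `ask_model` is a placeholder for whatever model client you use (not a real library call), and the fact, filler sentence, and trial count are made up. It is the shape of the experiment, not Chroma’s benchmark code.

```python
def ask_model(prompt: str) -> str:
    """Placeholder: wire this up to whatever model client you actually use."""
    raise NotImplementedError

FACT = "The access code for the staging server is 4417."
QUESTION = "\n\nWhat is the access code for the staging server?"
FILLER = "The quarterly report was filed on time. "

def accuracy_at_length(filler_repeats: int, trials: int = 20) -> float:
    """Same trivial retrieval task every time; only the amount of filler changes."""
    correct = 0
    for _ in range(trials):
        half = FILLER * (filler_repeats // 2)
        correct += "4417" in ask_model(half + FACT + " " + half + QUESTION)
    return correct / trials

# Sweep input length while holding the task constant.
for repeats in (10, 1_000, 50_000):
    print(f"{repeats:>6} filler sentences: {accuracy_at_length(repeats):.0%} accuracy")
```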
The “Lost in the Middle” Problem
Stanford researchers published foundational work on a related phenomenon called the “lost in the middle” effect. When relevant information is buried in the middle of a long context, models struggle to find and use it.
Their experiments revealed a characteristic U-shaped performance curve:
- Information at the start: 70-75% accuracy
- Information in the middle (positions 8-12): 55-60% accuracy
- Information at the end: 70-75% accuracy
That’s a 15-20 percentage point drop based purely on where information appears, not how relevant or well-written it is.
The effect compounds for tasks requiring reasoning across multiple pieces of information. Multi-hop questions, where the model needs to chain together 2+ facts, show even steeper degradation.
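The position effect can be probed the same way: keep the total length fixed and move the fact. The sketch below reuses the same placeholder `ask_model` stub and made-up fact as before; a U-shaped accuracy curve across positions would reproduce the “lost in the middle” result.

```python
def ask_model(prompt: str) -> str:          # same placeholder stub as above
    raise NotImplementedError

FILLER = "The quarterly report was filed on time. " * 2_000   # fixed-size haystack
FACT = "The access code for the staging server is 4417. "
QUESTION = "\n\nWhat is the access code for the staging server?"

def accuracy_at_position(relative_pos: float, trials: int = 20) -> float:
    """relative_pos = 0.0 places the fact at the start, 1.0 at the end."""
    cut = int(len(FILLER) * relative_pos)
    prompt = FILLER[:cut] + FACT + FILLER[cut:] + QUESTION
    return sum("4417" in ask_model(prompt) for _ in range(trials)) / trials

for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"fact at {pos:.0%} of the context: {accuracy_at_position(pos):.0%} accuracy")
```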
This explains a common frustration: you tell your AI something important early in a conversation, but 50 messages later it seems to have forgotten. It probably didn’t forget in the traditional sense. The information is still “there” in the context, but the model’s attention is now stretched across so much text that earlier content receives minimal weight.
Why Bigger Context Windows Don’t Solve This
Current context windows have grown dramatically:
- Claude Opus 4.5: 200K tokens (1M in beta)
- GPT-5.2: 400K tokens
- Gemini 3 Pro: 1-2M tokens
These numbers are impressive, but they don’t address the underlying problem. Raw context length may be a poor proxy for actual capability. A model with a 1M token window that suffers from severe context rot might be less useful than a model with a 10,000 token window that maintains consistent performance.
The popular “Needle in a Haystack” benchmark, where models find a single fact buried in irrelevant text, is somewhat misleading. Finding one isolated fact is different from reasoning over interconnected information scattered throughout a long document.
The real bottleneck isn’t how much text you can stuff into the window. It’s how effectively the model can allocate attention across that text.
What Actually Works
Understanding the problem suggests several practical solutions:
Chunking and Retrieval (RAG)
Rather than loading everything into context at once, retrieval-augmented generation (RAG) systems fetch only the relevant chunks when they’re needed. This keeps the active context small and focused, avoiding the attention dilution problem.
That said, RAG is not a silver bullet. The quality varies dramatically based on implementation: how you chunk documents, which embedding model you use, how you handle queries that span multiple chunks, and whether your retrieval actually surfaces the right information. A poorly configured RAG system can make things worse by retrieving irrelevant content that further dilutes the signal.
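To make the moving parts concrete, here is a minimal, dependency-free sketch of the pattern. It uses a crude word-overlap score where a real system would use an embedding model and a vector store, and the chunk size, overlap, top-k, and file name are all assumptions.

```python
from collections import Counter

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows so facts aren't cut in half."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def score(query: str, passage: str) -> float:
    """Crude relevance score: count query words that also appear in the passage."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return sum(min(q[w], p[w]) for w in q)

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]

document = open("handbook.txt").read()   # assumed local file; any long document works
question = "What is the refund policy for annual plans?"
context = "\n\n---\n\n".join(retrieve(question, chunk(document)))
prompt = f"Answer using only the excerpts below.\n\n{context}\n\nQuestion: {question}"
# `prompt` stays small and focused no matter how long handbook.txt gets.
```

Every knob in that sketch (chunk size, overlap, the scoring function, how many chunks you keep) is exactly the kind of implementation detail that separates a RAG system that helps from one that makes things worse.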
Structured Context
Raw text dumps are hard for models to navigate. Structured, organized information with clear hierarchies helps models find what they need. Think JSON, XML, or databases with queryable fields rather than walls of prose.
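As an illustration (the field names here are made up), compare the same account facts as prose versus as a structure that can be sliced:

```python
import json

# The same facts twice: once as free prose, once as a structure the model
# (or a tool acting on its behalf) can navigate and filter.
prose = (
    "Acme signed in March 2023, upgraded to the Pro plan in January 2024 after "
    "a support escalation about SSO, renews every January, and is owned by Dana."
)

structured = {
    "customer": "Acme",
    "plan": {"name": "Pro", "since": "2024-01", "renewal_month": "January"},
    "account_owner": "Dana",
    "history": [
        {"date": "2023-03", "event": "signed"},
        {"date": "2024-01", "event": "upgraded after SSO support escalation"},
    ],
}

# A clear hierarchy lets you hand the model exactly the slice it needs.
print(json.dumps(structured["plan"], indent=2))
```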
External Systems
Instead of holding everything in the context window, let the model query external systems for specific information. Give AI tools access to databases, APIs, or knowledge bases they can search as needed. The context window becomes a working space for the current task, not a warehouse for everything the model might need.
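One common shape for this is function calling: describe a narrow query tool to the model and run the query only when it asks. The sketch below is framework-agnostic; the schema mimics the JSON-schema style many function-calling APIs accept (check your provider’s exact format), and the database, table, and dispatcher are illustrative stand-ins rather than a real agent framework.

```python
import sqlite3

# Tool description shown to the model instead of the entire orders table.
ORDER_LOOKUP_TOOL = {
    "name": "lookup_order",
    "description": "Fetch one order's status and total by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def lookup_order(order_id: str) -> dict:
    """The model asks for one row when it needs it, not the whole table up front."""
    conn = sqlite3.connect("orders.db")  # assumed local database
    row = conn.execute(
        "SELECT status, total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    conn.close()
    return {"order_id": order_id, "status": row[0], "total": row[1]} if row else {"error": "not found"}

def run_tool(name: str, arguments: dict) -> dict:
    """Dispatch a model-issued tool call to the matching Python function."""
    return {"lookup_order": lookup_order}[name](**arguments)

# The model sees ORDER_LOOKUP_TOOL and calls it with {"order_id": "A-1042"}
# only when it actually needs that order; the context stays a working space.
```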
Practical Takeaways
If you work with AI systems regularly, a few habits can help mitigate context rot:
- Put important information at the start or end of prompts, not buried in the middle (see the sketch after this list)
- Break long documents into retrievable chunks rather than dumping everything at once
- Don’t assume your AI “remembers” something from 50 messages ago; re-state critical facts when needed
- Use tools that manage context intelligently, organizing and structuring information rather than just storing it
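The first and third habits are easy to bake into how prompts get assembled. Here is a minimal sketch, assuming you keep a short list of critical facts per conversation; the function name and layout are just one possible convention.

```python
def build_prompt(critical_facts: list[str], history: str, question: str) -> str:
    """Pin critical facts at the start and restate them at the end, so they never
    sit only in the middle of a long history."""
    facts = "\n".join(f"- {fact}" for fact in critical_facts)
    return (
        f"Key facts (do not ignore):\n{facts}\n\n"
        f"Conversation so far:\n{history}\n\n"
        f"Reminder of the key facts:\n{facts}\n\n"
        f"Task: {question}"
    )

prompt = build_prompt(
    critical_facts=["The client is on the EU data residency plan.",
                    "All dates should use ISO 8601 format."],
    history="...fifty earlier messages...",
    question="Draft the migration timeline email.",
)
```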
Context containers, like the ones Wire creates, take this approach: they transform raw documents into structured, AI-optimized context that agents can query efficiently. But the principle applies regardless of tooling. Better context architecture beats bigger context windows.