Definition
What is Prompt Caching?
Last updated
A technique that stores computed key-value tensors from a prompt's prefix so they can be reused on subsequent API calls, reducing cost and latency.
AI agents reprocess the same system instructions, tool definitions, and context on every API call. Prompt caching eliminates this redundancy by reusing previously computed representations when the prefix matches exactly. Major providers offer up to 90% discounts on cached input tokens, making it one of the most impactful optimizations for production agent workloads.
Further reading
Articles about Prompt Caching
Claude Fable 5 put a price on context portability
Claude Fable 5 returns refusals as HTTP 200s and retries them on Opus 4.8. The fallback API reveals exactly what agent context survives a mid-task model swap.
Why token cost doesn't scale with knowledge base size
AI token usage scales with knowledge base size only when the full corpus loads per query. The real variable is selective context delivery, not KB size.
Context budgets: how to allocate tokens for AI agents
A practical guide to context budgets for AI agents. How to allocate tokens across system prompts, tools, retrieval, history, and a buffer in production.
Why your AI costs are a context problem
Token prices fell 280x but enterprise AI spend rose 320%. Poor context architecture drives 60-70% of total AI costs. Here is where the money actually goes.
All terms
View full glossaryPut context into practice
Create your first context container and connect it to your AI tools in minutes.
Create Your First Container