Definition
What is Prompt Caching?
A technique that stores computed key-value tensors from a prompt's prefix so they can be reused on subsequent API calls, reducing cost and latency.
AI agents reprocess the same system instructions, tool definitions, and context on every API call. Prompt caching eliminates this redundancy by reusing previously computed representations when the prefix matches exactly. Major providers offer up to 90% discounts on cached input tokens, making it one of the most impactful optimizations for production agent workloads.
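The exact-prefix rule and its effect on cost can be illustrated with a small sketch. This is a toy model, not any provider's billing logic: the function names, the token-list representation, and the flat 90% discount (taken from the "up to 90%" figure above) are assumptions for illustration. Real services typically also apply minimum prefix lengths, cache expiry, and sometimes a surcharge for writing to the cache, none of which is modeled here.

```python
def shared_prefix_len(prev_tokens, curr_tokens):
    """Count how many leading tokens are identical in both requests."""
    n = 0
    for a, b in zip(prev_tokens, curr_tokens):
        if a != b:
            break
        n += 1
    return n


def effective_input_tokens(prev_tokens, curr_tokens, cached_discount_pct=90):
    """Return the billed input size in full-price token equivalents.

    Assumption: tokens in the shared prefix are billed at
    (100 - cached_discount_pct)% of the normal rate; every token after
    the first mismatch is billed at the full rate.
    """
    cached = shared_prefix_len(prev_tokens, curr_tokens)
    uncached = len(curr_tokens) - cached
    return (cached * (100 - cached_discount_pct) + uncached * 100) / 100


# A stable 900-token system prompt followed by a 100-token user turn.
system = ["sys"] * 900
call_1 = system + ["q1"] * 100
call_2 = system + ["q2"] * 100

# Stable prefix: 900 tokens are cached, 100 are billed in full.
stable_cost = effective_input_tokens(call_1, call_2)

# A changing value (for example a timestamp) at the very start breaks the
# prefix match, so nothing after it can be reused.
call_3 = ["timestamp"] + system + ["q3"] * 100
broken_cost = effective_input_tokens(call_1, call_3)
```

Under these assumptions the second call is billed as 190 token equivalents instead of 1,000, while the third call gets no benefit at all. This is why stable content (system instructions, tool definitions) is usually placed first and per-request content last.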
Further reading
Articles about Prompt Caching
Does AI token usage scale with knowledge base size?
AI token usage scales with knowledge base size only when the full corpus loads per query. The real variable is selective context delivery, not KB size.
Context budgets: how to allocate tokens for AI agents
A practical guide to context budgets for AI agents. How to allocate tokens across system prompts, tools, retrieval, history, and a buffer in production.
Why your AI costs are a context problem
Token prices fell 280x but enterprise AI spend rose 320%. Poor context architecture drives 60-70% of total AI costs. Here is where the money actually goes.
How prompt caching cuts AI agent costs by 90%
Prompt caching reduces AI agent API costs by up to 90% and latency by 31%. Here's how it works, where it breaks, and how to implement it right.