What is Prompt Caching?

Definition
A technique that stores computed key-value tensors from a prompt's prefix so they can be reused on subsequent API calls, reducing cost and latency.
AI agents often reprocess the same system instructions, tool definitions, and context on every API call. Prompt caching eliminates this redundancy by reusing previously computed representations whenever the prompt prefix matches exactly. Major providers offer discounts of up to 90% on cached input tokens, making caching one of the most impactful optimizations for production agent workloads.
Related terms

Context Engineering: The practice of deliberately designing, structuring, and managing the information provided to AI models to improve output quality and relevance.

Context Window: The maximum amount of text (measured in tokens) that a language model can process in a single inference call.

AI Agent: An autonomous software program that uses a large language model to plan and execute multi-step tasks.

Context Compression: The practice of reducing token count in an AI agent's context window while preserving the information needed to complete tasks.