What is Prompt Caching?

Definition
A technique that stores computed key-value tensors from a prompt's prefix so they can be reused on subsequent API calls, reducing cost and latency.
AI agents often reprocess the same system instructions, tool definitions, and context on every API call. Prompt caching eliminates this redundancy by reusing previously computed representations whenever the prompt prefix matches exactly. Major providers offer discounts of up to 90% on cached input tokens, making caching one of the most impactful optimizations for production agent workloads.
Related terms

Context Engineering: The practice of deliberately designing, structuring, and managing the information provided to AI models to improve output quality and relevance.

Context Window: The maximum amount of text (measured in tokens) that a language model can process in a single inference call.

AI Agent: An autonomous software program that uses a large language model to plan and execute multi-step tasks.

Context Compression: The practice of reducing token count in an AI agent's context window while preserving the information needed to complete tasks.