Definition

What is Prompt Caching?

Last updated April 2, 2026

A technique that stores computed key-value tensors from a prompt's prefix so they can be reused on subsequent API calls, reducing cost and latency.

AI agents reprocess the same system instructions, tool definitions, and context on every API call. Prompt caching eliminates this redundancy by reusing previously computed representations when the prefix matches exactly. Major providers offer up to 90% discounts on cached input tokens, making it one of the most impactful optimizations for production agent workloads.

Articles about Prompt Caching

Jun 11, 2026

Put context into practice

Create your first context container and connect it to your AI tools in minutes.

Create Your First Container

What is Prompt Caching?

Articles about Prompt Caching

Claude Fable 5 put a price on context portability

Why token cost doesn't scale with knowledge base size

Context budgets: how to allocate tokens for AI agents

Why your AI costs are a context problem

Put context into practice