Articles about Prompt Caching

5 articles from the Wire blog, sorted newest first.

Jun 11, 2026

Claude Fable 5 put a price on context portability

Claude Fable 5 returns refusals as HTTP 200s and retries them on Opus 4.8. The fallback API reveals exactly what agent context survives a mid-task model swap.

Apr 21, 2026

Why token cost doesn't scale with knowledge base size

AI token usage scales with knowledge base size only when the full corpus loads per query. The real variable is selective context delivery, not KB size.

Apr 13, 2026

Context budgets: how to allocate tokens for AI agents

A practical guide to context budgets for AI agents. How to allocate tokens across system prompts, tools, retrieval, history, and a buffer in production.

Apr 9, 2026

Why your AI costs are a context problem

Token prices fell 280x but enterprise AI spend rose 320%. Poor context architecture drives 60-70% of total AI costs. Here is where the money actually goes.

Apr 2, 2026

How prompt caching cuts AI agent costs by 90%

Prompt caching reduces AI agent API costs by up to 90% and latency by 31%. Here's how it works, where it breaks, and how to implement it right.
