Written by Max Zeshut
Founder at Agentmelt
An inference optimization where the processed representation of static prompt content (system instructions, tool definitions, reference documents) is cached between requests so the model skips re-processing it on subsequent calls. Context caching is distinct from response caching—it caches the input processing, not the output. For agents with large, stable system prompts (common in production deployments), context caching reduces per-request latency by 30–60% and cost by 50–90% on the cached portion. Supported by Anthropic (prompt caching), Google (context caching), and OpenAI (automatic caching).
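Below is a minimal sketch of what opting into prompt caching might look like with Anthropic's Python SDK, since that provider is named above. The model string, file path, and ticket text are illustrative placeholders, and the exact `cache_control` fields should be verified against the current API documentation.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Load the large, stable system prompt (hypothetical file, ~12,000 tokens).
# Marking it cacheable means the prefix is processed once and stored; later
# requests that send the same prefix reuse the cached representation.
SYSTEM_PROMPT = open("support_agent_system_prompt.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any prompt-caching-capable model
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[
        {"role": "user", "content": "Customer reports a duplicate charge on invoice #4821."}
    ],
)

# Usage metadata reports how much of the prompt was written to or read from cache.
print(response.usage)
```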
A support agent has a 12,000-token system prompt with guardrails, tool definitions, and company policies. Without caching, every ticket processes all 12,000 tokens. With context caching, the prompt is processed once and reused for subsequent requests—cutting per-ticket cost from $0.036 to $0.004.
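The per-ticket figures above follow from simple per-token arithmetic. The sketch below assumes an illustrative rate of $3.00 per million uncached input tokens, with cached reads billed at roughly 10% of that rate; check current provider pricing before relying on these numbers.

```python
# Rough per-ticket cost comparison for the cached 12,000-token system prompt.
PROMPT_TOKENS = 12_000
INPUT_PRICE_PER_MTOK = 3.00      # USD per 1M uncached input tokens (assumed)
CACHED_READ_MULTIPLIER = 0.10    # cached reads at ~10% of base price (assumed)

uncached_cost = PROMPT_TOKENS / 1_000_000 * INPUT_PRICE_PER_MTOK
cached_cost = uncached_cost * CACHED_READ_MULTIPLIER

print(f"uncached: ${uncached_cost:.4f} per ticket")  # ~$0.0360
print(f"cached:   ${cached_cost:.4f} per ticket")    # ~$0.0036, i.e. ~$0.004
```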