Written by Max Zeshut
Founder at Agentmelt
Prompt caching is a feature supported by most major LLM providers that stores the processed representation of a long, repeated prompt prefix (system prompts, tool definitions, large reference documents) so subsequent calls skip re-processing and pay a fraction of the token cost. For high-volume agents with stable system prompts, prompt caching typically cuts inference cost by 50–90% and noticeably reduces latency on the cached portion.
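In Anthropic-style APIs, caching is opted into by marking the last block of the stable prefix with `cache_control`; everything up to and including that block is cached for subsequent calls. A minimal sketch of the request shape (the model id, prompt text, and ticket content are placeholders, and exact field names vary by provider):

```python
# Sketch of an Anthropic-style Messages API request with prompt caching.
# The cache_control marker on the final system block asks the API to cache
# the prefix up to and including that block, so later requests that share
# the same prefix read it from cache at a reduced token rate.
SYSTEM_PROMPT = "You are a support agent for Acme Corp."  # placeholder
KNOWLEDGE_BASE = "<large, stable reference document>"     # placeholder

request = {
    "model": "claude-sonnet-4",  # hypothetical model id
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": SYSTEM_PROMPT},
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            # cache everything up to and including this block
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "Ticket: my order never arrived."}
    ],
}
```

Because the cache key is the prefix itself, the system blocks must be byte-identical across calls; per-ticket content goes in `messages`, after the cached prefix.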
A support agent with a 15,000-token system prompt and knowledge base routes 10,000 tickets per day. Enabling prompt caching drops per-ticket cost from $0.08 to $0.01 without any change to agent behavior.
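The arithmetic behind numbers like these is straightforward. A sketch under assumed pricing (an illustrative $3 per million input tokens, cached reads billed at 10% of that, and a hypothetical 500-token ticket message; real rates and cache discounts vary by provider):

```python
# Illustrative per-ticket cost with and without prompt caching.
# All rates below are assumptions for the sketch, not any provider's pricing.
PRICE_PER_MTOK = 3.00         # $ per million uncached input tokens (assumed)
CACHE_READ_MULTIPLIER = 0.10  # cached tokens billed at 10% of base (assumed)

PREFIX_TOKENS = 15_000        # stable system prompt + knowledge base
TICKET_TOKENS = 500           # per-ticket user message (assumed)

def per_ticket_cost(cached: bool) -> float:
    """Input cost of one ticket, in dollars."""
    prefix_rate = PRICE_PER_MTOK * (CACHE_READ_MULTIPLIER if cached else 1.0)
    prefix_cost = PREFIX_TOKENS / 1_000_000 * prefix_rate
    ticket_cost = TICKET_TOKENS / 1_000_000 * PRICE_PER_MTOK
    return prefix_cost + ticket_cost

uncached = per_ticket_cost(cached=False)  # $0.0465
cached = per_ticket_cost(cached=True)     # $0.0060
savings = 1 - cached / uncached           # ~87% saved
```

Even with these rough numbers the saving lands in the 50–90% range cited above, because the stable 15,000-token prefix dominates the per-ticket input and is exactly the part the cache discounts.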