Written by Max Zeshut
Founder at Agentmelt
Prompt caching is a feature supported by most major LLM providers that stores the processed representation of a long, repeated prompt prefix (system prompts, tool definitions, large reference documents) so subsequent calls skip re-processing and pay a fraction of the token cost. For high-volume agents with stable system prompts, prompt caching typically cuts inference cost by 50–90% and noticeably reduces latency on the cached portion.
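In Anthropic-style APIs, caching is opted into by marking the last block of the stable prefix with `cache_control`; everything up to and including that block is cached for subsequent calls. A minimal sketch of the request shape (the model id, prompt text, and ticket content are placeholders, and exact field names vary by provider):

```python
# Sketch of an Anthropic-style Messages API request with prompt caching.
# The cache_control marker on the final system block asks the API to cache
# the prefix up to and including that block, so later requests that share
# the same prefix read it from cache at a reduced token rate.
SYSTEM_PROMPT = "You are a support agent for Acme Corp."  # placeholder
KNOWLEDGE_BASE = "<large, stable reference document>"     # placeholder

request = {
    "model": "claude-sonnet-4",  # hypothetical model id
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": SYSTEM_PROMPT},
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            # cache everything up to and including this block
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "Ticket: my order never arrived."}
    ],
}
```

Because the cache key is the prefix itself, the system blocks must be byte-identical across calls; per-ticket content goes in `messages`, after the cached prefix.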
A support agent with a 15,000-token system prompt and knowledge base routes 10,000 tickets per day. Enabling prompt caching drops per-ticket cost from $0.08 to $0.01 without any change to agent behavior.
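The arithmetic behind numbers like these is straightforward. A sketch under assumed pricing (an illustrative $3 per million input tokens, cached reads billed at 10% of that, and a hypothetical 500-token ticket message; real rates and cache discounts vary by provider):

```python
# Illustrative per-ticket cost with and without prompt caching.
# All rates below are assumptions for the sketch, not any provider's pricing.
PRICE_PER_MTOK = 3.00         # $ per million uncached input tokens (assumed)
CACHE_READ_MULTIPLIER = 0.10  # cached tokens billed at 10% of base (assumed)

PREFIX_TOKENS = 15_000        # stable system prompt + knowledge base
TICKET_TOKENS = 500           # per-ticket user message (assumed)

def per_ticket_cost(cached: bool) -> float:
    """Input cost of one ticket, in dollars."""
    prefix_rate = PRICE_PER_MTOK * (CACHE_READ_MULTIPLIER if cached else 1.0)
    prefix_cost = PREFIX_TOKENS / 1_000_000 * prefix_rate
    ticket_cost = TICKET_TOKENS / 1_000_000 * PRICE_PER_MTOK
    return prefix_cost + ticket_cost

uncached = per_ticket_cost(cached=False)  # $0.0465
cached = per_ticket_cost(cached=True)     # $0.0060
savings = 1 - cached / uncached           # ~87% saved
```

Even with these rough numbers the saving lands in the 50–90% range cited above, because the stable 15,000-token prefix dominates the per-ticket input and is exactly the part the cache discounts.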