Written by Max Zeshut
Founder at Agentmelt
The maximum number of tokens (input + output) an AI agent is allowed to consume per task, session, or billing period. Token budgets prevent runaway costs from agent loops, overly long conversations, or verbose tool outputs. A well-configured token budget also forces efficient prompt design and retrieval: if a support agent has a 4,000-token budget per ticket, it must retrieve only the most relevant knowledge base (KB) passages rather than stuffing everything into context.
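One way to picture "retrieve only the most relevant KB passages" is a greedy selection under a token cap. The sketch below is illustrative only: `estimate_tokens` is a hypothetical stand-in for a real tokenizer (it assumes roughly four characters per token), and `select_passages` and the relevance scores are assumptions made for the example, not part of any particular framework.

```python
def estimate_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: assume ~4 characters per token.
    return max(1, len(text) // 4)


def select_passages(scored_passages, budget, count_tokens=estimate_tokens):
    """Greedily keep the highest-scoring passages that fit within `budget` tokens.

    `scored_passages` is a list of (relevance_score, text) pairs.
    Returns the selected texts and the number of tokens they use.
    """
    selected = []
    used = 0
    # Visit passages from most to least relevant.
    for score, text in sorted(scored_passages, key=lambda p: p[0], reverse=True):
        cost = count_tokens(text)
        # Skip any passage that would push the context past the budget.
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected, used
```

In practice the token count would come from the model provider's tokenizer rather than a character heuristic, since the budget is only meaningful if it is measured the same way as billing.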
Example: a support agent with a 5,000-token budget per ticket might allocate 2,000 tokens to the system prompt (cached), 1,500 to retrieved context, 500 to the customer message, and 1,000 to the response. If a ticket would exceed the budget, the agent escalates to a human rather than continuing to consume tokens.
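The allocation above can be sketched as a small budget check. This is a minimal illustration, not a reference implementation: the `TicketBudget` class, the `check_ticket` function, and the `"proceed"` / `"escalate"` return values are hypothetical names chosen for the example, and the per-component numbers simply mirror the 5,000-token split described here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TicketBudget:
    # Per-ticket allocation taken from the example above (in tokens).
    system_prompt: int = 2000
    retrieved_context: int = 1500
    customer_message: int = 500
    response: int = 1000

    @property
    def total(self) -> int:
        return (
            self.system_prompt
            + self.retrieved_context
            + self.customer_message
            + self.response
        )


def check_ticket(usage: Dict[str, int], budget: Optional[TicketBudget] = None) -> str:
    """Return "proceed" if total usage fits the budget, otherwise "escalate"."""
    budget = budget or TicketBudget()
    used = sum(usage.values())
    # Hypothetical policy: hand off to a human instead of spending past the cap.
    return "proceed" if used <= budget.total else "escalate"


if __name__ == "__main__":
    within = {"system_prompt": 2000, "retrieved_context": 1200,
              "customer_message": 300, "response": 800}
    over = {"system_prompt": 2000, "retrieved_context": 1500,
            "customer_message": 900, "response": 1000}
    print(check_ticket(within))
    print(check_ticket(over))
```

This version checks only the total; a stricter variant could also enforce each component's cap separately, so that an oversized customer message cannot crowd out the space reserved for the response.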