Latency Budget

The maximum acceptable end-to-end response time for an AI agent to complete a task, allocated across the steps in its pipeline (retrieval, LLM inference, tool calls, post-processing). Voice agents need sub-second latency for natural conversation, support chat agents typically target 2–5 seconds, and background agents (email, research) can take minutes. Knowing your latency budget drives model selection, caching strategy, and architecture decisions.
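One way to make a latency budget concrete is to record a per-step allocation and compare measured step timings against it. The sketch below is a minimal illustration, not a standard API. The `LatencyBudget` class and its methods are hypothetical. The step names mirror the pipeline stages listed above. The 800 ms total and the per-step figures are assumed example values for a sub-second voice agent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Hypothetical per-step latency allocations, in milliseconds."""
    total_ms: float
    steps: dict[str, float]

    def validate(self) -> None:
        # The per-step allocations must fit inside the end-to-end budget.
        allocated = sum(self.steps.values())
        if allocated > self.total_ms:
            raise ValueError(
                f"steps allocate {allocated} ms, exceeding the {self.total_ms} ms budget"
            )

    def overruns(self, measured: dict[str, float]) -> dict[str, float]:
        """Return each step whose measured time exceeded its allocation, with the excess in ms."""
        return {
            step: measured[step] - limit
            for step, limit in self.steps.items()
            if step in measured and measured[step] > limit
        }

    def within_budget(self, measured: dict[str, float]) -> bool:
        # End-to-end check: the measured step times must sum to no more than total_ms.
        return sum(measured.values()) <= self.total_ms


# Illustrative voice-agent budget (assumed numbers): 750 ms allocated out of 800 ms.
budget = LatencyBudget(
    total_ms=800,
    steps={"retrieval": 100, "llm_inference": 500, "tool_calls": 100, "post_processing": 50},
)
budget.validate()

# Example timings from one request (assumed): 830 ms in total.
measured = {"retrieval": 90, "llm_inference": 620, "tool_calls": 80, "post_processing": 40}
print(budget.within_budget(measured))  # False
print(budget.overruns(measured))       # {'llm_inference': 120}
```

Leaving some headroom between the summed allocations and the total (50 ms here) absorbs jitter such as network variance. The per-step report shows which stage to target first, for example by switching to a faster model or caching retrieval results.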

Related niches

AI Voice Agent
AI Support Agent
AI Coding Agent
