Loading…
Loading…
The maximum acceptable response time for an AI agent to complete a task, broken down across each step in the pipeline (retrieval, LLM inference, tool calls, post-processing). Voice agents need sub-second latency for natural conversation; support chat agents target 2–5 seconds; background agents (email, research) can take minutes. Understanding your latency budget drives model selection, caching strategy, and architecture decisions.