Key terms for AI agents. Click a term to open its definition and related niches (full definitions are in English).
Total: 235 terms.
An autonomous software system that performs tasks on your behalf—researching, communicating, analyzing, or executing workflows—using large language models and integrations. Unlike simple chatbots, agents operate 24/7 and can complete multi-step tasks without constant human input.
Setup and configuration without writing code. You connect tools (CRM, help desk, etc.) and configure behavior in the product's UI.
Resolving customer issues before a support ticket is created. Tools answer from your knowledge base in chat or email so fewer issues reach human agents. Deflection rate (deflected / total inquiries) is the core ROI metric for AI support agents.
A sales role focused on outbound prospecting, lead qualification, and booking meetings for account executives. Automation can augment SDR capacity by handling research, outreach, and first-touch at scale—letting human SDRs focus on live conversations and objection handling.
An AI assistant that works alongside you in a tool (e.g. GitHub Copilot in the IDE). Copilots suggest and complete; they typically don't run fully autonomously. 'Agent' often implies more autonomy and multi-step execution. The distinction matters for pricing, ROI measurement, and deployment: copilots augment individual productivity; agents replace whole workflows.
A type of AI model trained on vast amounts of text to understand and generate human-like language. LLMs power chatbots, coding assistants, and AI agents. Examples include GPT-4, Claude, and Llama. They enable agents to reason, summarize, and act on natural-language instructions.
A technique that combines retrieval (fetching relevant documents from a knowledge base or database) with LLM generation. The model uses retrieved context to produce accurate, cited answers. RAG powers support agents that answer from your KB and legal agents that cite precedents.
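The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration, not a production pattern: the keyword-overlap retriever stands in for real embedding search, and the assembled prompt would be sent to an LLM API of your choice.

```python
# Minimal RAG sketch. Retrieval here is naive keyword overlap; real systems
# use embeddings and a vector database. All names are illustrative.

def retrieve(query, documents, top_k=2):
    """Score documents by shared words with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query, documents):
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [
    "Refunds are processed within 5 business days.",
    "Password resets are available from the login page.",
]
prompt = build_prompt("How long do refunds take?", kb)
# The prompt now carries the refund article as grounding context,
# ready to send to an LLM for a cited answer.
```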
The text or instructions you give to an AI model or agent. Prompts define the task, tone, and context. Well-written prompts improve output quality; many no-code tools let you edit prompts in the UI without coding.
Software that stores contacts, deals, and interactions. Sales and support AI agents integrate with CRMs (Salesforce, HubSpot, Pipedrive, Close) to read and write activities, update leads, and keep records in sync—eliminating the 30–45 minutes per day reps typically spend on manual CRM entry.
A collection of articles, FAQs, or documentation that an AI agent can search and cite. Support and legal agents use knowledge bases to answer questions accurately and deflect tickets. Often synced from your help desk or wiki.
Software that manages job postings, applications, and hiring workflows. AI HR agents integrate with ATS systems (Greenhouse, Lever, Workday, Ashby) to screen resumes, schedule interviews, and update candidate status—augmenting recruiters rather than replacing the ATS. Modern ATS integrations let AI score every resume against the job rubric and auto-route top candidates to the recruiter's review queue.
Coordinating multiple steps, tools, or systems in a workflow. AI agents orchestrate by triggering actions across CRM, email, calendar, and knowledge bases. Marketing orchestration might draft content, schedule posts, and report performance in one flow.
Training an existing AI model on your own data to improve performance on specific tasks or style. Less common for no-code agents, which typically use prompts and RAG; fine-tuning is more relevant for custom coding or highly specialized domains.
Running tasks or workflows without manual steps. AI agents automate by executing sequences (e.g. research, email, book meeting) based on rules and triggers. Differs from simple scripts by using LLMs to handle language and decisions.
Investing a fixed amount at regular intervals (e.g. weekly or monthly) regardless of price. AI crypto agents can automate DCA into selected assets, reducing the impact of volatility and removing the need to time the market manually.
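The arithmetic behind DCA is simple enough to show directly. The prices below are made-up sample data; the point is that a fixed dollar amount buys more units when the price dips, pulling the average cost per unit below the mean price.

```python
# Toy DCA illustration: invest a fixed amount at each interval regardless
# of price. Prices are invented sample data, not market figures.

def dca(prices, amount_per_buy):
    units = sum(amount_per_buy / p for p in prices)   # more units at low prices
    spent = amount_per_buy * len(prices)
    return units, spent / units                       # total units, avg cost/unit

prices = [100, 80, 125]               # asset price at each scheduled buy
units, avg_cost = dca(prices, 100)    # $100 invested per interval
# avg_cost (~98.36) lands below the simple mean price (~101.67)
```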
An automated phone system that routes callers through menus using voice or keypad input. Traditional IVR uses rigid decision trees; AI voice agents replace or augment IVR with natural-language conversations that qualify callers and book appointments without menu trees.
Technology that converts written text into spoken audio. AI voice agents use TTS to speak naturally on phone calls. Modern TTS models (e.g. ElevenLabs, OpenAI) produce near-human speech quality, enabling voice agents that sound conversational rather than robotic.
A legal contract that restricts parties from sharing confidential information. AI legal agents can extract NDA clauses, flag deviations from standard playbooks, and compare terms across agreements—speeding up first-pass review for legal teams from 30–45 minutes per NDA to under 5 minutes with human verification.
Identifying and isolating specific clauses (e.g. indemnity, termination, liability caps, IP ownership, governing law) from contracts. AI legal agents use NLP to extract clauses with 90–95% accuracy, flagging risks and deviations so lawyers focus on judgment rather than manual reading. Critical for contract review workflows, due diligence, and compliance audits.
AP tracks money a business owes to suppliers; AR tracks money owed by customers. AI finance agents automate AP/AR matching, categorize invoices, and flag discrepancies—reducing manual entry and speeding up month-end close.
A database of property listings shared among real estate brokers. AI real estate agents integrate with MLS feeds (via RETS, Spark API, or Paragon) to match buyers with listings, auto-generate property descriptions, and keep data in sync with your CRM—turning a 30-minute manual listing write-up into a 30-second generation task.
A free listing on Google that shows business info, reviews, and location in search and maps. AI local business agents can manage GBP updates, respond to reviews, post weekly updates, and keep hours and services accurate—improving local SEO signals and capturing leads that otherwise go to competitors with better-maintained profiles.
A platform that sells flights, hotels, and packages online (e.g. Expedia, Booking.com, Kayak, Priceline, Google Flights). AI travel agents compare prices across OTAs, surface the best deals, and can handle rebooking—giving travelers one place to plan instead of visiting multiple sites and manually comparing options.
An educational approach where content and pacing adjust to each student's performance in real time. AI tutoring agents use adaptive learning to identify weak areas, increase difficulty when mastery is shown, and personalize instruction—delivering 2x learning gains versus static self-study.
An HTTP callback that sends real-time data between systems when an event occurs (e.g. new lead, payment received). AI agents and no-code tools use webhooks to trigger workflows automatically—connecting CRMs, help desks, calendars, and custom apps without polling.
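The receiving side of a webhook is essentially a dispatcher: parse the payload, look up a handler for the event type, run it. A minimal sketch, with hypothetical event names and fields:

```python
# Webhook dispatch sketch. The "lead.created" event and its fields are
# illustrative, not a specific vendor's payload schema.
import json

HANDLERS = {}

def on(event):
    """Decorator registering a handler for an event type."""
    def register(fn):
        HANDLERS[event] = fn
        return fn
    return register

@on("lead.created")
def handle_lead(payload):
    return f"enrich lead {payload['email']}"

def dispatch(raw_body):
    payload = json.loads(raw_body)
    handler = HANDLERS.get(payload["event"])
    return handler(payload) if handler else "ignored"

result = dispatch('{"event": "lead.created", "email": "ada@example.com"}')
# → "enrich lead ada@example.com"
```

Unknown event types fall through to "ignored", which is the usual safe default for webhook endpoints.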
AI systems that autonomously plan, decide, and execute multi-step tasks with minimal human oversight. Unlike simple prompt-response models, agentic AI breaks goals into sub-tasks, uses tools (APIs, browsers, databases), and iterates until the objective is met. Sales, support, and coding agents are all examples of agentic AI in production.
A capability that lets AI models invoke structured functions (APIs, database queries, calculations) based on natural-language instructions. Instead of only generating text, the model outputs a JSON function call that your application executes. This is how agents book meetings, update CRMs, and trigger workflows.
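Concretely, the model's output is a small JSON object naming a function and its arguments, and your application executes it. The function name and schema below are illustrative:

```python
# Function-calling sketch: the model emits structured JSON instead of prose;
# the application validates the name against a registry and executes it.
import json

def book_meeting(attendee, time):
    return f"booked {attendee} at {time}"

TOOLS = {"book_meeting": book_meeting}   # whitelist of callable functions

# What an LLM's function-call output might look like (hypothetical schema):
model_output = json.dumps({
    "name": "book_meeting",
    "arguments": {"attendee": "sam@example.com", "time": "2025-06-01T10:00"},
})

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
# → "booked sam@example.com at 2025-06-01T10:00"
```

Looking the name up in an explicit registry, rather than calling arbitrary functions, is itself a guardrail.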
Rules, filters, and constraints that keep AI agents within safe operating boundaries. Guardrails prevent agents from hallucinating, leaking sensitive data, or taking unauthorized actions. Examples include topic restrictions, PII redaction, confidence thresholds, and human-in-the-loop approval gates.
When an AI model generates plausible-sounding but factually incorrect information. In agent contexts, hallucination is mitigated through RAG (grounding responses in your knowledge base), confidence scoring, and citation requirements. Critical in legal, healthcare, and finance agents where accuracy is non-negotiable.
A database optimized for storing and searching high-dimensional embeddings (numerical representations of text, images, or audio). AI agents use vector databases to find semantically similar content—e.g., matching a support question to the most relevant KB article even if the exact words differ. Examples include Pinecone, Weaviate, and Chroma.
Dense numerical representations of text, images, or other data that capture semantic meaning. Similar concepts have similar embeddings, enabling AI agents to perform semantic search, clustering, and recommendations. Used in support (finding relevant articles), sales (matching leads to ICPs), and coding (finding similar code patterns).
An architecture where multiple specialized AI agents collaborate to complete complex tasks. One agent might research leads while another writes emails and a third manages scheduling. Multi-agent systems divide work by capability, enabling more reliable and scalable automation than a single monolithic agent.
The ability of an AI agent to invoke external tools—browsers, APIs, calculators, code interpreters—to accomplish tasks beyond text generation. Tool use is what distinguishes agents from chatbots: a sales agent uses CRM tools, a coding agent uses a terminal, and a finance agent queries accounting software.
Identifying the purpose behind a user's message (e.g., 'I want a refund' → refund intent). AI agents use intent classification to route requests to the right workflow: support tickets, sales inquiries, billing questions, or escalation to a human. Modern LLM-based classifiers handle nuance and multi-intent messages.
A workflow where an AI agent performs tasks but requires human approval at critical decision points. Common in high-stakes domains: a legal agent drafts a redline but a lawyer approves it; a cybersecurity agent recommends containment but an analyst clicks 'execute.' HITL balances automation speed with human judgment.
The maximum amount of text (measured in tokens) an AI model can process in a single interaction. Larger context windows let agents reason over longer documents—entire contracts, codebases, or conversation histories. Models now support 100K–1M+ tokens, enabling agents to handle complex tasks without losing context.
A platform that aggregates and analyzes security logs from across your infrastructure (firewalls, endpoints, servers). AI cybersecurity agents connect to SIEMs to triage alerts, correlate events, and identify threats faster than human analysts reviewing logs manually.
Technology that listens to doctor-patient conversations in real time and automatically generates structured clinical notes for the electronic health record (EHR). Ambient scribes save physicians 2+ hours daily on documentation, reducing burnout and increasing face-to-face patient time.
A prompting technique where the AI model reasons through intermediate steps before producing a final answer. Chain-of-thought improves accuracy on complex tasks like math, logic, and multi-step planning. Agent frameworks use CoT to break down goals into sub-tasks, making decisions more transparent and debuggable.
A central system that coordinates multiple AI agents, tools, and workflows to complete complex tasks. The orchestrator decides which agent to invoke, passes context between steps, handles errors, and ensures the overall goal is achieved. Think of it as the conductor of a multi-agent system—managing timing, dependencies, and fallbacks.
The basic unit of text that language models process. A token is roughly 3–4 characters or about 75% of a word. Tokens determine cost (most LLM APIs charge per token), context window limits, and processing speed. Understanding tokens helps you estimate agent operating costs and optimize prompt length.
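The 3–4 characters-per-token rule of thumb above is enough for quick cost estimates. A sketch using that heuristic (real tokenizers such as tiktoken give exact counts):

```python
# Back-of-envelope token estimate using the ~4 characters/token heuristic
# from the definition above. Real tokenizers are exact; this is for sizing.

def estimate_tokens(text):
    return max(1, round(len(text) / 4))

prompt = "Summarize this support ticket in two sentences."
tokens = estimate_tokens(prompt)   # 47 characters → ~12 tokens
```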
The process of identifying emotional tone (positive, negative, neutral) in text or speech. AI agents use sentiment analysis to prioritize angry support tickets, gauge prospect interest in sales conversations, monitor brand perception on social media, and trigger human escalation when frustration is detected.
The time delay between sending a request to an AI model and receiving a response. Low latency is critical for real-time applications like voice agents (where delays feel unnatural) and live chat support. Factors include model size, infrastructure, and whether the agent needs to call external tools before responding.
A structured representation of entities and their relationships—people, companies, products, concepts—stored as nodes and edges. AI agents use knowledge graphs to reason about connections: which contacts work at which companies, which products compete, or which legal clauses relate to each other. Richer than flat databases for relationship-heavy tasks.
A cap on how many requests an application can make to an API within a given time window (e.g. 100 requests per minute). AI agents that call external APIs (CRMs, LLMs, databases) must respect rate limits to avoid errors and service disruptions. Proper rate-limit handling is essential for agents operating at scale.
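One common client-side defense is a sliding-window limiter: track recent call timestamps and refuse (or delay) a call once the window is full. A minimal sketch with an injected clock so the behavior is deterministic:

```python
# Sliding-window rate limiter sketch: at most `limit` calls per `window`
# seconds. The clock is injected so the example runs without real waiting.
from collections import deque

class RateLimiter:
    def __init__(self, limit, window, clock):
        self.limit, self.window, self.clock = limit, window, clock
        self.calls = deque()          # timestamps of recent allowed calls

    def allow(self):
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()      # drop timestamps outside the window
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False                  # caller should back off and retry later

t = [0.0]                             # fake clock, advanced manually
rl = RateLimiter(limit=2, window=60, clock=lambda: t[0])
first, second, third = rl.allow(), rl.allow(), rl.allow()
# first and second succeed; third is rejected within the same 60s window
```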
AI systems that engage in natural-language dialogue with humans—understanding intent, maintaining context across turns, and generating relevant responses. Conversational AI powers chatbots, voice assistants, and support agents. Modern implementations use LLMs for open-ended understanding rather than rigid intent-matching rules.
The ability of an AI model to perform a task without any task-specific training examples. You describe what you want in the prompt, and the model generalizes from its pre-training. Zero-shot capability is why modern agents can handle novel requests—categorizing a never-seen-before support ticket type or writing copy for a new product category.
A sequence of tasks that an AI agent executes end-to-end without human intervention at each step. The agent plans the steps, executes them using tools and APIs, handles errors, and delivers a final result. Examples include researching a lead and sending a personalized email, or triaging a support ticket and resolving it from the knowledge base.
An automated sequence that moves and transforms data from source systems to a destination (data warehouse, dashboard, or AI model). AI data agents can build, monitor, and troubleshoot pipelines—detecting schema changes, flagging data-quality issues, and alerting when a pipeline fails before downstream reports break.
The practice of designing and refining instructions given to AI models to produce better outputs. Effective prompt engineering includes setting context, providing examples (few-shot), specifying output format, and defining constraints. For AI agents, prompt engineering determines behavior, tone, guardrails, and decision-making quality.
AI systems that process and generate multiple types of data—text, images, audio, video, and code—within a single model or agent. Multimodal agents can analyze a screenshot, describe it in text, generate a response audio file, or review a video for content moderation. This capability is critical for design, moderation, healthcare, and voice agents.
Research and practices aimed at ensuring AI systems behave as intended, avoid harmful outputs, and remain under human control. In the agent context, AI safety covers output filtering, action approval gates, alignment with user intent, and preventing misuse. Especially important for agents that take real-world actions like sending emails, modifying data, or executing code.
The process of running a trained AI model to generate predictions or outputs from new input data. Every time an agent answers a question, writes an email, or classifies a ticket, it's performing inference. Inference cost and speed directly affect agent operating expenses and user experience—faster inference means snappier agents.
A prompting pattern where an AI agent alternates between reasoning (thinking through the problem step by step) and acting (calling tools or taking actions). ReAct enables agents to plan, execute, observe results, and adjust—making them more reliable on complex, multi-step tasks than single-shot generation.
A server or service that acts as a single entry point for API requests, handling authentication, rate limiting, routing, and monitoring. In AI agent architectures, an API gateway manages traffic between agents and external services (CRMs, databases, LLM providers), enforcing security policies and providing observability.
The process of splitting large documents or datasets into smaller, manageable pieces (chunks) for processing by AI models. Chunking is essential for RAG systems: documents are split into chunks, embedded into vectors, and retrieved based on relevance. Chunk size and overlap strategy directly affect retrieval quality and agent accuracy.
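A word-based splitter with overlap shows the mechanic. Sizes here are tiny for readability; production chunks are typically hundreds of tokens with 10–20% overlap:

```python
# Word-based chunking with overlap, as done before embedding documents for
# RAG. `size` and `overlap` are deliberately tiny for illustration.

def chunk(text, size=5, overlap=2):
    words = text.split()
    step = size - overlap             # how far the window advances each chunk
    return [" ".join(words[i:i + size])
            for i in range(0, len(words), step)
            if words[i:i + size]]

doc = "one two three four five six seven eight"
pieces = chunk(doc)
# → ['one two three four five', 'four five six seven eight', 'seven eight']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is why overlap strategy affects retrieval quality.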
A parameter (typically 0–2) that controls the randomness of AI model outputs. Lower temperature (0–0.3) produces more deterministic, focused responses—ideal for agents handling support tickets or contract review. Higher temperature (0.7–1.2) increases creativity—useful for marketing copy or brainstorming. Most agent platforms expose temperature as a configuration option.
Providing a small number of examples in the prompt so the AI model learns the desired output format, style, or reasoning pattern. Unlike fine-tuning (which retrains the model), few-shot learning happens at inference time. Sales agents use few-shot examples to match your email tone; support agents use them to follow your ticket-resolution playbook.
Delivering AI model output token by token as it's generated, rather than waiting for the complete response. Streaming reduces perceived latency for users—chat interfaces and voice agents feel more responsive when text appears incrementally. For voice agents, streaming is essential: text-to-speech begins before the full response is generated, cutting response time by 50–80%.
The branch of AI focused on enabling computers to understand, interpret, and generate human language. NLP underpins every AI agent that processes text or speech: reading emails, understanding support tickets, extracting contract clauses, or generating marketing copy. Modern NLP is powered by large language models that handle nuance, context, and multiple languages.
AI technology that enables machines to interpret and analyze visual information—images, videos, documents, and screenshots. Computer vision powers content moderation (detecting NSFW images), design agents (analyzing layouts and brand consistency), healthcare agents (reading medical images), and QA agents (visual regression testing of user interfaces).
Technology that converts spoken language into written text. AI voice agents rely on STT to understand what callers say before processing and responding. Modern STT models (Whisper, Deepgram, AssemblyAI) achieve 95%+ accuracy across accents and languages, enabling real-time transcription for phone support, sales calls, and meeting notes.
Artificially generated data that mimics real-world patterns without containing actual user or business information. Teams use synthetic data to train and test AI agents when real data is sensitive (healthcare, finance, legal) or scarce. Synthetic data enables agent development without privacy risks—useful for testing support deflection flows, sales sequences, and financial categorization models.
Transferring knowledge from a large, expensive AI model (teacher) to a smaller, faster model (student) that approximates the same performance at lower cost. Distillation enables production agents to run affordably at scale—a distilled model handles 90% of support tickets at 10% of the inference cost, while complex cases route to the full model.
An advanced retrieval pattern where the AI agent actively decides what to search for, evaluates retrieved results, and iterates if the initial retrieval is insufficient. Unlike basic RAG (single query → single retrieval), agentic RAG involves multi-step research: the agent decomposes complex questions, queries multiple sources, cross-references results, and synthesizes a comprehensive answer.
A numerical value (typically 0–1 or 0–100%) indicating how certain an AI agent is about its response or action. Agents use confidence scores to decide when to act autonomously (high confidence) versus escalate to a human (low confidence). Setting the right threshold balances automation speed with accuracy—too low and you get errors, too high and you lose efficiency.
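The routing decision itself is a one-line threshold check. The 0.8 cutoff below is an illustrative starting point, not a standard; in practice it is tuned against an eval set:

```python
# Confidence-threshold routing sketch: act autonomously above the cutoff,
# escalate below it. The 0.8 default is illustrative, not a standard value.

def route(answer, confidence, threshold=0.8):
    if confidence >= threshold:
        return ("send", answer)
    return ("escalate", answer)   # hand to a human with the draft attached

action, _ = route("Your refund is on the way.", 0.93)   # → "send"
action2, _ = route("I think clause 7 applies?", 0.41)   # → "escalate"
```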
A system that defines, executes, and monitors multi-step automated processes. In the AI agent context, workflow engines orchestrate agent tasks—triggering agents based on events, passing data between steps, handling errors, and tracking completion. Tools like n8n, Temporal, and Inngest serve as workflow engines that coordinate AI agent actions with traditional automation steps.
Search that understands meaning rather than just matching keywords. When a support agent searches your knowledge base for 'customer can't log in,' semantic search also finds articles about 'password reset,' 'account access issues,' and 'authentication errors'—even if those exact words weren't used. Powered by embeddings and vector databases, semantic search is what makes RAG-based agents accurate.
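Under the hood this reduces to nearest-neighbor search over embeddings, usually by cosine similarity. The three-dimensional vectors below are hand-made stand-ins for real embedding-model output, which has hundreds or thousands of dimensions:

```python
# Semantic search sketch: pick the document whose embedding is closest to
# the query embedding by cosine similarity. Vectors are toy stand-ins.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

docs = {
    "password reset steps": [0.9, 0.1, 0.0],
    "refund policy":        [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]   # imagine: embedding of "customer can't log in"
best = max(docs, key=lambda d: cosine(query_vec, docs[d]))
# → "password reset steps", despite zero keyword overlap with the query
```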
The price charged per token (input and output) when using LLM APIs. Token costs directly determine an AI agent's operating expense. GPT-4o charges ~$2.50 per million input tokens; Claude 3.5 Sonnet charges ~$3 per million. Optimizing token cost involves choosing the right model size for each task, caching common queries, and minimizing unnecessary context in prompts.
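A per-call cost estimate follows directly from the per-million-token rates. The rates in the example mirror the input figure quoted above plus an assumed output rate; provider pricing drifts, so check current price sheets:

```python
# Cost estimate from per-million-token prices. Rates below are examples
# (input rate from the definition above, output rate assumed) and will drift.

def cost_usd(input_tokens, output_tokens, in_per_m, out_per_m):
    return (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000

# e.g. one support reply: 1,200 input tokens, 300 output tokens
c = cost_usd(1_200, 300, in_per_m=2.50, out_per_m=10.00)
# → 0.006, i.e. about half a cent per reply
```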
The process of finding and fetching relevant information from a knowledge base, database, or document store to provide context for an AI agent's response. Retrieval is the 'R' in RAG—without good retrieval, agents hallucinate or give generic answers. Retrieval quality depends on chunking strategy, embedding model, and index design.
An isolated environment where an AI agent can execute code, test actions, or process data without affecting production systems. Sandboxes are critical for coding agents (running untrusted code safely), data agents (testing queries before running on live databases), and any agent that takes real-world actions during development and testing phases.
A predefined alternative action an AI agent takes when its primary approach fails—such as escalating to a human when confidence is low, switching to a simpler model when latency spikes, or returning a canned response when the knowledge base has no match. Well-designed fallbacks prevent agents from failing silently or producing low-quality outputs.
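The pattern is an ordered chain of handlers where the last entry always succeeds. A sketch with hypothetical handler names, the first two of which deliberately fail:

```python
# Fallback-chain sketch: try handlers in order until one succeeds. Handler
# names are illustrative; the first two simulate the failures described above.

def answer_from_kb(question):
    raise LookupError("no KB match")          # simulated: knowledge base miss

def answer_with_small_model(question):
    raise TimeoutError("latency spike")       # simulated: model timeout

def canned_response(question):
    return "I'm connecting you with a teammate who can help."

def with_fallbacks(question, handlers):
    for handler in handlers:
        try:
            return handler(question)
        except Exception:
            continue        # in production: log the failure, then fall through
    raise RuntimeError("all fallbacks exhausted")

reply = with_fallbacks("obscure question",
                       [answer_from_kb, answer_with_small_model, canned_response])
# → the canned response, since both primary handlers failed
```

Ending the chain with a handler that cannot fail is what prevents the silent-failure mode the definition warns about.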
Running AI agent tasks on a collection of items at once rather than one at a time—such as categorizing 1,000 transactions, screening 500 resumes, or generating 200 product descriptions in a single batch. Batch processing reduces per-item cost (often 50% cheaper than real-time) and is ideal for tasks that don't need instant results.
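The grouping step is a simple slice loop; each group then becomes one request to a batch API. Batch size 3 is just for illustration:

```python
# Grouping items into fixed-size batches before submitting them to a
# batch endpoint. A size of 3 is illustrative; real batches are larger.

def batches(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

transactions = list(range(1, 8))   # 7 items to categorize
groups = batches(transactions, 3)
# → [[1, 2, 3], [4, 5, 6], [7]]   (the final batch may be short)
```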
An open standard created by Anthropic that provides a universal interface for connecting AI models to external tools and data sources. MCP eliminates the need for fragile, one-off integrations by standardizing how agents discover and interact with CRMs, databases, and APIs. For example, a sales agent using MCP can connect to any MCP-compatible CRM without custom code, reducing deployment time from weeks to minutes.
A software library that provides the scaffolding for building AI agents—including tool use, memory management, planning loops, and orchestration. Frameworks like LangChain, CrewAI, AutoGen, and the Anthropic Agent SDK abstract away low-level LLM interactions so developers can focus on business logic. For example, a framework handles the observe-think-act loop while you define which tools the agent can call and what guardrails apply.
The ability of a large language model to invoke external functions and APIs during inference rather than only generating text. When a user asks an agent to book a meeting, the LLM outputs a structured tool call (e.g., a JSON object specifying the calendar API and parameters) that the application executes. Tool calling is what transforms a chatbot into an agent—enabling actions like CRM updates, database queries, and email sends.
The process of systematically testing and scoring AI agent outputs against defined criteria such as accuracy, helpfulness, safety, and task completion. Evaluation frameworks use test suites with expected outcomes, automated scoring rubrics, and human review to catch regressions before deployment. For example, a support agent eval might test 200 historical tickets and measure resolution accuracy, tone appropriateness, and escalation correctness.
Technology that replicates a specific person's voice for use in AI speech synthesis, enabling text-to-speech output that sounds like a particular individual. Voice cloning requires as little as 30 seconds of sample audio in modern systems. It powers personalized voice agents, branded phone experiences, and content narration—but raises ethical concerns around consent and deepfakes that require clear disclosure and authorization policies.
A multi-step process where AI agents plan, execute, and iterate autonomously toward a goal without requiring human intervention at each step. Unlike simple automation (if-then rules), agentic workflows involve reasoning, tool use, error handling, and adaptive replanning. For example, an operations agent might detect a failed deployment, diagnose the root cause, apply a fix, run tests, and notify the team—all without human prompting.
The ability of an AI agent to retain and recall information across conversations and sessions. Short-term memory holds context within a task (e.g. the current conversation); long-term memory persists across sessions using databases, vector stores, or file systems. Memory enables agents to learn user preferences, track project history, and avoid repeating questions—making them feel more like a colleague than a stateless tool.
An AI agent capability where the model directly controls a desktop or browser—clicking buttons, typing text, navigating menus, and reading screens like a human user. Computer use agents can operate any software, even without an API, by interacting with the GUI. This enables automation of legacy systems, complex SaaS workflows, and tasks that span multiple applications without custom integrations.
An open protocol that enables AI agents built by different vendors to discover each other's capabilities, negotiate tasks, and collaborate securely. A2A complements MCP (which connects agents to tools) by connecting agents to other agents. For example, a sales agent could delegate background research to a data agent from a different platform, with both communicating through a standardized interface.
A security attack where malicious input tricks an AI agent into ignoring its instructions and executing unintended actions. Direct injection embeds commands in user messages; indirect injection hides them in data the agent retrieves (emails, web pages, documents). Defenses include input sanitization, output filtering, instruction hierarchy, and sandboxing agent actions behind approval gates. Critical for any agent that reads external data.
An AI model that explicitly 'thinks through' problems step-by-step before producing an answer, spending additional compute on complex tasks. Reasoning models (like OpenAI o-series and Claude with extended thinking) excel at math, logic, coding, and multi-step planning. For agents, reasoning models improve accuracy on tasks that require analysis—contract review, debugging, financial modeling—at the cost of higher latency and token usage.
Monitoring and tracing every step an AI agent takes—LLM calls, tool invocations, decisions, and outputs—to debug failures, measure performance, and ensure reliability. Observability platforms (LangSmith, Arize, Braintrust) log agent traces, track latency and cost per step, surface error patterns, and alert on quality regressions. Essential for production agents where you need to understand why an agent took a specific action.
The end-to-end system that ingests documents, chunks them into passages, generates embeddings, stores them in a vector database, and retrieves relevant context at query time for RAG. A well-tuned retrieval pipeline determines agent answer quality: chunk size, overlap, embedding model choice, reranking, and metadata filtering all affect whether the agent finds the right information. Poor retrieval is the #1 cause of inaccurate agent responses.
An AI agent capability where the model controls a real web browser—clicking, typing, navigating, and reading rendered pages—to automate workflows on websites that don't expose APIs. Browser-use agents operate vendor portals, legacy admin panels, government forms, and multi-tab research workflows by understanding pages semantically rather than relying on brittle CSS selectors.
A deployment pattern where an AI agent runs in parallel with humans on real production traffic but its outputs are logged rather than delivered. Shadow mode lets teams measure agent quality against human baselines on real data before any customer is exposed to the agent. It is the standard first step in a staged rollout from pilot to production.
A gradual deployment strategy where an AI agent first handles a small percentage of traffic (often 1–5%) while metrics are monitored, then expands to larger percentages as confidence grows. Canary rollouts limit blast radius if the agent regresses, and they make it possible to detect quality, cost, and safety issues on real traffic before they affect every user.
The maximum scope of harm an AI agent can cause if it fails or is misused—how many records it can change, how much money it can move, how many customers it can reach. Designing agents with a small blast radius (least-privilege access, action limits, approval gates for irreversible operations) is the single highest-leverage safety practice for production deployments.
Running an open-source large language model on infrastructure you control—your own GPUs, your VPC, or on-prem hardware—instead of calling a managed API. Self-hosting is typically chosen for data residency and compliance requirements, very high token volumes where dedicated inference is cheaper than per-token API pricing, or workloads that need fine-tuning beyond what hosted providers expose.
A curated collection of representative tasks with known correct outcomes used to measure AI agent performance. Eval sets are run before every prompt change, model upgrade, and deployment to catch regressions early. A good eval set covers common cases, known edge cases, and historical failures—and grows over time as new failure modes are discovered in production.
The maximum acceptable response time for an AI agent to complete a task, broken down across each step in the pipeline (retrieval, LLM inference, tool calls, post-processing). Voice agents need sub-second latency for natural conversation; support chat agents target 2–5 seconds; background agents (email, research) can take minutes. Understanding your latency budget drives model selection, caching strategy, and architecture decisions.
An AI agent purpose-built for a specific industry or domain—such as legal contract review, healthcare clinical documentation, or real estate lead management. Vertical agents ship with domain-specific training data, pre-built integrations for industry tools, and compliance guardrails baked in. They trade flexibility for faster time-to-value and higher out-of-the-box accuracy in their target domain.
A general-purpose AI agent platform that can be configured for any industry or workflow—support, sales, marketing, operations, coding, and more. Horizontal agents provide flexible building blocks (LLM orchestration, tool integrations, workflow builders) and let teams assemble custom agents. They trade domain-specific accuracy for breadth and customization depth.
The policies, processes, and organizational structures that oversee how AI agents are built, deployed, monitored, and retired. Governance covers model selection approval, data access policies, audit logging, bias testing, incident response, and accountability assignment. As agents take more real-world actions (sending emails, modifying records, spending budget), governance frameworks ensure those actions are authorized, traceable, and reversible.
Connecting multiple specialized AI agents in sequence where the output of one agent becomes the input to the next. Unlike multi-agent systems where agents collaborate in parallel, chaining follows a linear pipeline: a research agent gathers data, a drafting agent writes a report, and a review agent checks quality. Chaining simplifies orchestration while enabling each step to use the best-suited model and tools.
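A linear chain like this can be sketched as a fold over stages, where each stage's output becomes the next stage's input. This is a minimal illustration — the three agent functions are hypothetical stand-ins for real LLM-backed calls:

```python
def research_agent(topic):
    # Stand-in for an LLM-backed research step
    return {"topic": topic, "findings": ["margin up 4%", "churn flat"]}

def drafting_agent(research):
    # Stand-in for a drafting step that consumes the researcher's output
    body = "; ".join(research["findings"])
    return f"Report on {research['topic']}: {body}"

def review_agent(draft):
    # Stand-in for a quality-check step
    return {"approved": "Report on" in draft, "report": draft}

def run_chain(initial_input, stages):
    # The core of chaining: each stage's output feeds the next stage
    result = initial_input
    for stage in stages:
        result = stage(result)
    return result

out = run_chain("Q3 revenue", [research_agent, drafting_agent, review_agent])
```

Because each stage is just a callable, individual steps can be swapped to use whichever model and tools suit them best.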
The total cost to resolve a single customer interaction from start to finish—including LLM inference, tool calls, human escalation time, and infrastructure. Cost per resolution is the primary financial metric for support agents: it directly compares AI agent economics against human-only support. Typical AI agent cost per resolution ranges from $0.50–$2.00 versus $5–$15 for human agents, but only when the AI resolution is actually successful.
Transferring a conversation from an AI agent to a human representative while preserving the full context—conversation history, customer sentiment, intent classification, and any actions already taken. A warm handoff means the human picks up exactly where the agent left off, avoiding the frustration of customers repeating themselves. Contrast with a cold handoff where the human starts with no context.
The uncontrolled proliferation of AI agents across an organization—different teams deploying overlapping agents with inconsistent quality, security, and governance standards. Agent sprawl creates redundant costs, conflicting customer experiences, and security blind spots. Managing it requires a central agent registry, shared guardrail policies, and clear ownership for each deployed agent.
A feature supported by most major LLM providers that stores the processed representation of a long, repeated prompt prefix (system prompts, tool definitions, large reference documents) so subsequent calls skip re-processing and pay a fraction of the token cost. For high-volume agents with stable system prompts, prompt caching typically cuts inference cost by 50–90% and reduces latency noticeably on the cached portion.
A compact language model (typically 1B–15B parameters) designed to run cheaply and with low latency, often on-device or on modest GPUs. SLMs like Llama 3.1 8B, Phi-3, and Gemma handle narrow, well-defined agent tasks—classification, extraction, routing—at 10–50× lower cost than frontier models. A common production pattern uses an SLM as a first-pass router and escalates only hard cases to a large reasoning model.
A component that dynamically picks which LLM (fast/cheap vs. large/capable) to use for each individual request based on complexity, cost budget, or required accuracy. Model routers enable agents to serve the long tail of simple queries with cheap models while reserving expensive reasoning models for hard cases—often cutting total inference spend by 40–70% with no drop in user-facing quality.
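A router's decision logic can be as simple as a heuristic over the request. The sketch below uses query length and reasoning keywords as a toy stand-in for a learned complexity classifier; the model names are illustrative:

```python
def route(query, cheap_model="small-1b", capable_model="frontier"):
    # Toy complexity heuristic standing in for a learned router:
    # long queries or reasoning keywords go to the capable model.
    reasoning_markers = ("why", "explain", "compare", "analyze", "plan")
    is_hard = len(query.split()) > 30 or any(
        m in query.lower() for m in reasoning_markers
    )
    return capable_model if is_hard else cheap_model

simple = route("reset my password")
hard = route("explain the tradeoffs of sharding")
```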
The practice of deliberately attacking an AI agent—through adversarial prompts, prompt injection, jailbreaks, and edge-case inputs—to discover failure modes before attackers or customers do. Red teaming is a required step before launching agents in regulated domains (healthcare, finance, legal) and is increasingly standard for any customer-facing agent with write access to systems.
A prompt technique that bypasses an AI agent's safety instructions or guardrails, causing it to produce restricted content or perform disallowed actions. Jailbreaks range from simple role-play tricks ("pretend you're an unrestricted AI") to sophisticated multi-turn attacks. Defending against jailbreaks requires layered controls: system prompt hardening, input and output classifiers, and action-level authorization rather than relying on the model alone.
A reusable collection of capabilities—tools, prompts, workflows, and reference documents—that an AI agent can load on demand based on the task at hand. Instead of packing every possible instruction into one giant system prompt, a skill library lets the agent pull in only what's needed (e.g. the 'refund policy' skill for a billing ticket), which reduces tokens, improves focus, and makes capabilities easier to version and test.
The full cost of running an AI agent in production, including LLM inference, vector database and storage, observability, integration maintenance, human-in-the-loop review time, and ongoing evaluation. Sticker-price comparisons of per-token API cost frequently mislead buyers—TCO is what actually hits the budget. For most production agents, inference is 30–50% of TCO; the rest is infrastructure, ops, and human oversight.
The percentage of customer interactions an AI support agent resolves end-to-end without handing off to a human. A deflection rate of 60% means 60 of every 100 tickets are closed by the agent alone. Deflection rate is the single most important ROI metric for support agents—but it must be measured alongside CSAT and reopen rate, since deflection without quality destroys customer trust.
A hand-curated set of input/output pairs representing the correct behavior an AI agent should produce on important cases. Golden datasets serve as the authoritative baseline in evals: every prompt change, model upgrade, or new tool is tested against the golden set before shipping. Unlike synthetic test data, golden examples are vetted by subject-matter experts and updated whenever production reveals a new failure mode.
A guardrail that requires explicit human confirmation before an AI agent executes high-impact actions—sending a mass email, issuing a refund above a threshold, deleting records, or pushing code to production. Approval gates are the cheapest and most reliable way to shrink blast radius without slowing the agent down on routine work, and they're a standard requirement in regulated deployments.
A second-pass relevance scoring step in a retrieval pipeline that reorders candidate documents after the initial vector search. The reranker (a cross-encoder model) reads each query-document pair together and assigns a more accurate relevance score than embedding similarity alone. Reranking typically improves RAG answer quality by 10–25% with minimal latency cost, making it a standard component in production retrieval pipelines.
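The second-pass shape looks like this — score each query-document pair, re-sort, keep the top k. Here a word-overlap function stands in for the cross-encoder model; the doc IDs are made up:

```python
def overlap_score(query, doc):
    # Toy relevance scorer standing in for a cross-encoder model
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query, candidates, top_k=2):
    # candidates: (doc_id, text) pairs from the first-pass vector search
    scored = sorted(
        candidates,
        key=lambda pair: overlap_score(query, pair[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

docs = [
    ("kb-1", "how to reset your password"),
    ("kb-2", "shipping rates and delivery times"),
    ("kb-3", "password reset link not arriving"),
]
top = rerank("reset password email", docs)
```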
The requirement that data be stored and processed within a specific geographic jurisdiction—such as the EU, a single country, or a particular cloud region. Data residency requirements affect which LLM providers and hosting options an AI agent can use. Regulated industries (healthcare, finance, government) and GDPR-covered organizations often require that no customer data leaves their jurisdiction, ruling out US-only API endpoints.
Running an AI model directly on a user's device (phone, laptop, edge server) rather than calling a cloud API. On-device inference eliminates network latency, works offline, and keeps data local—addressing privacy concerns. Small language models (1B–7B parameters) now run on modern phones and laptops, enabling agents for note-taking, translation, and code completion without sending data to a server.
A defense mechanism where the AI model is trained to prioritize instructions from different sources in a fixed order: system prompt > developer instructions > user messages > retrieved content. Instruction hierarchy prevents prompt injection attacks where malicious content in emails, documents, or web pages tries to override the agent's behavior. It is one of the most effective defenses for agents that process untrusted external data.
A delivery model where a vendor provides a fully managed AI agent—including the underlying model, integrations, guardrails, and monitoring—as a subscription service. The customer configures behavior through prompts and settings but doesn't manage infrastructure or model selection. AaaS lowers the barrier to agent adoption for teams without ML engineering resources, similar to how SaaS replaced on-premise software for traditional applications.
The practice of loading as much relevant information as possible into an AI model's context window before generating a response—system instructions, retrieved documents, conversation history, tool outputs, and user data. While larger context windows enable richer agent behavior, context stuffing increases cost (more input tokens) and can dilute the model's attention. Effective agents balance context richness against focus, including only information that improves the response.
The process by which an AI agent breaks a complex goal into smaller, manageable sub-tasks before executing them. When asked to 'prepare a quarterly business review,' the agent decomposes this into: pull revenue data, calculate growth metrics, compare against targets, draft narrative, and format slides. Task decomposition is what separates agents from single-shot models—it enables multi-step reasoning and reliable execution of complex workflows.
Standardized evaluation of AI agent performance across defined tasks, metrics, and baselines—enabling apples-to-apples comparison between different agent solutions. Benchmarks measure task completion rate, accuracy, latency, cost per task, and safety compliance on representative workloads. Examples include SWE-bench for coding agents and customer support benchmarks that test resolution accuracy across ticket categories.
A development approach where the programmer describes what they want in natural language and an AI coding agent generates the implementation—the developer 'vibes' with the AI rather than writing code line by line. Coined in early 2025, vibe coding ranges from casual prototyping (describing an entire app in a few sentences) to professional workflows where experienced developers use AI agents for implementation while focusing on architecture and review.
A software development paradigm where AI coding agents autonomously plan, write, test, and iterate on code with minimal human direction. Unlike copilot-style autocomplete, agentic coding involves the AI independently navigating codebases, making architectural decisions, running tests, debugging failures, and submitting complete implementations. Tools like Devin, Claude Code, and Cursor's agent mode represent this paradigm.
A dialogue between a user and an AI agent that spans multiple exchanges, where each message builds on prior context. Managing multi-turn conversations requires the agent to track conversation state, resolve references ('that order' → order #4521 from earlier), handle topic switches, and maintain coherent context as the interaction grows. Multi-turn quality is what separates production-grade agents from basic prompt-response demos.
A pattern where an AI model dynamically invokes external tools—calculators, APIs, databases, code interpreters—during response generation to produce more accurate and grounded outputs. TAG extends RAG (which retrieves static documents) by enabling the model to take actions: run a SQL query to get current data, call an API for live pricing, or execute code to verify a calculation. TAG is the foundation of how production agents interact with business systems.
An architecture where a fast, cheap model handles the first pass on every request, and only routes complex or low-confidence cases to a larger, more expensive model. Unlike a simple model router that picks one model upfront, cascading tries the small model first, evaluates the output quality, and escalates if needed. This pattern typically reduces inference costs by 50–70% while maintaining the quality ceiling of the most capable model.
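The cascade's control flow fits in a few lines: try the small model, check its confidence, escalate only if needed. The two lambda models below are hypothetical stand-ins returning an (answer, confidence) pair:

```python
def cascade(query, small_model, large_model, threshold=0.8):
    # First pass on the cheap model; escalate only when its
    # self-reported confidence falls below the threshold.
    answer, confidence = small_model(query)
    if confidence >= threshold:
        return answer, "small"
    answer, _ = large_model(query)
    return answer, "large"

# Stand-in models returning (answer, confidence)
small = lambda q: ("FAQ answer", 0.95) if "hours" in q else ("unsure", 0.3)
large = lambda q: ("reasoned answer", 0.99)

easy = cascade("what are your hours?", small, large)
hard = cascade("reconcile these two invoices", small, large)
```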
A mechanism where outcomes of an AI agent's actions—user ratings, task success/failure, correction data, and downstream metrics—are fed back to improve the agent's future performance. Feedback loops power continuous improvement through prompt refinement, retrieval tuning, eval set expansion, and fine-tuning. Without them, agents are static; with them, agents improve with every interaction. The loop can be automated (auto-add failed cases to evals) or human-driven (analysts review and correct agent outputs).
The initial set of instructions given to an AI model that defines its role, behavior, constraints, and output format before any user interaction begins. The system prompt is the primary mechanism for configuring an AI agent's personality, guardrails, and domain expertise. A well-crafted system prompt for a support agent might specify tone, escalation rules, prohibited topics, and the knowledge base to reference—effectively programming the agent's behavior without writing code.
A commercial software product that provides the infrastructure, integrations, and management tools needed to build, deploy, and operate AI agents without assembling the stack from scratch. Platforms typically bundle an LLM orchestration layer, pre-built connectors (CRM, helpdesk, email), a knowledge base, guardrails, analytics, and a management dashboard. Examples span vertical platforms (built for one domain like support or sales) and horizontal platforms (configurable for any use case).
An AI agent that completes multi-step tasks from start to finish without requiring human intervention at each step. Autonomous agents plan their approach, execute using tools and APIs, handle errors, and deliver results independently. The degree of autonomy exists on a spectrum—from copilots (suggest, human executes) to semi-autonomous (execute routine steps, ask for approval on high-stakes ones) to fully autonomous (end-to-end without human input). Most production agents operate in the semi-autonomous range.
A reusable prompt structure with variable placeholders that gets populated with specific data at runtime. Prompt templates separate the agent's instructions from the dynamic content—the template defines how to respond, while variables inject the specific customer name, ticket details, or product context for each interaction. Templates enable consistent agent behavior across thousands of interactions while personalizing each response.
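In Python this is just string templating; the template text and field names here are illustrative:

```python
from string import Template

SUPPORT_PROMPT = Template(
    "You are a support agent for $product.\n"
    "Customer: $customer_name\n"
    "Ticket: $ticket_text\n"
    "Answer from the knowledge base; escalate billing disputes."
)

# Variables inject per-interaction data into the fixed instructions
prompt = SUPPORT_PROMPT.substitute(
    product="AcmeCRM",
    customer_name="Dana",
    ticket_text="Login fails with error 403.",
)
```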
Running AI models on devices at the network edge—phones, laptops, IoT devices, factory equipment, or local servers—rather than in centralized cloud data centers. Edge AI eliminates network latency, works without internet connectivity, and keeps sensitive data on-device. For agents, edge deployment enables real-time voice processing on phones, on-premises document analysis in regulated environments, and local coding assistance in air-gapped networks.
A software development kit that provides libraries, APIs, and tooling for building AI agents programmatically. SDKs give developers lower-level control than no-code platforms—defining custom tool use, memory management, orchestration logic, and evaluation pipelines in code. The Anthropic Agent SDK, OpenAI Agents SDK, and LangChain are examples. SDKs are preferred when agents need custom business logic, non-standard integrations, or tight coupling with existing application code.
The mechanism by which an AI agent maintains context within a single conversation—tracking what was said, what was agreed, and what state changes occurred across multiple turns. Conversational memory goes beyond raw chat history: it includes resolved references ('that order' → order #4521), tracked entities (the customer, the product, the issue), and accumulated state (steps already tried, information already collected). Good conversational memory prevents agents from asking for information the user already provided.
The process of transferring a task, conversation, or workflow from one AI agent to another—or from an AI agent to a human—while preserving full context. Effective handoffs include conversation history, resolved entities, attempted actions, and the reason for the transfer. Poor handoffs force the receiving agent or human to start from scratch, frustrating users and wasting time. Handoff protocols are a core feature of multi-agent systems and human-in-the-loop architectures.
The coordination of multiple external tools and APIs within a single agent workflow. Rather than calling one tool at a time, an orchestrating agent plans which tools to call, in what order, handles dependencies between tool outputs, and manages failures gracefully. For example, a sales agent might orchestrate CRM lookup, email composition, calendar availability check, and meeting booking in a single workflow—passing data between each step automatically.
The process of directing an incoming request to the most appropriate AI agent or workflow based on the request's content, intent, urgency, or customer attributes. Routing can be rule-based (keywords, customer tier) or AI-driven (intent classification, semantic matching). Good routing ensures that a billing question goes to the billing agent, a technical issue goes to the tech support agent, and a VIP customer gets priority handling—all without manual triage.
Storing and reusing AI model responses for identical or semantically similar inputs to reduce latency and cost. Exact-match caching returns stored responses when the same prompt is received again. Semantic caching uses embeddings to match similar (but not identical) queries to cached responses. LLM caching can reduce inference costs by 30–60% for agents that handle repetitive queries—common in support, FAQ, and classification workloads.
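An exact-match cache can be sketched as a dictionary keyed on a hash of the full prompt (semantic caching would key on embeddings instead). The `fake_llm` function is a stand-in that lets us count real inference calls:

```python
import hashlib

class ExactMatchCache:
    # Exact-match response cache keyed on a hash of the full prompt
    def __init__(self):
        self._store = {}

    def _key(self, prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt):
        return self._store.get(self._key(prompt))

    def put(self, prompt, response):
        self._store[self._key(prompt)] = response

inference_calls = []  # tracks how many real model calls happen

def fake_llm(prompt):
    inference_calls.append(prompt)
    return f"answer: {prompt}"

cache = ExactMatchCache()

def cached_call(prompt):
    hit = cache.get(prompt)
    if hit is not None:
        return hit  # cache hit: zero inference cost
    response = fake_llm(prompt)
    cache.put(prompt, response)
    return response

first = cached_call("what is your refund policy?")
second = cached_call("what is your refund policy?")  # served from cache
```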
Breaking a complex task into a sequence of simpler prompts, where each prompt's output feeds into the next as input. Unlike a single monolithic prompt that tries to do everything at once, chaining improves reliability by letting each step focus on one subtask. A sales email workflow might chain: (1) research the prospect, (2) identify pain points from the research, (3) draft the email using the pain points, (4) review the email for compliance. Each step is a focused prompt with clear input and output.
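The four-step email workflow above can be sketched as a loop over prompt templates, with each model output spliced into the next prompt. The `fake_llm` stand-in just echoes its instruction so the flow is visible:

```python
def fake_llm(prompt):
    # Stand-in for a model call; echoes a tag so the flow is visible
    return f"<out of: {prompt.splitlines()[0]}>"

def chain_prompts(initial_input, steps):
    # Each step is a template; {prev} is filled with the prior output
    output = initial_input
    for template in steps:
        output = fake_llm(template.format(prev=output))
    return output

steps = [
    "Research this prospect: {prev}",
    "List pain points from: {prev}",
    "Draft an outreach email using: {prev}",
    "Review for compliance: {prev}",
]
final = chain_prompts("Acme Corp", steps)
```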
A proxy layer that sits between your application and one or more LLM providers, handling authentication, rate limiting, load balancing, fallback routing, cost tracking, and logging. AI gateways enable organizations to switch between LLM providers without changing application code, enforce usage policies, and maintain observability across all AI interactions. Examples include Portkey, LiteLLM, and cloud-provider offerings from AWS and Azure.
The process of annotating raw data (text, images, audio) with tags or classifications that teach AI models to recognize patterns. For AI agents, labeled data is used to train intent classifiers, evaluate agent accuracy, fine-tune models for domain-specific tasks, and build evaluation test suites. Modern approaches combine human labelers with AI-assisted labeling (where models pre-label data and humans correct errors), reducing cost by 50–70% versus fully manual annotation.
A platform where pre-built AI agents, agent templates, and agent components (tools, integrations, prompt libraries) are published, discovered, and deployed. Marketplaces reduce time-to-value by letting organizations start with proven agents rather than building from scratch. Some marketplaces offer one-click deployment of full agent workflows; others provide composable building blocks that developers assemble into custom agents.
A metric that measures how satisfied customers are with a specific interaction, product, or service—typically collected via a 1–5 or 1–10 post-interaction survey. AI support and voice agents track CSAT on every resolved ticket to measure quality alongside efficiency. A high deflection rate with declining CSAT signals the agent is closing tickets without actually solving problems. Production agents target CSAT parity with human agents (4.2+ out of 5) before scaling automation.
A loyalty metric based on one question: 'How likely are you to recommend us to a colleague?' Respondents score 0–10 and are grouped as Promoters (9–10), Passives (7–8), or Detractors (0–6). NPS = % Promoters minus % Detractors. AI customer success agents use NPS trends to identify at-risk accounts (declining scores trigger proactive outreach) and measure whether AI-driven interactions build or erode customer loyalty over time.
A contract or internal commitment defining the expected performance standards for a service—response time, resolution time, uptime, and quality thresholds. AI agents help teams meet SLAs by providing instant first responses (meeting response-time SLAs even outside business hours), auto-prioritizing tickets by SLA urgency, and alerting human agents when a ticket approaches its SLA deadline. For support teams, AI agents typically improve SLA compliance from 75–85% to 95%+ by eliminating queue wait times.
A technology that combines OCR, NLP, and machine learning to extract, classify, and validate data from unstructured documents—invoices, contracts, medical records, tax forms, and government filings. IDP goes beyond basic OCR by understanding document structure, extracting specific fields (invoice amount, vendor name, due date), and handling variations across document formats. AI agents use IDP as a tool to process documents as part of larger workflows like accounts payable, claims processing, and compliance review.
Technology that converts images of text—scanned documents, photos, PDFs—into machine-readable text. OCR is the foundational layer that enables AI agents to process paper-based and image-based documents. Modern OCR engines handle multiple languages, handwritten text, low-quality scans, and complex layouts (tables, forms, multi-column documents). AI agents in finance, legal, and healthcare use OCR as the first step in document processing pipelines.
A data integration pattern that extracts data from source systems, transforms it into a consistent format (cleaning, normalizing, enriching), and loads it into a destination (data warehouse, analytics platform, or AI model). AI data agents automate ETL by detecting schema changes, handling new data formats, resolving quality issues, and adapting transformation rules—tasks that traditionally require manual pipeline maintenance. Modern variations include ELT (load raw data first, transform in the warehouse) which AI agents also manage.
Any data that can identify a specific individual—names, email addresses, phone numbers, Social Security numbers, IP addresses, biometric data, and financial account numbers. AI agents that process customer data must detect and handle PII appropriately: redacting it from logs, encrypting it in storage, and never including it in LLM prompts sent to third-party APIs unless the provider's data processing agreement permits it. Mishandling PII is the fastest path to regulatory penalties (GDPR fines up to 4% of global revenue) and to destroyed customer trust.
A technique that reduces the precision of an AI model's numerical weights—typically from 32-bit floating point to 8-bit or 4-bit integers—to shrink the model's memory footprint and increase inference speed. Quantization makes it possible to run large language models on smaller GPUs, edge devices, and consumer hardware. A 70B-parameter model that normally requires 140GB of GPU memory can run in 35GB with 4-bit quantization, with only a 2–5% drop in quality for most tasks.
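The core idea — one scale factor mapping floats onto a small integer range — can be shown on a toy weight list (real systems quantize per tensor or per channel, often with more sophisticated calibration):

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: one float scale maps the
    # full range onto signed 8-bit integers in [-127, 127]
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q_weights, scale):
    return [q * scale for q in q_weights]

weights = [0.52, -1.0, 0.03, 0.25]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error.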
A training technique where an AI model is improved by learning from human preferences rather than just predicting the next token. Human raters rank model outputs from best to worst, and a reward model is trained on these rankings. The language model is then fine-tuned to maximize the reward score. RLHF is how frontier models like Claude and GPT-4 learn to be helpful, harmless, and honest—and it directly affects how well agents follow instructions, stay on topic, and avoid harmful outputs.
The maximum number of tokens (input + output) an AI agent is allowed to consume per task, session, or billing period. Token budgets prevent runaway costs from agent loops, overly long conversations, or verbose tool outputs. A well-configured token budget forces efficient prompt design and retrieval—if a support agent has a 4,000-token budget per ticket, it must retrieve only the most relevant KB passages rather than stuffing everything into context.
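Enforcement can be a simple meter that refuses to overspend, giving the agent a chance to summarize or stop rather than blow the budget. The numbers below are illustrative:

```python
class TokenBudget:
    # Hard cap on tokens an agent may spend within one task
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, n_tokens):
        # Raise before overspending so the agent can react
        if self.used + n_tokens > self.limit:
            raise RuntimeError("token budget exceeded")
        self.used += n_tokens

    @property
    def remaining(self):
        return self.limit - self.used

budget = TokenBudget(4000)
budget.charge(1200)  # system prompt + retrieved KB passages
budget.charge(800)   # model response
try:
    budget.charge(3000)  # would exceed: raises before overspending
    overflowed = False
except RuntimeError:
    overflowed = True
```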
A retrieval strategy that combines keyword-based search (BM25, full-text) with semantic vector search to find the most relevant documents for an AI agent's response. Keyword search catches exact matches (error codes, product names, policy numbers) that semantic search misses, while semantic search handles paraphrased queries and conceptual similarity. Fusing both approaches typically improves retrieval accuracy by 15–30% compared to either alone.
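One common way to fuse the two result lists is reciprocal rank fusion (RRF), which needs only each retriever's ranking, not its raw scores. A minimal sketch with made-up document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: ranked doc-id lists from each retriever
    # (e.g. one BM25 ranking, one vector-search ranking).
    # Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    # so documents ranked highly by both retrievers rise to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-c", "doc-d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```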
A detailed, step-by-step log of everything an AI agent did during a single task execution—each LLM call (prompt and response), every tool invocation (input and output), branching decisions, retry attempts, and the final result. Traces are the primary debugging tool for production agents: when a ticket is resolved incorrectly, the trace shows exactly where the agent went wrong—bad retrieval, misinterpreted intent, or faulty tool output.
Techniques for reducing the number of tokens in an AI agent's context window while preserving the essential information. Methods include summarizing long conversation histories, extracting key facts from retrieved documents, pruning irrelevant tool outputs, and using specialized compression models. Context compression enables agents to handle longer conversations and more complex tasks within token budget and context window limits—especially important for cost-sensitive production deployments.
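A simple compression shape keeps the most recent turns verbatim and collapses older ones into a stub. Here plain string truncation stands in for what would normally be an LLM summarizer:

```python
def compress_history(messages, keep_recent=4, summary_chars=120):
    # Keep the latest turns verbatim; collapse older turns into a
    # one-line stub (a real system would use an LLM summarizer here)
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    stub = " | ".join(older)[:summary_chars]
    return [f"[summary of {len(older)} earlier turns] {stub}"] + recent

history = [f"turn {i}" for i in range(10)]
compressed = compress_history(history)
```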
The maximum number of tokens a language model can process in a single call—encompassing the system prompt, conversation history, retrieved documents, tool outputs, and the model's response. Context windows range from 8K tokens (older models) to 200K+ tokens (Claude, Gemini). A larger context window allows agents to reason over more information simultaneously, but cost scales linearly with input tokens. Understanding your model's context window is essential for designing retrieval strategies and conversation management.
A lightweight AI model that runs before or after the main agent response to detect policy violations—toxicity, PII leakage, off-topic responses, prompt injection attempts, or unauthorized actions. Guardrail classifiers add 20–50ms of latency but prevent harmful outputs from reaching users. They operate independently of the main model, providing defense-in-depth: even if the primary model is jailbroken, the classifier catches the violation.
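The interface is the important part: a check that runs on text before it reaches the user and returns an allow/block decision. Below, regex patterns stand in for the learned classifier:

```python
import re

# Regex patterns standing in for a learned policy classifier
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-shaped number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email address
]

def guardrail_check(text):
    # Returns (allowed, reason); run on output before it reaches the user
    for pattern in PII_PATTERNS:
        if pattern.search(text):
            return False, "pii_detected"
    return True, "ok"

safe = guardrail_check("Your order ships tomorrow.")
blocked = guardrail_check("Reach me at jane.doe@example.com")
```

Because the check is independent of the main model, it still fires even when the primary model has been jailbroken.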
A search paradigm where an AI agent actively researches a query by breaking it into sub-queries, searching multiple sources, evaluating and cross-referencing results, and synthesizing a comprehensive answer—rather than returning a ranked list of links. Agentic search handles complex questions ('What are the compliance implications of deploying AI agents in EU healthcare?') that no single document answers completely. Search engines like Perplexity, Google AI Overviews, and ChatGPT Search use agentic search patterns.
The European Union's comprehensive regulation governing the development, deployment, and use of artificial intelligence systems—the first major AI-specific legislation in the world. The EU AI Act classifies AI systems into risk tiers (unacceptable, high, limited, minimal) and imposes requirements proportionate to each tier: transparency obligations for chatbots and voice agents, and strict documentation, logging, human oversight, and accuracy requirements for high-risk systems like HR screening and credit scoring agents.
The neural network architecture that powers virtually all modern large language models and AI agents. Introduced in 2017 ('Attention Is All You Need'), transformers process input text in parallel using self-attention mechanisms that capture relationships between all words simultaneously—unlike earlier architectures that processed text sequentially. GPT, Claude, Llama, and Gemini are all transformer-based. Understanding transformers helps explain why modern agents can reason about long contexts and generate coherent, contextually aware responses.
A set of principles and practices for building and deploying AI systems that are fair, transparent, accountable, safe, and privacy-preserving. Responsible AI goes beyond regulatory compliance to include proactive bias testing, explainability of agent decisions, environmental impact consideration, and stakeholder engagement. For AI agent teams, responsible AI practices include regular bias audits, transparent disclosure of AI involvement, robust guardrails against harmful outputs, and clear accountability when agents make mistakes.
An AI application that combines multiple components—LLM calls, retrieval systems, code execution, classifiers, validators, and traditional logic—into a single integrated system rather than relying on a single model call. Most production AI agents are compound systems: a support agent might chain an intent classifier, a retrieval pipeline, an LLM for response generation, a guardrail classifier, and a confidence scorer. Compound systems outperform single-model approaches because each component can be independently optimized, tested, and improved.
An organization's preparedness to successfully adopt and benefit from AI agents—spanning data quality, technical infrastructure, team skills, process maturity, and cultural willingness to trust automated systems. AI readiness assessments identify gaps before deployment: Is your knowledge base structured and up-to-date? Do you have clean CRM data for the agent to work with? Does your team understand how to supervise and evaluate an AI agent? Low AI readiness is the #1 predictor of failed agent deployments, ahead of technology selection.
The capability of AI models to generate responses in a specific, machine-parseable format—JSON, XML, typed objects—rather than free-form text. Structured output is essential for AI agents because their outputs often feed directly into other systems: a sales agent must output a JSON object that the CRM API accepts, a finance agent must produce structured transaction categories, and a coding agent must generate valid code in the correct language. Most major LLM providers now support constrained output schemas that guarantee valid structured responses.
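Even with provider-side schema constraints, agents typically validate before passing output downstream. A minimal sketch — the field names are illustrative:

```python
import json

def parse_structured(raw, required_fields):
    # Validate the model's raw text as JSON with the required fields;
    # on failure a production agent would retry with a corrective prompt
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if any(field not in obj for field in required_fields):
        return None
    return obj

good = parse_structured(
    '{"intent": "refund", "order_id": "A-4521", "amount": 49.0}',
    ["intent", "order_id"],
)
bad = parse_structured("Sure! Here is the JSON you asked for...", ["intent"])
```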
A model capability where the AI spends additional compute on internal reasoning before producing a visible response—effectively 'thinking longer' about complex problems. Extended thinking improves performance on tasks requiring multi-step logic, mathematical reasoning, code architecture decisions, and nuanced analysis. For agents, extended thinking is most valuable in high-stakes, accuracy-critical tasks: legal contract review, complex debugging, financial modeling, and strategic planning. The tradeoff is higher latency and token cost per response.
An inference optimization where the processed representation of static prompt content (system instructions, tool definitions, reference documents) is cached between requests so the model skips re-processing it on subsequent calls. Context caching is distinct from response caching—it caches the input processing, not the output. For agents with large, stable system prompts (common in production deployments), context caching reduces per-request latency by 30–60% and cost by 50–90% on the cached portion. Supported by Anthropic (prompt caching), Google (context caching), and OpenAI (automatic caching).
Directing AI requests to different models or endpoints based on task complexity, cost, or latency requirements. A router might send simple classification tasks to a small, fast model and complex reasoning tasks to a larger, more capable one. This reduces costs by 40–70% compared to routing everything through the most powerful model, while maintaining quality where it matters.
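The routing rule itself can be very simple. A sketch with placeholder model names (real routers also weigh latency budgets and past quality scores):

```python
# Placeholder model identifiers — not real API model names.
CHEAP_MODEL = "small-fast-model"
STRONG_MODEL = "large-capable-model"

def route(prompt: str, task: str) -> str:
    """Send short classification work to the cheap model, the rest upstream."""
    if task == "classification" and len(prompt) < 500:
        return CHEAP_MODEL
    return STRONG_MODEL
```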
A flow control mechanism that slows or pauses an AI agent's task intake when downstream systems can't keep up. Without backpressure, an agent processing thousands of tasks can overwhelm APIs, databases, or human review queues. Backpressure ensures the agent operates within the capacity of the systems it depends on, preventing cascading failures and data loss.
An interaction pattern where an AI agent splits a conversation into parallel threads or sub-tasks, processes them independently, and merges results back into the main flow. Common in support agents that need to look up billing, check order status, and review account history simultaneously to resolve a complex ticket.
A self-reinforcing cycle where an AI agent's outputs generate data that improves its future performance. More usage produces more labeled examples, which improve the model, which produces better outputs, which drives more usage. In agent deployments, the flywheel turns customer interactions into training signal—support agents learn from resolved tickets, sales agents learn from closed deals.
The infrastructure that hosts trained AI models and handles inference requests—loading model weights into memory, processing inputs, returning outputs, and managing concurrency. Model serving is the production runtime for AI: it determines latency, throughput, cost, and availability. Self-hosted agents require model serving infrastructure; API-based agents (Claude, GPT) abstract it away.
Coordinating the execution order, error handling, and data flow between multiple function calls within an agent's workflow. When an agent needs to call five APIs to complete a task, function orchestration determines which calls can run in parallel, which depend on previous results, how to handle partial failures, and how to retry or fall back gracefully.
A classification of how independently an AI agent operates, typically on a scale from fully human-controlled to fully autonomous. Level 1: human does the task, agent assists. Level 2: agent drafts, human approves. Level 3: agent executes, human reviews after. Level 4: agent executes autonomously, human handles exceptions. Most production agents operate at Level 2–3, with Level 4 reserved for low-risk, high-volume tasks.
A two-stage search process where an initial retrieval step finds candidate results (using keyword or vector search) and a reranking model scores them by relevance to the specific query. Reranking dramatically improves search quality for AI agents: the first stage is fast but approximate, the reranker is slower but precise. Support agents use it to find the most relevant KB article; legal agents use it to surface the most pertinent precedent.
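The two-stage shape can be sketched with toy scorers (word overlap standing in for a vector index, exact-phrase match standing in for a cross-encoder reranker — real systems use learned models for both):

```python
def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Stage 1: fast, approximate — rank by shared words, keep top k."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: slower, precise — here, exact-phrase matches outrank overlap."""
    return sorted(candidates, key=lambda d: query.lower() not in d.lower())

candidates = retrieve("refund policy", [
    "shipping times", "refund terms and policy notes", "our refund policy page",
])
best = rerank("refund policy", candidates)[0]
```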
The ability to understand, evaluate, and effectively use AI tools and agents—including knowing what AI can and cannot do, how to write effective prompts, how to evaluate AI outputs for accuracy, and how to set appropriate guardrails. AI literacy is becoming a core workforce skill: teams with higher AI literacy adopt agents faster, configure them more effectively, and catch errors that less-literate teams miss. Organizations investing in AI literacy training see 2–3x faster time-to-value on agent deployments.
An AI agent conceptualized as a virtual team member that handles a defined set of responsibilities—processing invoices, qualifying leads, triaging support tickets, or scheduling appointments. The digital worker framing shifts how organizations think about AI agents: instead of a tool you configure, it's a role you hire for, with KPIs, a scope of authority, and an escalation path. This mental model helps non-technical stakeholders understand what the agent does and how it fits into existing team structures.
An operational mode where an AI agent assists a human by suggesting actions, drafting outputs, and surfacing information—but the human makes all final decisions and executes all actions. Copilot mode is the starting point for most agent deployments: the agent drafts the email but the rep clicks send, the agent suggests the diagnosis but the doctor confirms it. Teams typically run in copilot mode for 2–4 weeks to build trust and calibrate quality before granting the agent more autonomy.
The process of verifying that an AI agent's output meets quality, accuracy, safety, and format requirements before it's delivered to the user or passed to the next step in a workflow. Validation can be automated (schema checks, PII detection, confidence scoring, fact-checking against source documents) or human (reviewer approves before sending). Output validation is the last line of defense against hallucination, policy violations, and formatting errors in production agents.
The mechanism by which an AI agent automatically retries failed operations—LLM API calls that time out, tool invocations that return errors, or retrieval queries that return no results. Good retry logic uses exponential backoff (waiting progressively longer between attempts), distinguishes between transient errors (network timeouts—retry) and permanent errors (invalid API key—don't retry), and sets maximum attempt limits to prevent infinite loops. Without retry logic, agents fail on the first hiccup; with it, they handle the intermittent failures that are normal in distributed systems.
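The pattern described above — exponential backoff, transient-vs-permanent distinction, attempt cap — fits in a few lines. A minimal sketch (the error classes are stand-ins for whatever your client library raises):

```python
import time

class TransientError(Exception): ...   # e.g. network timeout — safe to retry
class PermanentError(Exception): ...   # e.g. invalid API key — never retry

def call_with_retry(fn, max_attempts: int = 4, base_delay: float = 0.01):
    """Retry transient failures with exponential backoff; fail fast otherwise."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except PermanentError:
            raise                                   # don't waste attempts
        except TransientError:
            if attempt == max_attempts:
                raise                               # cap prevents infinite loops
            time.sleep(base_delay * 2 ** (attempt - 1))  # 0.01s, 0.02s, 0.04s…
```

Production versions usually add random jitter to the delay so many agents failing at once don't retry in lockstep.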
Managing how many tasks an AI agent processes simultaneously to balance throughput against resource constraints and system stability. An agent processing support tickets might handle 10 concurrently; an agent making API calls to a rate-limited service might limit to 3. Concurrency control prevents agents from overwhelming downstream systems, exceeding API rate limits, or consuming excessive compute resources. It's the difference between an agent that processes 1,000 tickets smoothly over an hour and one that tries all 1,000 at once and crashes the CRM.
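A semaphore is the standard implementation. This sketch caps concurrent handlers at three and tracks the peak to show the cap holds (the ticket handler is a toy):

```python
import asyncio

async def process_tickets(tickets, handler, max_concurrent: int = 3):
    """Run handler over every ticket, at most max_concurrent at a time."""
    sem = asyncio.Semaphore(max_concurrent)
    async def guarded(t):
        async with sem:
            return await handler(t)
    return await asyncio.gather(*(guarded(t) for t in tickets))

# Demo: count how many handlers run simultaneously.
peak = current = 0

async def handler(t):
    global peak, current
    current += 1
    peak = max(peak, current)
    await asyncio.sleep(0.01)   # stands in for a rate-limited API call
    current -= 1
    return t * 2

results = asyncio.run(process_tickets(list(range(10)), handler))
```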
Narrowing vector search results by structured attributes (date, category, department, access level) before or after semantic matching. Without metadata filtering, a support agent searching for 'refund policy' might retrieve the refund policy from 2023 instead of the current one, or surface an internal policy document that shouldn't be shown to customers. Metadata filters ensure the agent only retrieves documents that match the relevant context—current versions, correct department, appropriate access level.
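The filtering step is just a structured predicate applied around the semantic search. A sketch using plain dicts for documents (a real system would push these filters into the vector store's query API):

```python
docs = [
    {"text": "Refund policy (2023)", "year": 2023, "audience": "public"},
    {"text": "Refund policy (2025)", "year": 2025, "audience": "public"},
    {"text": "Internal refund playbook", "year": 2025, "audience": "internal"},
]

def filter_docs(docs: list[dict], **required) -> list[dict]:
    """Keep only documents whose metadata matches every required attribute."""
    return [d for d in docs if all(d.get(k) == v for k, v in required.items())]

# Only the current, customer-visible policy survives the filter.
candidates = filter_docs(docs, year=2025, audience="public")
```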
The policies, processes, and standards that control how data is collected, stored, accessed, used, and disposed of within an organization. For AI agents, data governance determines what data the agent can access (least-privilege access controls), how it processes sensitive information (encryption, anonymization), where data is sent (which LLM providers, what jurisdictions), and how long it's retained. Poor data governance in agent deployments creates compliance risk (GDPR, HIPAA violations), security vulnerabilities (over-privileged agents), and trust erosion (customers learning their data was sent to third-party AI providers without consent).
Additional computation allocated during model inference (response generation) rather than during training. Techniques like chain-of-thought reasoning, beam search, self-verification, and extended thinking allow models to 'think longer' on harder problems—trading speed and cost for accuracy. Inference-time compute scaling is why modern reasoning models can solve complex math, code, and planning tasks that earlier models couldn't, and it's the mechanism behind features like Claude's extended thinking and OpenAI's o-series models.
A model architecture where multiple specialized sub-networks ('experts') exist within a single model, and a routing mechanism activates only the most relevant experts for each input. MoE models can be very large in total parameters but fast and efficient at inference because only a fraction of the network is active per request. This architecture powers several frontier models and enables better performance without proportional increases in compute cost.
A safety and alignment methodology where an AI model is trained to follow a set of principles (a 'constitution') that govern its behavior—such as being helpful, harmless, and honest. Instead of relying solely on human feedback for every possible scenario, the model uses its principles to self-evaluate and improve its responses. Developed by Anthropic, Constitutional AI is the foundation for building agents that refuse harmful requests, avoid deception, and maintain consistent ethical behavior at scale.
A multi-agent architecture where numerous lightweight, specialized agents collaborate on a task simultaneously—each handling a small part of the work and coordinating through shared state or message passing. Unlike traditional multi-agent systems with fixed roles, swarms dynamically allocate work and scale the number of active agents based on task complexity. Swarms excel at parallelizable tasks: researching dozens of leads at once, processing a batch of support tickets, or testing multiple code paths simultaneously.
An agent architecture pattern where the AI generates an initial output, then evaluates that output against criteria (accuracy, completeness, safety, style) and iterates to improve it before delivering the final result. Reflection mimics human self-review: write a draft, re-read it critically, fix issues, and submit the polished version. Agents using reflection produce higher-quality outputs on complex tasks—especially code generation, legal analysis, and content creation—at the cost of additional inference time and tokens.
A security attack where a malicious actor manipulates the tools, data sources, or APIs that an AI agent relies on—causing the agent to take harmful actions based on corrupted inputs. Unlike prompt injection (which targets the agent's instructions), tool poisoning targets the external systems the agent trusts. Examples include injecting malicious content into a knowledge base the agent searches, manipulating API responses to alter agent behavior, or compromising MCP server tool descriptions to redirect agent actions.
The process of training AI models to behave in accordance with human intentions, values, and specified objectives. Aligned models follow instructions accurately, refuse harmful requests, acknowledge uncertainty, and avoid deceptive or manipulative behavior. Alignment techniques include RLHF, Constitutional AI, and instruction tuning. For AI agents, alignment is especially critical because agents take real-world actions—a misaligned agent that misinterprets objectives can send wrong emails, delete data, or make unauthorized purchases.
A no-code or low-code platform that lets users design, configure, and deploy AI agent workflows through a visual interface—connecting triggers, AI processing steps, tool integrations, and human approval gates without writing code. Workflow builders democratize agent creation: operations teams can automate processes, marketers can build content pipelines, and support teams can create ticket routing flows. Examples include n8n, Make, Zapier AI, and Relevance AI. They trade customization depth for speed of deployment.
A technique where a smaller, faster AI model (the student) is trained to replicate the behavior of a larger, more capable model (the teacher). Distillation transfers the teacher's knowledge into a compact model that's cheaper and faster to run while retaining most of the performance. For AI agents, distillation enables deploying capable models on edge devices, reducing inference costs at scale, and meeting latency requirements that large models can't hit. OpenAI, Anthropic, and Google all offer distilled model variants.
The practice of connecting AI model outputs to verifiable, factual sources—documents, databases, APIs, or real-time data—so responses are based on evidence rather than the model's parametric memory alone. Grounding is the primary defense against hallucination. Techniques include RAG (retrieving relevant documents), tool use (querying live APIs), and citation requirements (forcing the model to reference specific sources). Grounded agents are essential in high-stakes domains like legal, healthcare, and finance where accuracy is non-negotiable.
A technique where an AI system classifies incoming requests by meaning (not keywords) and routes them to the appropriate handler—a specific agent, model, tool, or workflow. Unlike keyword-based routing that matches exact phrases, semantic routing uses embeddings to understand intent, handling synonyms, paraphrases, and multilingual input correctly. Used in multi-agent systems to decide which specialist agent handles a request, and in support systems to route tickets to the right team.
A release strategy where a new version of an AI agent is deployed to a small percentage of traffic (e.g., 5%) while the existing version handles the rest. If the canary shows degraded performance—higher error rates, lower quality scores, or increased latency—the rollout is halted before it affects all users. Canary deployments are critical for AI agents because prompt changes, model updates, and new tool integrations can cause subtle quality regressions that aren't caught by offline testing alone.
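Deterministic bucketing is the usual mechanism: hash the user ID so each user consistently lands in the canary or stable group. A minimal sketch:

```python
import hashlib

def assign_variant(user_id: str, canary_pct: int = 5) -> str:
    """Hash the user into one of 100 buckets; same user, same variant."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_pct else "stable"
```

Because assignment is a pure function of the ID, a user never flips between agent versions mid-conversation, which keeps quality comparisons clean.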
Running AI models directly on local devices (phones, laptops, IoT hardware, on-premise servers) rather than sending requests to cloud-hosted APIs. Edge inference eliminates network latency, works offline, and keeps sensitive data on-device—critical for healthcare agents handling patient data, voice agents needing sub-100ms responses, and manufacturing agents operating in facilities without reliable internet. Trade-offs include limited model size (smaller models only) and higher hardware requirements.
An inference optimization technique where a small, fast 'draft' model generates candidate tokens ahead of the main model, and the main model verifies them in parallel. When the draft model's predictions match what the main model would have produced, tokens are accepted instantly—reducing latency by 2–3x without changing output quality. Speculative decoding is particularly valuable for AI agents where response latency directly affects user experience, especially voice agents and live-chat support agents.
The core execution cycle of an AI agent: observe the current state, reason about what to do next, take an action (call a tool, generate a response, update memory), observe the result, and repeat until the goal is achieved or a stopping condition is met. The agentic loop is what distinguishes agents from single-shot LLM calls—agents iterate, adapt to intermediate results, recover from errors, and pursue multi-step objectives. Loop control (when to continue, when to stop, when to ask for human input) is one of the hardest design problems in agent engineering.
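Stripped to its skeleton, the loop is: act, observe, check the stop conditions, repeat. In this toy sketch a doubling rule stands in for the LLM's decision step, and a max-step guard stands in for loop control:

```python
def run_agent(goal: int, max_steps: int = 20) -> tuple[int, int]:
    """Iterate until the goal state is reached or the step budget runs out."""
    state, steps = 1, 0
    while state < goal:            # stopping condition: goal achieved
        if steps >= max_steps:     # loop control: guard against runaway agents
            break
        state = state * 2          # 'action' chosen this iteration
        steps += 1                 # observing the result feeds the next pass
    return state, steps
```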
The process of extracting structured data from an AI model's free-text output—converting natural language into JSON, database records, API calls, or other machine-readable formats. Reliable output parsing is essential for AI agents because downstream tools and integrations require structured inputs. Techniques include JSON mode (constraining model output to valid JSON), function calling (the model emits structured function invocations), regex extraction, and grammar-constrained decoding. Parsing failures are a common source of agent errors in production.
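A common fallback when the model wraps its JSON in chatty prose is to extract the object with a regex before parsing. A minimal sketch (assumes a single JSON object in the output; nested or multiple objects need a real parser):

```python
import json
import re

def extract_json(text: str) -> dict:
    """Pull the first {...} span out of free-form model output and parse it."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group())

result = extract_json('Sure! Here you go: {"status": "ok", "n": 3} Hope that helps.')
```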
A technique that reduces the numerical precision of an AI model's weights—typically from 16-bit floating point to 8-bit or 4-bit integers—to shrink model size and speed up inference with minimal quality loss. A 70B-parameter model that requires 140GB of GPU memory at 16-bit precision fits in 35GB at 4-bit quantization, enabling deployment on a single consumer GPU instead of a multi-GPU server. Quantization is the key enabler for self-hosted and edge-deployed AI agents.
A parameter-efficient fine-tuning technique that trains a small set of adapter weights (typically 0.1–1% of the full model) rather than updating all model parameters. LoRA enables domain-specific customization of large language models at a fraction of the cost and time of full fine-tuning—hours instead of days, $50–$500 instead of $10,000+. Multiple LoRA adapters can be swapped on a single base model, letting one deployment serve different agent behaviors (support tone, sales style, legal precision) by loading the appropriate adapter.
A training technique where human evaluators rank AI model outputs by quality, and those rankings train a reward model that guides the AI toward more helpful, accurate, and safe responses. RLHF is the primary method used to align large language models with human preferences—transforming a base model that predicts the next token into an assistant that follows instructions, avoids harmful content, and produces genuinely useful responses. It's why modern AI agents feel helpful rather than just fluent.
Any data that can identify a specific individual—names, email addresses, phone numbers, social security numbers, IP addresses, and biometric data. AI agents that process customer interactions, healthcare records, financial transactions, or HR data inevitably handle PII. Proper PII management requires detection (identifying PII in text), redaction (removing PII before it reaches the LLM or logs), access controls (limiting which agents and users can see PII), and compliance with regulations like GDPR, CCPA, and HIPAA that govern PII handling.
A security framework that restricts AI agent capabilities based on the user's or agent's assigned role. In agent architectures, RBAC controls which tools an agent can call (a support agent can read CRM data but not modify billing), which data it can access (a junior support agent sees ticket history but not financial records), and which actions require escalation (refunds over $100 need supervisor approval). RBAC is the primary mechanism for enforcing least-privilege access in multi-agent deployments.
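At its core this is a role-to-permissions lookup consulted before every tool call. A sketch with hypothetical role and tool names:

```python
# Hypothetical least-privilege map: which tools each agent role may call.
ROLE_TOOLS = {
    "support_agent": {"crm.read", "tickets.read", "tickets.reply"},
    "billing_agent": {"crm.read", "billing.read", "billing.refund"},
}

def authorize(role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are both refused."""
    return tool in ROLE_TOOLS.get(role, set())
```

In a real deployment the check sits in the tool-dispatch layer, so a compromised prompt can't grant the agent capabilities its role never had.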
A self-reinforcing cycle where an AI agent's interactions generate data that improves the agent, which drives more usage, which generates more data. In practice: a support agent handles tickets → successful resolutions are added to the knowledge base → the agent gets better at resolving similar tickets → more tickets are deflected → more resolution data is generated. The flywheel effect means AI agents improve fastest in high-volume environments and create a compounding advantage over time that's difficult for competitors to replicate.
The practice of tracking, storing, and managing changes to an AI agent's prompts (system prompts, tool descriptions, and instruction sets) with the same rigor as source code versioning. Each prompt change gets a version number, timestamp, author, and rationale. Prompt versioning enables rollback when a change degrades quality, A/B testing of prompt variants, audit trails for compliance, and reproducible debugging when an agent behaves unexpectedly. Without versioning, teams lose track of what changed and when—making it impossible to diagnose regressions.
A model architecture where the network contains multiple specialized sub-networks (experts), and a routing mechanism activates only a subset of experts for each input. A 400B-parameter MoE model might activate only 50B parameters per token, achieving near-frontier quality at the inference cost of a much smaller model. MoE architectures (used in models like Mixtral and reportedly in GPT-4) are why some AI agents can deliver high-quality responses at surprisingly low latency and cost—the model is large in total but efficient per-query.
A SaaS metric measuring the percentage of recurring revenue retained from existing customers over a period, including expansions, contractions, and churn. An NRR above 100% means expansion revenue from upsells and cross-sells exceeds lost revenue from downgrades and cancellations. AI agents improve NRR by identifying expansion opportunities from product usage signals and automating personalized upsell outreach at scale.
Mandatory education programs that teach employees about laws, regulations, and company policies relevant to their roles—covering areas like data privacy (GDPR, CCPA), workplace safety (OSHA), anti-harassment, financial regulations (SOX, BSA/AML), and industry-specific requirements. AI agents transform compliance training from generic annual video courses into personalized, role-specific micro-learning programs with adaptive assessment and real-time regulatory updates.
An educational approach that delivers content in short, focused segments (typically 3-10 minutes) rather than long training sessions. AI agents use micro-learning to deliver compliance updates, product training, and skill development through daily or weekly bite-sized modules. Research shows 70%+ retention rates for micro-learning versus 20% for traditional hour-long sessions, making it the preferred format for AI-powered corporate training agents.
A learning technique where information is reviewed at increasing intervals over time to optimize long-term retention. AI tutoring and training agents use spaced repetition algorithms to resurface concepts just before they would be forgotten—typically at 1 day, 3 days, 1 week, 2 weeks, and 1 month intervals. This approach is 2-3x more effective than massed practice (cramming) for long-term knowledge retention.
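The schedule above maps directly to a lookup: each successful review pushes the next one further out. A minimal sketch using the intervals from the text:

```python
# Interval ladder from the text: 1 day, 3 days, 1 week, 2 weeks, 1 month.
INTERVALS_DAYS = [1, 3, 7, 14, 30]

def next_review_day(last_review_day: int, successes: int) -> int:
    """Schedule the next review; after the ladder tops out, repeat monthly."""
    idx = min(successes, len(INTERVALS_DAYS) - 1)
    return last_review_day + INTERVALS_DAYS[idx]
```

Real systems (e.g. SM-2-style algorithms) also scale intervals by how easily the learner recalled the item; a failed review resets `successes` toward zero.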
Using AI agents to streamline the employee performance review process—automatically collecting achievement data from work systems (project management, code repositories, CRM), generating competency-mapped first drafts, synthesizing peer feedback, and flagging potential bias in language and ratings. This reduces manager prep time by 60-70% while improving feedback quality and reducing recency bias.
The ability to track materials, products, and shipments across the entire supply chain in real time—from raw material sourcing through manufacturing, logistics, and final delivery. AI agents enhance visibility by integrating data from ERP, TMS, WMS, carrier portals, and customs systems into a unified view, proactively detecting disruptions and automating exception handling. Full end-to-end visibility reduces expedited freight costs by 15-25% and safety stock requirements by 20-30%.
The process of identifying, prioritizing, and resolving deviations from expected workflows or outcomes. In supply chain, operations, and finance contexts, exceptions include late shipments, invoice discrepancies, failed quality checks, and SLA breaches. AI agents automate exception management by detecting anomalies in real time, prioritizing by business impact, handling routine exceptions autonomously, and escalating complex issues with full context and recommended actions.
A cognitive bias where recent events are weighted more heavily than earlier ones when making evaluations. In performance reviews, recency bias causes managers to base ratings primarily on the last 2-4 weeks rather than the full review period—resulting in inaccurate assessments that miss sustained contributions and penalize employees whose strongest work happened months ago. AI agents counter recency bias by tracking achievements throughout the review period and surfacing a balanced, time-distributed evidence set.
A caching strategy that stores and retrieves AI model responses based on the semantic meaning of the input rather than exact string matching. When a user asks 'What is your return policy?' and a cached response exists for 'How do I return an item?', semantic caching recognizes these as equivalent and serves the cached answer. This reduces LLM inference costs by 30-60% for support and sales agents that handle repetitive queries with varied phrasing.
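Structurally, a semantic cache is a list of (embedding, response) pairs with a similarity threshold. This toy sketch uses bag-of-words vectors as a stand-in for a learned embedding model — real systems use one, plus a vector store for lookup:

```python
import math

def embed(text: str) -> dict:
    """Toy embedding: word-count vector (stand-in for a learned model)."""
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.6):
        self.entries = []            # (embedding, cached response)
        self.threshold = threshold

    def get(self, query: str):
        q = embed(query)
        for e, resp in self.entries:
            if cosine(q, e) >= self.threshold:
                return resp          # near-duplicate query: skip the LLM call
        return None

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))
```

Tuning the threshold is the hard part: too low and users get answers to questions they didn't ask; too high and the hit rate collapses.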
The gradual degradation of an AI agent's performance over time as the underlying data, user behavior, or business context changes while the agent's configuration remains static. Drift manifests as declining accuracy, increased hallucination rates, or irrelevant responses—often so gradually that it is not noticed until performance has significantly deteriorated. Common causes include knowledge base staleness, shifting customer vocabulary, product catalog changes, and upstream model updates by the LLM provider.
A measure of how sensitive customer demand is to price changes. Elastic products see significant volume changes with small price adjustments; inelastic products maintain demand regardless of price. AI ecommerce agents use price elasticity models to optimize pricing: reducing prices on elastic products to capture volume and increasing prices on inelastic products to capture margin—maximizing total contribution across the catalog.
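The underlying arithmetic: elasticity is the percent change in quantity divided by the percent change in price, with |E| > 1 meaning elastic. A minimal sketch:

```python
def price_elasticity(q0: float, q1: float, p0: float, p1: float) -> float:
    """Point elasticity: %change in quantity / %change in price."""
    pct_q = (q1 - q0) / q0
    pct_p = (p1 - p0) / p0
    return pct_q / pct_p

# A 10% price rise cutting demand 20% gives E = -2.0 (elastic):
e_elastic = price_elasticity(q0=100, q1=80, p0=10, p1=11)
# The same price rise cutting demand only 2% gives |E| = 0.2 (inelastic):
e_inelastic = price_elasticity(q0=100, q1=98, p0=10, p1=11)
```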
The deliberate organization of patient care activities across multiple providers, settings, and time periods to ensure appropriate delivery of health services. AI healthcare agents enhance care coordination by automating referral tracking, post-discharge follow-up, medication reconciliation, and chronic disease management—closing the communication gaps between primary care, specialists, hospitals, and patients that contribute to 80% of serious medical errors (AHRQ).
Authentication and fraud detection based on how a person interacts with a device—typing rhythm, mouse movement patterns, swipe gestures, touchscreen pressure, and navigation habits. Unlike static biometrics (fingerprint, face), behavioral biometrics are continuous and nearly impossible to replicate. AI cybersecurity and fraud detection agents use behavioral biometrics to detect account takeovers in real time: even if an attacker has valid credentials, their interaction patterns differ from the legitimate user.
A strategy where prices are adjusted in real time based on demand, competition, inventory levels, time of day, customer segment, and other market signals. AI agents enable dynamic pricing at scale—monitoring thousands of SKUs and competitor prices continuously, calculating optimal prices within margin constraints, and executing changes automatically. Common in e-commerce, travel, ride-sharing, and SaaS, where static pricing leaves money on the table.
The complete lifecycle of a patient referral from the moment a provider creates it until the specialist's findings are returned and reviewed by the referring provider. In healthcare, open referral loops—where referrals are sent but never completed or results never returned—are a major patient safety risk. AI healthcare agents close referral loops by tracking each stage (referral sent → patient scheduled → patient attended → results received → PCP reviewed) and intervening when the process stalls.
A manufacturer-set floor price below which authorized retailers agree not to advertise a product. MAP policies protect brand value and prevent a race to the bottom among distributors. AI ecommerce agents monitor MAP compliance across authorized and unauthorized resellers—scanning websites, marketplaces, and price comparison engines to detect violations and generate enforcement reports. Violations typically trigger warnings, then loss of advertising co-op funds, and ultimately termination of the retailer relationship.
The maximum amount of downtime or errors a service can accumulate within a measurement period before breaching its SLA. Derived from the SLA target—a 99.9% uptime SLA allows 43.8 minutes of downtime per month as the error budget. AI operations agents track error budget consumption in real time, predict when the budget will be exhausted at the current burn rate, and trigger protective actions (freezing deployments, scaling infrastructure) before a breach occurs. Error budgets turn reliability from a binary 'up or down' question into a measurable, manageable resource.
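The arithmetic from the definition is simple enough to show directly: a 99.9% SLA over an average month (43,800 minutes) leaves a 43.8-minute budget, and the burn rate tells you how long it lasts:

```python
def error_budget_minutes(sla_pct: float, period_minutes: float = 43_800) -> float:
    """Downtime allowance implied by an SLA over the period (avg. month)."""
    return period_minutes * (1 - sla_pct / 100)

def days_until_exhausted(budget: float, consumed: float, burn_per_day: float) -> float:
    """At the current burn rate, how many days of budget remain."""
    return (budget - consumed) / burn_per_day

budget = error_budget_minutes(99.9)        # ≈ 43.8 minutes per month
runway = days_until_exhausted(budget, consumed=13.8, burn_per_day=2.0)
```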
The total revenue a business expects to earn from a single customer over the entire duration of their relationship. CLV accounts for average purchase value, purchase frequency, and customer lifespan. AI agents improve CLV by predicting churn before it happens, identifying upsell and cross-sell opportunities from usage patterns, and personalizing retention interventions. A customer success AI agent that increases average customer lifespan by 6 months can improve CLV by 15-30% across the customer base.
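The basic formula multiplies the three components named above; extending lifespan by six months then lifts CLV proportionally. A minimal sketch with illustrative numbers:

```python
def clv(avg_purchase_value: float, purchases_per_year: float,
        lifespan_years: float) -> float:
    """Simple CLV: value per purchase x frequency x lifespan."""
    return avg_purchase_value * purchases_per_year * lifespan_years

baseline = clv(100, 4, 3.0)   # $100 orders, 4x/year, 3-year lifespan
improved = clv(100, 4, 3.5)   # same customer retained 6 months longer
```

Here the six extra months lift CLV by one-sixth (~16.7%), consistent with the 15-30% range cited above; fuller models also discount future revenue and subtract cost-to-serve.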
The natural person(s) who ultimately own or control a legal entity, even if the entity is held through layers of corporate structures, trusts, or nominees. Anti-money laundering regulations (the Corporate Transparency Act in the US, EU AMLD) require financial institutions to identify beneficial owners during KYC. AI compliance agents automate beneficial ownership verification by parsing corporate registries, cross-referencing ownership structures, and flagging complex or opaque arrangements that require enhanced due diligence.
Processing multiple AI model requests together as a batch rather than one at a time. Batch inference is significantly cheaper than real-time inference—Anthropic's batch API offers a 50% cost reduction, and OpenAI's batch API offers a comparable discount. AI agents use batch inference for non-time-sensitive tasks: processing overnight support ticket categorization, bulk document analysis, weekly report generation, and large-scale data enrichment. The tradeoff is latency: batch results arrive hours later rather than in seconds.
The ability to understand and communicate why an AI model produced a specific output—which input features influenced the decision, how confident the model is, and what would change the outcome. Explainability is a regulatory requirement in many contexts (the EU AI Act mandates it for high-risk AI systems) and a practical requirement for trust. AI agents in healthcare, finance, legal, and HR must provide explainable decisions so that human reviewers can verify the reasoning and catch errors.
Directing an incoming request to the most appropriate AI model, agent, or workflow based on the request's characteristics—complexity, domain, required capabilities, cost sensitivity, and latency requirements. A prompt router might send simple FAQ questions to a small, fast model (Haiku-class) and complex reasoning tasks to a large, capable model (Opus-class), optimizing the cost-quality tradeoff across the full spectrum of requests. Advanced routing considers user tier, request urgency, and current system load.
Creating artificial data that statistically resembles real data but contains no actual personal or sensitive information. AI agents use synthetic data for testing (generating realistic but fake customer records to test workflows), training (creating labeled examples for fine-tuning without using real customer data), and privacy compliance (sharing data insights across teams without exposing PII). Modern LLMs generate high-quality synthetic data that preserves statistical properties, edge cases, and realistic distributions.
A pattern where AI agent performance degrades on tasks requiring sustained attention or processing very long contexts—analogous to human cognitive fatigue but caused by context window limitations, attention mechanism degradation, and cumulative error propagation in multi-step workflows. Symptoms include declining accuracy in later steps of long workflows, increased hallucination rates in long conversations, and inconsistent behavior when processing large document sets. Mitigation strategies include task chunking, periodic context summarization, and checkpoint-based workflows that reset context at defined intervals.
A standardized documentation format for AI models that describes intended use, training data, performance characteristics, limitations, and ethical considerations. Originally proposed by Google researchers in 2018, model cards have become the industry standard for transparent model documentation—and a regulatory requirement under the EU AI Act for high-risk systems. AI agent builders use model cards to verify that the underlying models meet their use case requirements, and produce model cards for their own agents to support compliance and customer due diligence.
Techniques and architectures that protect AI agents from prompt injection attacks—attempts to override the agent's instructions through malicious content in user input, retrieved documents, tool outputs, or other context. Defenses include input sanitization, instruction hierarchy enforcement, output validation, capability isolation (running risky operations in sandboxes), and dual-LLM patterns where one model checks another's actions. No single defense is sufficient; production agents stack multiple defenses based on risk profile.
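Two of the listed defenses, input screening and instruction-hierarchy enforcement, can be sketched together. The patterns and the delimiter format are illustrative; real deployments use trained classifiers and add the further layers named above (sandboxing, output validation, dual-LLM checks).

```python
# Layered prompt-injection defense sketch: naive pattern screening plus a
# wrapper that demarcates untrusted content. Patterns/format are illustrative.
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def screen_untrusted(text: str) -> bool:
    """Return True if untrusted content matches a known injection pattern."""
    return any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)

def wrap_untrusted(text: str) -> str:
    """Mark untrusted content so the system prompt can instruct the model
    never to follow instructions inside the markers (instruction hierarchy)."""
    return f"<untrusted>\n{text}\n</untrusted>"

def prepare_context(doc: str):
    """Quarantine suspicious documents; wrap the rest before the model sees them."""
    if screen_untrusted(doc):
        return None
    return wrap_untrusted(doc)
```

Note that pattern screening alone is easily evaded, which is exactly why the definition stresses stacking defenses rather than relying on any single one.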
Protections that prevent users from circumventing an AI agent's safety guidelines and operational boundaries through adversarial prompts. Common jailbreak techniques include role-play attacks ('pretend you have no restrictions'), instruction-formatting attacks (using markdown or code blocks to confuse the model), encoding attacks (base64, leetspeak), and persistent multi-turn attacks. Defenses combine input classifiers (detecting jailbreak attempts), output filters (blocking unsafe content), constitutional AI training (models trained to resist), and runtime monitoring with automatic escalation.
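The runtime-monitoring layer can be sketched as a detector that escalates when attempts persist across turns. The phrase list, scoring, and escalation threshold are illustrative stand-ins; production systems use trained classifiers, not keyword matching.

```python
# Sketch of a jailbreak detector with multi-turn escalation. Phrases and
# the escalation threshold are illustrative, not a real classifier.

JAILBREAK_PHRASES = [
    "pretend you have no restrictions",
    "ignore your guidelines",
    "developer mode",
    "do anything now",
]

def jailbreak_score(message: str) -> float:
    """Count matched phrases; a real system would use a trained classifier."""
    m = message.lower()
    return sum(1.0 for p in JAILBREAK_PHRASES if p in m)

class ConversationMonitor:
    """Track attempts across turns; persistent attacks trigger escalation."""
    def __init__(self, escalate_after: int = 2):
        self.attempts = 0
        self.escalate_after = escalate_after

    def check(self, message: str) -> str:
        if jailbreak_score(message) > 0:
            self.attempts += 1
            if self.attempts >= self.escalate_after:
                return "escalate"   # e.g. block session, notify human review
            return "refuse"
        return "allow"
```

Keeping state per conversation is the key design choice here: single-message filters miss the persistent multi-turn attacks the definition describes.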
Systematic errors in AI agent outputs that result in unfair or discriminatory treatment of certain groups. Bias enters through training data (historical inequities reflected in the data), feature selection (proxies for protected characteristics), evaluation methodology (benchmarks that don't represent affected groups), and deployment context (using a model outside its validated use case). AI agents in HR, finance, healthcare, and legal applications must implement bias detection and mitigation as a core engineering requirement, not an afterthought.
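A basic detection step is checking selection rates across groups. The sketch below applies a four-fifths-style threshold; the 0.8 value echoes the well-known four-fifths rule of thumb but is an illustrative parameter here, not a legal standard, and the group labels are placeholders.

```python
# Sketch of a demographic-parity check on agent decisions. The 0.8 threshold
# and group labels are illustrative.

def selection_rates(decisions):
    """decisions: list of (group, approved) pairs -> per-group approval rate."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {g: approved[g] / totals[g] for g in totals}

def passes_four_fifths(decisions, threshold: float = 0.8) -> bool:
    """True if every group's rate is at least `threshold` of the highest rate."""
    rates = selection_rates(decisions)
    top = max(rates.values())
    return all(r >= threshold * top for r in rates.values())
```

Rate checks like this are only a first screen: they say nothing about proxy features or deployment-context mismatch, which need their own audits.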
The systematic process of measuring an AI model's performance against defined criteria—accuracy, robustness, safety, latency, and cost. Effective model evaluation combines automated benchmarks (standardized test sets), task-specific evals (domain-relevant test cases), safety evals (adversarial inputs and policy violations), and human evaluation (qualitative assessment by domain experts). Production AI agents require continuous evaluation, not just pre-deployment testing—models drift, use cases evolve, and edge cases emerge.
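The task-specific-evals component reduces to a small harness: run the model over test cases and aggregate pass/fail. The model below is a stub with canned answers; in practice it would be an API call, and scoring is often fuzzier than exact match (LLM-as-judge, regex, semantic similarity).

```python
# Minimal eval-harness sketch: exact-match scoring over task-specific cases.
# stub_model and its canned answers are placeholders for a real model call.

def run_eval(model, cases):
    """cases: list of (input, expected). Returns (accuracy, failures)."""
    failures = []
    for inp, expected in cases:
        out = model(inp)
        if out != expected:
            failures.append((inp, expected, out))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call, for demonstration only."""
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")
```

Wiring a harness like this into CI, and re-running it on production traffic samples, is what turns pre-deployment testing into the continuous evaluation the definition calls for.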
The execution environment that runs AI agents, manages their state, handles tool invocations, enforces policies, and provides observability. Modern agent runtimes (Anthropic's Claude Agent SDK, LangGraph, AutoGen, OpenAI's Agents SDK, CrewAI) provide common infrastructure—prompt management, memory persistence, error handling, retry logic, audit logging, and integration with monitoring systems—so developers can focus on agent capabilities rather than infrastructure plumbing.
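Two pieces of that plumbing, tool dispatch with retries and audit logging, can be sketched together. This is a toy illustration of the pattern, not the API of any of the frameworks named above; the backoff policy is deliberately simplified.

```python
# Sketch of runtime plumbing: tool dispatch with retry logic and an audit
# log. Illustrative only; real runtimes add backoff, timeouts, and policies.
import time

class Runtime:
    def __init__(self, tools, max_retries: int = 2):
        self.tools = tools            # name -> callable
        self.max_retries = max_retries
        self.audit_log = []           # every invocation, success or failure

    def invoke(self, name: str, *args):
        for attempt in range(self.max_retries + 1):
            try:
                result = self.tools[name](*args)
                self.audit_log.append((name, args, "ok"))
                return result
            except Exception as exc:
                self.audit_log.append((name, args, f"error:{exc}"))
                if attempt == self.max_retries:
                    raise             # exhausted retries: surface the failure
                time.sleep(0)         # real runtimes back off exponentially
```

Centralizing invocation like this is what makes policy enforcement possible: every tool call passes through one choke point where permissions, rate limits, and logging can be applied uniformly.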
The gradual degradation of an AI agent's response quality as conversation context grows longer. Even when a model technically supports a million-token context window, response quality typically peaks at much shorter context lengths. As context grows, the model's attention spreads thinner across more information, instruction following weakens, and earlier instructions get diluted by later content. Context rot is why long-running agent conversations need periodic context compression, not just larger context windows.
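Periodic compression can be sketched as a token-budget policy: keep the most recent turns verbatim and collapse everything older into a summary. Whitespace splitting is a rough proxy for real tokenization, and the summary is a stub for an LLM call.

```python
# Token-budget context-compression sketch. Whitespace word count stands in
# for real tokenization; the summary string stands in for an LLM summary.

def tokens(text: str) -> int:
    """Crude token estimate; real systems use the model's tokenizer."""
    return len(text.split())

def compress_history(turns: list[str], budget: int) -> list[str]:
    """Keep the newest turns that fit the budget; summarize the rest."""
    kept, used = [], 0
    for turn in reversed(turns):          # walk newest -> oldest
        if used + tokens(turn) > budget:
            older = len(turns) - len(kept)
            return [f"[summary of {older} earlier turns]"] + kept
        kept.insert(0, turn)              # prepend to preserve order
        used += tokens(turn)
    return kept                           # everything fit; nothing to compress
```

Recency-based keeping is the simplest policy; production systems often pin important turns (the system prompt, key decisions) so they survive compression regardless of age.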
The amount of computational work an AI model performs at inference time to produce a response, beyond a single forward pass. Reasoning models (OpenAI o-series, Claude with extended thinking, DeepSeek R1) use test-time compute to deliberate through complex problems—generating intermediate reasoning steps, self-correcting, exploring alternatives—before producing a final answer. More test-time compute generally improves quality on hard reasoning tasks but increases latency and cost. The shift toward test-time compute has fundamentally changed AI agent economics: capable agents now spend significant compute thinking, not just generating.
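The economic shift can be made concrete with a back-of-envelope cost model. The per-million-token prices and token counts below are made-up placeholders, not any provider's actual rates; the only real assumption is that reasoning tokens are typically billed at the output rate.

```python
# Back-of-envelope cost model for test-time compute. Prices and token
# counts are illustrative placeholders, not real provider rates.

def request_cost(prompt_tokens: int, reasoning_tokens: int, output_tokens: int,
                 in_price_per_m: float = 3.0, out_price_per_m: float = 15.0) -> float:
    """USD cost per request; reasoning tokens billed at the output rate."""
    return (prompt_tokens * in_price_per_m
            + (reasoning_tokens + output_tokens) * out_price_per_m) / 1_000_000

# Same prompt and answer length, with and without extended thinking:
fast = request_cost(1_000, 0, 500)
deliberate = request_cost(1_000, 4_000, 500)   # 8x more "thinking" than answer
```

Under these placeholder rates the deliberate request costs several times the fast one for the same visible answer, which is why routing (sending only hard problems to reasoning models) and thinking budgets matter for agent economics.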