Structured Output for AI Agents: JSON Mode, Function Calling, and Reliable Agent-to-System Communication
March 22, 2026
By AgentMelt Team
The most common failure mode in production AI agents is not bad reasoning. It is good reasoning with bad output formatting. The agent correctly determines that a customer needs a refund, but it returns the amount as "$45.00" instead of 45.00, and the downstream system crashes. Structured output techniques eliminate this entire category of failure by guaranteeing that the LLM's response conforms to a predefined schema.
Why unstructured output breaks agents
When an AI agent needs to take action in the real world, its output must be machine-readable. Consider an agent that processes support tickets and returns an action:
Unstructured output (fragile):
I'll categorize this as a billing issue and assign it to the finance team with high priority.
Your code now needs to parse natural language to extract the category, team, and priority. Regex breaks when the phrasing changes. LLM-based parsing adds latency and cost. Edge cases multiply: what if the agent says "urgent" instead of "high priority"?
Structured output (reliable):
{
  "category": "billing",
  "assigned_team": "finance",
  "priority": "high",
  "action": "escalate"
}
Your code reads the JSON, validates the fields, and executes. No parsing ambiguity. No missed edge cases. The schema is the contract between your agent and your system.
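The "read, validate, execute" step above can be sketched with nothing but the standard library. This is a minimal illustration, not a production validator; the allowed-value sets are assumptions for the example.

```python
import json

# Allowed values for each constrained field (assumed for this sketch).
ALLOWED_CATEGORIES = {"billing", "technical", "account", "feature_request"}
ALLOWED_PRIORITIES = {"low", "medium", "high", "critical"}

def handle_agent_output(raw: str) -> dict:
    """Parse and validate the agent's JSON output before acting on it."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data['category']}")
    if data["priority"] not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {data['priority']}")
    return data

ticket = handle_agent_output(
    '{"category": "billing", "assigned_team": "finance", '
    '"priority": "high", "action": "escalate"}'
)
print(ticket["assigned_team"])  # → finance
```

Because the values are checked against closed sets, "urgent" instead of "high" fails loudly at the boundary instead of silently corrupting downstream state.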
In production, agents that use structured output have 95-99% action success rates compared to 70-85% for agents that rely on parsing unstructured text. The remaining failures come from reasoning errors, not formatting errors.
JSON mode: the simplest approach
Most LLM providers offer a JSON mode that constrains the model's output to valid JSON. This is the easiest way to get structured output.
OpenAI: Set response_format: { type: "json_object" } in your API call. The model will always return valid JSON. You must also instruct the model in the system prompt to produce JSON, as the mode only guarantees valid JSON syntax, not a specific schema.
Anthropic: Claude supports JSON output through clear prompting. While Claude does not have a dedicated JSON mode parameter, it reliably produces valid JSON when instructed to do so, especially when given a schema. For guaranteed structure, use tool use (function calling) instead.
Google Gemini: Set response_mime_type: "application/json" and optionally provide a response_schema. Gemini will constrain output to match the schema.
Limitations of JSON mode: It guarantees valid JSON but not schema conformance. The model might return {"answer": "yes"} when you expected {"approved": true, "reason": "..."}. For schema enforcement, you need function calling or post-validation.
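A small post-validator closes that gap. This is a stdlib-only sketch (the helper name and expected field set are illustrative); it returns a list of errors that can later be fed back to the model for a retry.

```python
import json

# Minimal expected schema: field name -> required Python type (assumed shape).
EXPECTED = {"approved": bool, "reason": str}

def validate_json_mode_output(raw: str) -> list[str]:
    """Return a list of schema errors; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except ValueError as exc:
        return [f"invalid JSON: {exc}"]
    errors = []
    for field, expected_type in EXPECTED.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors

print(validate_json_mode_output('{"answer": "yes"}'))
# reports the missing "approved" and "reason" fields
```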
Function calling: schema-enforced output
Function calling (also called tool use) is the most reliable way to get structured output from an LLM. You define a function schema, and the model returns a structured call to that function with arguments matching your schema.
How it works for structured output:
tools = [{
    "type": "function",
    "function": {
        "name": "process_ticket",
        "description": "Process a support ticket with categorization and routing",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["billing", "technical", "account", "feature_request"]
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "critical"]
                },
                "assigned_team": {"type": "string"},
                "summary": {"type": "string", "maxLength": 200},
                "requires_human": {"type": "boolean"}
            },
            "required": ["category", "priority", "assigned_team", "summary", "requires_human"]
        }
    }
}]
The model returns a function call with arguments that match this schema. The enum fields constrain values to your allowed set. required ensures no fields are missing. With constrained-decoding modes (such as OpenAI's strict mode), the provider enforces the schema before returning the output; otherwise, validate the arguments yourself before executing.
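On the receiving side, OpenAI-style tool calls deliver the arguments as a JSON string that your code parses and dispatches. The registry decorator below is an illustrative pattern, not a library API, and the ticket data is made up for the sketch.

```python
import json

# Shape of an OpenAI-style tool call (arguments arrive as a JSON string).
tool_call = {
    "name": "process_ticket",
    "arguments": json.dumps({
        "category": "billing",
        "priority": "high",
        "assigned_team": "finance",
        "summary": "Customer double-charged on an invoice",
        "requires_human": False,
    }),
}

HANDLERS = {}  # registry mapping tool names to Python callables

def tool(fn):
    """Register a function so the agent loop can dispatch to it by name."""
    HANDLERS[fn.__name__] = fn
    return fn

@tool
def process_ticket(category, priority, assigned_team, summary, requires_human):
    return f"{category}/{priority} -> {assigned_team}"

args = json.loads(tool_call["arguments"])
result = HANDLERS[tool_call["name"]](**args)
print(result)  # → billing/high -> finance
```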
OpenAI strict mode: Set strict: true on function definitions. OpenAI guarantees the output matches your JSON Schema exactly, including required fields, types, and enum values. This adds a small amount of latency on the first call (schema compilation) but provides mathematical guarantees on output structure.
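Enabling strict mode is a small change to the tool definition. In a strict schema, OpenAI requires "additionalProperties": false and every property listed in "required"; the trimmed definition below is a sketch of that shape, not a complete copy of the earlier schema.

```python
# A strict-mode variant of the earlier tool definition (sketch). Strict mode
# requires "additionalProperties": false and all properties listed in "required".
strict_tool = {
    "type": "function",
    "function": {
        "name": "process_ticket",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string",
                             "enum": ["billing", "technical", "account", "feature_request"]},
                "priority": {"type": "string",
                             "enum": ["low", "medium", "high", "critical"]},
                "assigned_team": {"type": "string"},
            },
            "required": ["category", "priority", "assigned_team"],
            "additionalProperties": False,
        },
    },
}
```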
Anthropic tool use: Define tools with input schemas. Claude produces structured tool calls that conform to the schema. Claude's tool use is particularly reliable even with complex nested schemas.
Pydantic validation: the Python developer's best friend
For Python-based agents, Pydantic provides the most developer-friendly way to define, validate, and use structured output.
Define your schema as a Pydantic model:
from pydantic import BaseModel, Field
from enum import Enum

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class TicketAction(BaseModel):
    category: str = Field(description="Ticket category")
    priority: Priority
    assigned_team: str = Field(description="Team to handle the ticket")
    summary: str = Field(max_length=200)
    requires_human: bool
    confidence: float = Field(ge=0.0, le=1.0)
Use with OpenAI:
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[...],
    response_format=TicketAction
)
ticket = completion.choices[0].message.parsed  # Returns a TicketAction instance
Use with LangChain: LangChain's with_structured_output() method works across providers:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(TicketAction)
result = structured_llm.invoke("Classify this ticket: ...")
# result is a TicketAction instance
Use with Instructor: The Instructor library patches OpenAI and Anthropic clients to return Pydantic models directly:
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())
ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=TicketAction,
    messages=[...]
)
# ticket is a validated TicketAction instance with automatic retries on validation failure
Instructor is particularly valuable because it automatically retries when validation fails, passing the validation error back to the model so it can self-correct.
Common patterns for agent-to-system communication
Pattern 1: Action selection. The agent chooses from a predefined set of actions. Use an enum field to constrain the options.
from typing import Literal

class AgentDecision(BaseModel):
    action: Literal["respond", "escalate", "close", "transfer"]
    target: str | None = None  # team or person for escalate/transfer
    message: str  # response text or escalation reason
Pattern 2: Multi-step planning. The agent produces a plan with ordered steps. Use a list of structured step objects.
class Step(BaseModel):
    tool: str
    arguments: dict
    depends_on: list[int] = []  # indices of prerequisite steps

class Plan(BaseModel):
    steps: list[Step]
    reasoning: str
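Executing such a plan means ordering steps so every dependency runs first. A stdlib sketch of that ordering (plain dicts stand in for validated Step instances; the tool names are made up):

```python
# A plan as plain dicts (stand-in for a validated Plan instance).
steps = [
    {"tool": "fetch_account", "arguments": {"id": 7}, "depends_on": []},
    {"tool": "issue_refund", "arguments": {"amount": 45.0}, "depends_on": [0]},
    {"tool": "notify_customer", "arguments": {}, "depends_on": [0, 1]},
]

def execution_order(steps: list[dict]) -> list[int]:
    """Return step indices in an order satisfying depends_on (simple topological sort)."""
    done, order = set(), []
    while len(order) < len(steps):
        progressed = False
        for i, step in enumerate(steps):
            if i not in done and all(dep in done for dep in step["depends_on"]):
                done.add(i)
                order.append(i)
                progressed = True
        if not progressed:
            raise ValueError("cycle in step dependencies")
    return order

print(execution_order(steps))  # → [0, 1, 2]
```

Rejecting cyclic plans at validation time is cheaper than discovering the cycle mid-execution.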
Pattern 3: Extraction with confidence. The agent extracts information and indicates how confident it is. Low-confidence extractions can be routed for human review.
class ExtractedField(BaseModel):
    value: str
    confidence: float = Field(ge=0.0, le=1.0)
    source: str  # which part of the input this came from

class DocumentExtraction(BaseModel):
    customer_name: ExtractedField
    account_number: ExtractedField
    issue_type: ExtractedField
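The routing step is then a simple threshold filter. A stdlib sketch (the threshold and sample data are assumptions for the example):

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune to your review capacity

# Extraction result as plain dicts (stand-in for a DocumentExtraction instance).
extraction = {
    "customer_name": {"value": "Ada Lovelace", "confidence": 0.97, "source": "header"},
    "account_number": {"value": "AC-1042", "confidence": 0.55, "source": "body"},
    "issue_type": {"value": "billing", "confidence": 0.91, "source": "subject"},
}

def fields_needing_review(extraction: dict, threshold: float = CONFIDENCE_THRESHOLD) -> list[str]:
    """Return field names whose confidence falls below the review threshold."""
    return [name for name, field in extraction.items() if field["confidence"] < threshold]

print(fields_needing_review(extraction))  # → ['account_number']
```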
Pattern 4: Guardrailed response. The agent produces both a response and metadata about the response for monitoring and safety.
class GuardrailedResponse(BaseModel):
    response_text: str
    contains_pii: bool
    sentiment: Literal["positive", "neutral", "negative"]
    topics: list[str]
    escalation_recommended: bool
Error handling and fallbacks
Even with structured output, failures happen. Build resilience into your agent pipeline:
Validation retries. When the LLM output fails schema validation, retry with the error message included in the prompt. Most models self-correct after seeing the specific validation failure. Limit to 2-3 retries to avoid infinite loops.
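The retry loop is straightforward to hand-roll. A stdlib sketch, with a stub model standing in for the real LLM call (the function names and message shape are illustrative, not any library's API):

```python
import json

def retry_structured_call(model_fn, prompt: str, validate, max_retries: int = 3):
    """Call the model, validate its output, and retry with the error appended."""
    messages = [prompt]
    for _ in range(max_retries):
        raw = model_fn(messages)
        errors = validate(raw)
        if not errors:
            return json.loads(raw)
        # Feed the specific validation failure back so the model can self-correct.
        messages.append(f"Your previous output was invalid: {errors}. Return corrected JSON only.")
    raise RuntimeError("validation still failing after retries")

# Stub model for the sketch: fails once, then self-corrects.
attempts = iter(['{"approved": "yes"}', '{"approved": true}'])

def fake_model(messages):
    return next(attempts)

def validate(raw):
    data = json.loads(raw)
    return [] if isinstance(data.get("approved"), bool) else ["approved must be a boolean"]

print(retry_structured_call(fake_model, "Approve?", validate))  # → {'approved': True}
```

The bounded loop is the key design choice: after max_retries the failure surfaces as an exception for graceful degradation, rather than looping forever.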
Graceful degradation. If structured output consistently fails for a particular input (unusual language, adversarial input, extremely long text), fall back to a simpler schema or route to human review rather than crashing.
Schema versioning. When you update your output schema, version it. Run both old and new schemas during migration to ensure backward compatibility with downstream systems that consume the agent's output.
Monitoring. Track validation failure rates per schema, per model, and per input type. A sudden spike in failures usually indicates a prompt change, model update, or new input pattern that your schema does not handle.
Performance considerations
Structured output adds minimal overhead:
- JSON mode: No measurable latency increase. Slight reduction in output tokens since the model skips prose.
- Function calling: 50-200ms additional latency for schema processing on the first call. Negligible on subsequent calls.
- Pydantic validation: Under 1ms for typical schemas. Free in practice.
- Retries on validation failure: Each retry costs a full LLM call. Keep retry rates under 5% through good schema design and clear prompting.
The net effect is usually positive: structured output reduces total tokens generated (no prose wrapper around the data) and eliminates the need for a separate parsing step.
For prompt engineering best practices that complement structured output, see AI Agent Prompt Engineering Tips. For testing and evaluation strategies, read AI Agent Evaluation and Testing. Explore the full AI Coding Agent niche for development tools and guides.