Structured Output for AI Agents: JSON Mode, Function Calling, and Reliable Agent-to-System Communication
March 22, 2026
By AgentMelt Team
The most common failure mode in production AI agents is not bad reasoning. It is good reasoning with bad output formatting. The agent correctly determines that a customer needs a refund, but it returns the amount as "$45.00" instead of 45.00, and the downstream system crashes. Structured output techniques eliminate this entire category of failure by guaranteeing that the LLM's response conforms to a predefined schema.
Why unstructured output breaks agents
When an AI agent needs to take action in the real world, its output must be machine-readable. Consider an agent that processes support tickets and returns an action:
Unstructured output (fragile):
I'll categorize this as a billing issue and assign it to the finance team with high priority.
Your code now needs to parse natural language to extract the category, team, and priority. Regex breaks when the phrasing changes. LLM-based parsing adds latency and cost. Edge cases multiply: what if the agent says "urgent" instead of "high priority"?
Structured output (reliable):
{
  "category": "billing",
  "assigned_team": "finance",
  "priority": "high",
  "action": "escalate"
}
Your code reads the JSON, validates the fields, and executes. No parsing ambiguity. No missed edge cases. The schema is the contract between your agent and your system.
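The "read, validate, execute" step above can be sketched with nothing but the standard library. This is a minimal illustration, not a production validator; the allowed-value sets are assumptions for the example.

```python
import json

# Allowed values for each constrained field (assumed for this sketch).
ALLOWED_CATEGORIES = {"billing", "technical", "account", "feature_request"}
ALLOWED_PRIORITIES = {"low", "medium", "high", "critical"}

def handle_agent_output(raw: str) -> dict:
    """Parse and validate the agent's JSON output before acting on it."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data['category']}")
    if data["priority"] not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {data['priority']}")
    return data

ticket = handle_agent_output(
    '{"category": "billing", "assigned_team": "finance", '
    '"priority": "high", "action": "escalate"}'
)
print(ticket["assigned_team"])  # → finance
```

Because the values are checked against closed sets, "urgent" instead of "high" fails loudly at the boundary instead of silently corrupting downstream state.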
In production, agents that use structured output have 95-99% action success rates compared to 70-85% for agents that rely on parsing unstructured text. The remaining failures come from reasoning errors, not formatting errors.
JSON mode: the simplest approach
Most LLM providers offer a JSON mode that constrains the model's output to valid JSON. This is the easiest way to get structured output.
OpenAI: Set response_format: { type: "json_object" } in your API call. The model will always return valid JSON. You must also instruct the model in the system prompt to produce JSON, as the mode only guarantees valid JSON syntax, not a specific schema.
Anthropic: Claude supports JSON output through clear prompting. While Claude does not have a dedicated JSON mode parameter, it reliably produces valid JSON when instructed to do so, especially when given a schema. For guaranteed structure, use tool use (function calling) instead.
Google Gemini: Set response_mime_type: "application/json" and optionally provide a response_schema. Gemini will constrain output to match the schema.
Limitations of JSON mode: It guarantees valid JSON but not schema conformance. The model might return {"answer": "yes"} when you expected {"approved": true, "reason": "..."}. For schema enforcement, you need function calling or post-validation.
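A small post-validator closes that gap. This is a stdlib-only sketch (the helper name and expected field set are illustrative); it returns a list of errors that can later be fed back to the model for a retry.

```python
import json

# Minimal expected schema: field name -> required Python type (assumed shape).
EXPECTED = {"approved": bool, "reason": str}

def validate_json_mode_output(raw: str) -> list[str]:
    """Return a list of schema errors; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except ValueError as exc:
        return [f"invalid JSON: {exc}"]
    errors = []
    for field, expected_type in EXPECTED.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors

print(validate_json_mode_output('{"answer": "yes"}'))
# reports the missing "approved" and "reason" fields
```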
Function calling: schema-enforced output
Function calling (also called tool use) is the most reliable way to get structured output from an LLM. You define a function schema, and the model returns a structured call to that function with arguments matching your schema.
How it works for structured output:
tools = [{
    "type": "function",
    "function": {
        "name": "process_ticket",
        "description": "Process a support ticket with categorization and routing",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["billing", "technical", "account", "feature_request"]
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "critical"]
                },
                "assigned_team": {"type": "string"},
                "summary": {"type": "string", "maxLength": 200},
                "requires_human": {"type": "boolean"}
            },
            "required": ["category", "priority", "assigned_team", "summary", "requires_human"]
        }
    }
}]
The model returns a function call with arguments that match this schema. The enum fields constrain values to your allowed set. required ensures no fields are missing. With constrained-decoding modes (such as OpenAI's strict mode), the provider enforces the schema before returning the output; otherwise, validate the arguments yourself before executing.
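On the receiving side, OpenAI-style tool calls deliver the arguments as a JSON string that your code parses and dispatches. The registry decorator below is an illustrative pattern, not a library API, and the ticket data is made up for the sketch.

```python
import json

# Shape of an OpenAI-style tool call (arguments arrive as a JSON string).
tool_call = {
    "name": "process_ticket",
    "arguments": json.dumps({
        "category": "billing",
        "priority": "high",
        "assigned_team": "finance",
        "summary": "Customer double-charged on an invoice",
        "requires_human": False,
    }),
}

HANDLERS = {}  # registry mapping tool names to Python callables

def tool(fn):
    """Register a function so the agent loop can dispatch to it by name."""
    HANDLERS[fn.__name__] = fn
    return fn

@tool
def process_ticket(category, priority, assigned_team, summary, requires_human):
    return f"{category}/{priority} -> {assigned_team}"

args = json.loads(tool_call["arguments"])
result = HANDLERS[tool_call["name"]](**args)
print(result)  # → billing/high -> finance
```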
OpenAI strict mode: Set strict: true on function definitions. OpenAI guarantees the output matches your JSON Schema exactly, including required fields, types, and enum values. This adds a small amount of latency on the first call (schema compilation) but provides mathematical guarantees on output structure.
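Enabling strict mode is a small change to the tool definition. In a strict schema, OpenAI requires "additionalProperties": false and every property listed in "required"; the trimmed definition below is a sketch of that shape, not a complete copy of the earlier schema.

```python
# A strict-mode variant of the earlier tool definition (sketch). Strict mode
# requires "additionalProperties": false and all properties listed in "required".
strict_tool = {
    "type": "function",
    "function": {
        "name": "process_ticket",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string",
                             "enum": ["billing", "technical", "account", "feature_request"]},
                "priority": {"type": "string",
                             "enum": ["low", "medium", "high", "critical"]},
                "assigned_team": {"type": "string"},
            },
            "required": ["category", "priority", "assigned_team"],
            "additionalProperties": False,
        },
    },
}
```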
Anthropic tool use: Define tools with input schemas. Claude produces structured tool calls that conform to the schema. Claude's tool use is particularly reliable even with complex nested schemas.
Pydantic validation: the Python developer's best friend
For Python-based agents, Pydantic provides the most developer-friendly way to define, validate, and use structured output.
Define your schema as a Pydantic model:
from pydantic import BaseModel, Field
from enum import Enum

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class TicketAction(BaseModel):
    category: str = Field(description="Ticket category")
    priority: Priority
    assigned_team: str = Field(description="Team to handle the ticket")
    summary: str = Field(max_length=200)
    requires_human: bool
    confidence: float = Field(ge=0.0, le=1.0)
Use with OpenAI:
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[...],
    response_format=TicketAction
)
ticket = completion.choices[0].message.parsed  # Returns a TicketAction instance
Use with LangChain: LangChain's with_structured_output() method works across providers:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(TicketAction)
result = structured_llm.invoke("Classify this ticket: ...")
# result is a TicketAction instance
Use with Instructor: The Instructor library patches OpenAI and Anthropic clients to return Pydantic models directly:
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())
ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=TicketAction,
    messages=[...]
)
# ticket is a validated TicketAction instance with automatic retries on validation failure
Instructor is particularly valuable because it automatically retries when validation fails, passing the validation error back to the model so it can self-correct.
Common patterns for agent-to-system communication
Pattern 1: Action selection. The agent chooses from a predefined set of actions. Use an enum field to constrain the options.
from typing import Literal

class AgentDecision(BaseModel):
    action: Literal["respond", "escalate", "close", "transfer"]
    target: str | None = None  # team or person for escalate/transfer
    message: str  # response text or escalation reason
Pattern 2: Multi-step planning. The agent produces a plan with ordered steps. Use a list of structured step objects.
class Step(BaseModel):
    tool: str
    arguments: dict
    depends_on: list[int] = []  # indices of prerequisite steps

class Plan(BaseModel):
    steps: list[Step]
    reasoning: str
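Executing such a plan means ordering steps so every dependency runs first. A stdlib sketch of that ordering (plain dicts stand in for validated Step instances; the tool names are made up):

```python
# A plan as plain dicts (stand-in for a validated Plan instance).
steps = [
    {"tool": "fetch_account", "arguments": {"id": 7}, "depends_on": []},
    {"tool": "issue_refund", "arguments": {"amount": 45.0}, "depends_on": [0]},
    {"tool": "notify_customer", "arguments": {}, "depends_on": [0, 1]},
]

def execution_order(steps: list[dict]) -> list[int]:
    """Return step indices in an order satisfying depends_on (simple topological sort)."""
    done, order = set(), []
    while len(order) < len(steps):
        progressed = False
        for i, step in enumerate(steps):
            if i not in done and all(dep in done for dep in step["depends_on"]):
                done.add(i)
                order.append(i)
                progressed = True
        if not progressed:
            raise ValueError("cycle in step dependencies")
    return order

print(execution_order(steps))  # → [0, 1, 2]
```

Rejecting cyclic plans at validation time is cheaper than discovering the cycle mid-execution.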
Pattern 3: Extraction with confidence. The agent extracts information and indicates how confident it is. Low-confidence extractions can be routed for human review.
class ExtractedField(BaseModel):
    value: str
    confidence: float = Field(ge=0.0, le=1.0)
    source: str  # which part of the input this came from

class DocumentExtraction(BaseModel):
    customer_name: ExtractedField
    account_number: ExtractedField
    issue_type: ExtractedField
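The routing step is then a simple threshold filter. A stdlib sketch (the threshold and sample data are assumptions for the example):

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune to your review capacity

# Extraction result as plain dicts (stand-in for a DocumentExtraction instance).
extraction = {
    "customer_name": {"value": "Ada Lovelace", "confidence": 0.97, "source": "header"},
    "account_number": {"value": "AC-1042", "confidence": 0.55, "source": "body"},
    "issue_type": {"value": "billing", "confidence": 0.91, "source": "subject"},
}

def fields_needing_review(extraction: dict, threshold: float = CONFIDENCE_THRESHOLD) -> list[str]:
    """Return field names whose confidence falls below the review threshold."""
    return [name for name, field in extraction.items() if field["confidence"] < threshold]

print(fields_needing_review(extraction))  # → ['account_number']
```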
Pattern 4: Guardrailed response. The agent produces both a response and metadata about the response for monitoring and safety.
class GuardrailedResponse(BaseModel):
    response_text: str
    contains_pii: bool
    sentiment: Literal["positive", "neutral", "negative"]
    topics: list[str]
    escalation_recommended: bool
Error handling and fallbacks
Even with structured output, failures happen. Build resilience into your agent pipeline:
Validation retries. When the LLM output fails schema validation, retry with the error message included in the prompt. Most models self-correct after seeing the specific validation failure. Limit to 2-3 retries to avoid infinite loops.
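The retry loop is straightforward to hand-roll. A stdlib sketch, with a stub model standing in for the real LLM call (the function names and message shape are illustrative, not any library's API):

```python
import json

def retry_structured_call(model_fn, prompt: str, validate, max_retries: int = 3):
    """Call the model, validate its output, and retry with the error appended."""
    messages = [prompt]
    for _ in range(max_retries):
        raw = model_fn(messages)
        errors = validate(raw)
        if not errors:
            return json.loads(raw)
        # Feed the specific validation failure back so the model can self-correct.
        messages.append(f"Your previous output was invalid: {errors}. Return corrected JSON only.")
    raise RuntimeError("validation still failing after retries")

# Stub model for the sketch: fails once, then self-corrects.
attempts = iter(['{"approved": "yes"}', '{"approved": true}'])

def fake_model(messages):
    return next(attempts)

def validate(raw):
    data = json.loads(raw)
    return [] if isinstance(data.get("approved"), bool) else ["approved must be a boolean"]

print(retry_structured_call(fake_model, "Approve?", validate))  # → {'approved': True}
```

The bounded loop is the key design choice: after max_retries the failure surfaces as an exception for graceful degradation, rather than looping forever.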
Graceful degradation. If structured output consistently fails for a particular input (unusual language, adversarial input, extremely long text), fall back to a simpler schema or route to human review rather than crashing.
Schema versioning. When you update your output schema, version it. Run both old and new schemas during migration to ensure backward compatibility with downstream systems that consume the agent's output.
Monitoring. Track validation failure rates per schema, per model, and per input type. A sudden spike in failures usually indicates a prompt change, model update, or new input pattern that your schema does not handle.
Performance considerations
Structured output adds minimal overhead:
- JSON mode: No measurable latency increase. Slight reduction in output tokens since the model skips prose.
- Function calling: 50-200ms additional latency for schema processing on the first call. Negligible on subsequent calls.
- Pydantic validation: Under 1ms for typical schemas. Free in practice.
- Retries on validation failure: Each retry costs a full LLM call. Keep retry rates under 5% through good schema design and clear prompting.
The net effect is usually positive: structured output reduces total tokens generated (no prose wrapper around the data) and eliminates the need for a separate parsing step.
For prompt engineering best practices that complement structured output, see AI Agent Prompt Engineering Tips. For testing and evaluation strategies, read AI Agent Evaluation and Testing. Explore the full AI Coding Agent niche for development tools and guides.