Structured Output#
What it is#
Structured output is the discipline of getting a language model to emit data in a machine-parseable shape (JSON, XML, a function-call signature) instead of free prose. It’s the seam between an LLM and the rest of your application. Done badly, you sprinkle regex everywhere and a renamed field turns into a silent bug. Done well, every model output passes through a typed validator (Pydantic, Zod, JSON Schema) and your application code never touches an Any. The four techniques in this article (schema-constrained tool calls, JSON mode, prompt-only structuring with prefill, and validator-paired retry loops) cover essentially every case in production.
The reliability spectrum#
Different techniques offer different guarantees. Listed in order of increasing reliability — pick the lightest technique that meets your need.
| Technique | Schema guaranteed? | Format guaranteed? | Cost | Use when |
|---|---|---|---|---|
| Plain prose + regex | No | No | Low | One-off extraction, tolerate errors |
| Instruction “return JSON” | No | Mostly | Low | Prototype |
| Prefill { + stop seq | No | Yes (parses as JSON) | Low | Lightweight, no schema needed |
| Tool use with input_schema | Yes (Claude) | Yes | Low | Production extraction |
| JSON mode (OpenAI/some APIs) | Yes (if schema set) | Yes | Low | OpenAI compatibility |
| Outlines / Guidance / LM Format Enforcer | Yes | Yes | Medium | Open-weights; constrained decoding |
| Validator + retry loop | Eventually | Eventually | Medium | Safety net on any of the above |
[!TIP] For Claude, the most reliable schema-guaranteed approach is tool use with tool_choice={"type": "tool", "name": "..."}. It returns a parsed Python dict matching the input_schema exactly. For everything else, prefer pairing your generation method with a typed validator and a one-shot retry.
Tool use as a structured output channel#
Claude’s tool-use API was designed for function calling, but it doubles as the most reliable structured-output mechanism. You define a tool whose input_schema IS your target output shape, force Claude to call it via tool_choice, and read the parsed input back from the response. No regex, no JSON parsing, no escape-string headaches.
import anthropic

client = anthropic.Anthropic()

# The tool's input_schema is the target output shape.
extract_tool = {
    "name": "store_invoice",
    "description": "Store the structured invoice data extracted from the document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601 date"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "CAD", "JPY"],
            },
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                    "required": ["description", "quantity", "unit_price"],
                },
            },
        },
        "required": [
            "invoice_number", "date", "vendor_name", "total_amount",
            "currency", "line_items",
        ],
    },
}

# invoice_text holds the raw invoice text and is defined elsewhere.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},  # force this tool
    messages=[{"role": "user", "content": f"Extract this invoice:\n\n{invoice_text}"}],
)
tool_use_block = next(b for b in response.content if b.type == "tool_use")
print(tool_use_block.input)
Output:
{'invoice_number': 'INV-2025-0427', 'date': '2025-04-27', 'vendor_name': 'Acme Coffee',
'total_amount': 42.50, 'currency': 'USD',
'line_items': [{'description': 'Espresso', 'quantity': 2, 'unit_price': 4.50}, ...]}
[!TIP] The “tool” never has to actually do anything. It exists purely to define the schema and capture the output. Many teams name it store_X, record_X, or submit_X to signal that it’s a sink rather than a callable.
Pydantic-paired validation#
A typed validator turns the model’s dict into an instance of a class — catching shape mismatches, type coercions, and missing fields with a clean traceback. Pydantic is the de-facto Python validator; the same pattern works with attrs, marshmallow, or dataclasses-json.
from pydantic import BaseModel, Field, ValidationError
from typing import Literal
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: float = Field(ge=0)
    unit_price: float = Field(ge=0)

class Invoice(BaseModel):
    invoice_number: str = Field(min_length=1)
    date: date
    vendor_name: str
    total_amount: float = Field(ge=0)
    currency: Literal["USD", "EUR", "GBP", "CAD", "JPY"]
    line_items: list[LineItem] = Field(min_length=1)

def extract_invoice(text: str) -> Invoice:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "store_invoice"},
        messages=[{"role": "user", "content": f"Extract this invoice:\n\n{text}"}],
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # Raises ValidationError if the dict does not match the Invoice model
    return Invoice.model_validate(tool_use.input)
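[!TIP] Generate the tool schema FROM the Pydantic model using Invoice.model_json_schema() to keep them in sync. Pydantic’s output is valid JSON Schema. You can trim cosmetic fields like title, but keep $defs (or inline the references) whenever the schema contains $ref, because nested models such as LineItem are emitted that way.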
Generate schema from Pydantic#
def pydantic_to_tool(model_class, tool_name: str, description: str) -> dict:
    schema = model_class.model_json_schema()
    # Simplified: nested models stay under $defs and are referenced via $ref.
    # A production version may want to inline those references.
    return {
        "name": tool_name,
        "description": description,
        "input_schema": schema,
    }

invoice_tool = pydantic_to_tool(
    Invoice,
    tool_name="store_invoice",
    description="Store structured invoice data extracted from the document.",
)
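The helper above passes Pydantic's schema through as generated, so nested models remain behind `$ref` pointers into `$defs`. If a flat schema is preferred, the following sketch inlines those references and drops cosmetic `title` keywords. `inline_refs` is a hypothetical helper, not part of Pydantic or the Anthropic SDK. It assumes only local `#/$defs/...` references and does not handle recursive models.

```python
import copy


def inline_refs(schema: dict) -> dict:
    """Return a copy of `schema` with local $ref pointers inlined and titles removed."""
    defs = schema.get("$defs", {})

    def resolve(node, is_property_map=False):
        if isinstance(node, list):
            return [resolve(item) for item in node]
        if not isinstance(node, dict):
            return node
        if is_property_map:
            # Keys here are field names (one may be called "title"), not keywords.
            return {name: resolve(sub) for name, sub in node.items()}
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            target = copy.deepcopy(defs[ref[len("#/$defs/"):]])
            siblings = {k: v for k, v in node.items() if k != "$ref"}
            return resolve({**target, **siblings})
        out = {}
        for key, value in node.items():
            if key == "$defs":
                continue  # no longer needed once references are inlined
            if key == "title" and isinstance(value, str):
                continue  # cosmetic keyword emitted by Pydantic
            out[key] = resolve(value, is_property_map=(key == "properties"))
        return out

    return resolve(schema)


# Hand-written stand-in for what Invoice.model_json_schema() roughly looks like.
example = {
    "$defs": {
        "LineItem": {
            "title": "LineItem",
            "type": "object",
            "properties": {"description": {"title": "Description", "type": "string"}},
            "required": ["description"],
        }
    },
    "title": "Invoice",
    "type": "object",
    "properties": {
        "title": {"title": "Title", "type": "string"},
        "line_items": {
            "title": "Line Items",
            "type": "array",
            "items": {"$ref": "#/$defs/LineItem"},
        },
    },
    "required": ["line_items"],
}
flat = inline_refs(example)
```

One way to wire this in is to return `"input_schema": inline_refs(schema)` from `pydantic_to_tool`.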
TypeScript with Zod#
The Zod equivalent of the Pydantic pattern. Zod schemas can be converted to JSON Schema via zod-to-json-schema and fed to Claude as a tool input schema.
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const LineItem = z.object({
  description: z.string(),
  quantity: z.number().nonnegative(),
  unit_price: z.number().nonnegative(),
});

const Invoice = z.object({
  invoice_number: z.string().min(1),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  vendor_name: z.string(),
  total_amount: z.number().nonnegative(),
  currency: z.enum(["USD", "EUR", "GBP", "CAD", "JPY"]),
  line_items: z.array(LineItem).min(1),
});

const client = new Anthropic();

async function extractInvoice(text: string): Promise<z.infer<typeof Invoice>> {
  const response = await client.messages.create({
    model: "claude-opus-4-7",
    max_tokens: 2048,
    tools: [
      {
        name: "store_invoice",
        description: "Store structured invoice data.",
        input_schema: zodToJsonSchema(Invoice) as Anthropic.Tool.InputSchema,
      },
    ],
    tool_choice: { type: "tool", name: "store_invoice" },
    messages: [{ role: "user", content: `Extract this invoice:\n\n${text}` }],
  });
  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (toolUse?.type !== "tool_use") throw new Error("No tool use in response");
  // Throws a ZodError if the tool input does not match the schema
  return Invoice.parse(toolUse.input);
}
JSON mode (prompt-only structuring)#
When the API doesn’t have a dedicated tool-use channel — or you want zero schema bookkeeping — instruct the model to return JSON and combine three techniques to maximize reliability: explicit schema in the prompt, prefill with {, and stop sequence after the closing brace.
import json

SCHEMA_HINT = """
{
  "category": "billing" | "access" | "performance" | "bug" | "feature" | "other",
  "priority": "P1" | "P2" | "P3" | "P4",
  "summary": "<one sentence>",
  "needs_human": true | false
}
"""

def triage(ticket: str) -> dict:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        # The matched stop sequence is not returned, so "}\n" swallows the closing brace
        stop_sequences=["\n\n", "}\n"],
        messages=[
            {
                "role": "user",
                "content": (
                    f"Classify the support ticket. Output JSON matching this schema:\n"
                    f"{SCHEMA_HINT}\n"
                    f"Output ONLY the JSON object, no fences, no prose.\n\n"
                    f"Ticket: {ticket}"
                ),
            },
            {"role": "assistant", "content": "{"},  # prefill ensures it starts with JSON
        ],
    )
    # Re-attach the prefilled "{" and, if the stop sequence ate it, the closing "}"
    raw = "{" + response.content[0].text
    if not raw.rstrip().endswith("}"):
        raw += "}"
    return json.loads(raw)
[!WARNING] Prompt-only JSON is reliable on Opus and Sonnet but degrades on smaller models. If you can use tool use, you should. Reach for prompt-only JSON when you need cross-model portability or when adding tools is not worth the operational overhead.
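When prefill is unavailable, or the model still wraps its answer in fences or prose, a tolerant extraction step can help before `json.loads` gives up. A minimal sketch, assuming the reply contains a JSON object somewhere in the text; `extract_first_json_object` is a hypothetical helper built on the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and ignores whatever follows it.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first parseable JSON object embedded in `text`."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode stops at the end of the first complete value,
            # so trailing fences or prose do not cause a failure.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not open valid JSON; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


fenced_reply = 'Here you go:\n```json\n{"category": "bug", "priority": "P2"}\n```'
ticket = extract_first_json_object(fenced_reply)
```

This only recovers the format. The result still needs to go through a validator, since nothing here checks field names or types.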
Validator + retry loop#
Even with tool use, validation occasionally fails — the model might omit a required field, return a number when you expected a string, or violate a regex pattern. The retry loop pattern catches the validation error and feeds it back as a corrective message. Cap at 2 retries; beyond that, fail loudly and queue for human review.
import json
import logging
from pydantic import BaseModel, ValidationError

def extract_with_validation(
    text: str,
    model_class: type[BaseModel],
    tool: dict,
    max_retries: int = 2,
) -> BaseModel | None:
    messages = [{"role": "user", "content": f"Extract:\n\n{text}"}]
    for attempt in range(max_retries + 1):
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            tools=[tool],
            tool_choice={"type": "tool", "name": tool["name"]},
            messages=messages,
        )
        tool_use = next((b for b in resp.content if b.type == "tool_use"), None)
        if not tool_use:
            return None
        try:
            return model_class.model_validate(tool_use.input)
        except ValidationError as e:
            logging.warning(f"Attempt {attempt + 1} failed: {e}")
            # Feed the error back and ask the model to retry
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": (
                        f"Validation failed: {e.errors()[:3]}. "
                        f"Call the tool again with corrected input."
                    ),
                    "is_error": True,
                }],
            })
    return None  # exhausted retries
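To exercise the retry logic without network calls, one option is to separate the control flow from the API by injecting the generation and validation steps as callables. The sketch below is an illustration under that assumption: `retry_until_valid`, `validate_ticket`, and `fake_generate` are hypothetical names, and the loop catches `ValueError` as a stand-in for whatever exception your validator raises.

```python
def retry_until_valid(generate, validate, max_retries=2):
    """Call generate(feedback) until validate() accepts the result.

    Returns (validated_value_or_None, list_of_error_messages).
    """
    errors = []
    feedback = None  # the first attempt has no corrective message
    for _attempt in range(max_retries + 1):
        candidate = generate(feedback)
        try:
            return validate(candidate), errors
        except ValueError as exc:
            errors.append(str(exc))
            feedback = f"Validation failed: {exc}. Return corrected output."
    return None, errors  # exhausted retries; caller should escalate


def validate_ticket(data: dict) -> dict:
    if data.get("priority") not in {"P1", "P2", "P3", "P4"}:
        raise ValueError(f"priority must be P1-P4, got {data.get('priority')!r}")
    return data


# Fake model: wrong on the first call, correct on the second.
replies = iter([{"priority": "urgent"}, {"priority": "P1"}])
feedback_seen = []


def fake_generate(feedback):
    feedback_seen.append(feedback)
    return next(replies)


result, errors = retry_until_valid(fake_generate, validate_ticket)
```

Returning the error list alongside the result keeps the "fail loudly and queue for human review" path easy to implement, because the caller receives every validation message that was fed back.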
Streaming structured output#
When latency matters, stream the tool-call JSON as it’s generated. The SDK exposes input_json_delta events with the partial JSON; combine them into a string and parse at message_stop.
import json

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract: {text}"}],
) as stream:
    partial_json = ""
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            partial_json += event.delta.partial_json
            # Optionally: try to incrementally parse with a tolerant parser
    # Read the accumulated message while the stream is still open
    final = stream.get_final_message()

tool_use = next(b for b in final.content if b.type == "tool_use")
parsed = tool_use.input
print(parsed)
[!TIP] Use a tolerant streaming JSON parser (like partial-json on PyPI) if you need to render partial state to the user as the tool call streams in. Useful for live UI updates with progressive disclosure.
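To make the idea of tolerant parsing concrete, here is a rough standard-library sketch of what such a parser does: close any open string and brackets, then attempt a normal parse. `close_partial_json` and `try_parse_partial` are hypothetical helpers and not the API of any published package. The approach gives up (returns `None`) on fragments that end mid-key or after a comma, so a UI would keep showing the last successfully parsed state.

```python
import json


def close_partial_json(fragment: str) -> str:
    """Append the quote and brackets needed to balance a truncated JSON prefix."""
    closers = []        # stack of closing brackets still owed
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    closed = fragment + ('"' if in_string else "")
    return closed + "".join(reversed(closers))


def try_parse_partial(fragment: str):
    """Return the parsed value for a JSON prefix, or None if it cannot be repaired."""
    try:
        return json.loads(close_partial_json(fragment))
    except json.JSONDecodeError:
        return None


snapshot = try_parse_partial('{"vendor_name": "Acme Cof')
```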
Schema design principles#
The shape of your schema affects both LLM accuracy and downstream usability. Seven rules that hold across providers:
☐ Use enums for fixed value sets — eliminates hallucinated values
☐ Mark every truly-required field as `required` — Claude respects this
☐ Use descriptive field names, not abbreviations (currency_code, not ccy)
☐ Add a description to every field — Claude reads them; defaults are weak signals
☐ Prefer flat structures over deeply nested — easier to validate and debug
☐ Use null for "missing" rather than empty strings — distinguishes "not present" from ""
☐ Avoid free-text fields when an enum will do — extraction quality improves dramatically
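Some of the rules above can be checked mechanically before a schema ships. The following sketch is a hypothetical `lint_schema` helper (not part of any library) that flags missing descriptions, `required` entries that are not declared, and nesting beyond a configurable depth. It assumes plain `object` / `array` schemas without `$ref`.

```python
def lint_schema(schema: dict, path: str = "$", depth: int = 1, max_depth: int = 3) -> list[str]:
    """Return human-readable warnings for a JSON Schema object node."""
    warnings: list[str] = []
    properties = schema.get("properties", {})

    if depth > max_depth:
        warnings.append(f"{path}: nesting depth {depth} exceeds {max_depth}")

    # Every name listed in `required` should actually be declared.
    for name in schema.get("required", []):
        if name not in properties:
            warnings.append(f"{path}: required field '{name}' is not in properties")

    for name, field in properties.items():
        child_path = f"{path}.{name}"
        if not field.get("description"):
            warnings.append(f"{child_path}: missing description")
        # Look through arrays to the item schema so nested objects are checked too.
        target = field.get("items", {}) if field.get("type") == "array" else field
        if isinstance(target, dict) and target.get("type") == "object":
            warnings.extend(lint_schema(target, child_path, depth + 1, max_depth))
    return warnings


example_schema = {
    "type": "object",
    "properties": {
        "currency_code": {"type": "string", "enum": ["USD", "EUR"], "description": "ISO 4217 code"},
        "ccy": {"type": "string"},
    },
    "required": ["currency_code", "total"],
}
for warning in lint_schema(example_schema):
    print(warning)
```

Running a check like this in CI is one way to keep descriptions from silently disappearing as schemas evolve.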
Enum constraints#
"properties": {
  "priority": {
    "type": "string",
    "enum": ["P1", "P2", "P3", "P4"],
    "description": "Severity: P1=outage, P2=blocked, P3=degraded, P4=question"
  },
  "sentiment": {
    "type": "string",
    "enum": ["angry", "frustrated", "neutral", "positive"],
    "description": "Tone of the customer"
  }
}
Discriminated unions#
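When a field can take several shapes, model it as a discriminated union: a type field that selects the shape of the rest. Claude understands these natively when the discriminator is an enum. Note that the flat schema below only marks type as required; the per-type requirements live in the field descriptions, so enforce them in your validator as well.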
"properties": {
  "payment": {
    "type": "object",
    "properties": {
      "type": {"type": "string", "enum": ["card", "bank_transfer", "crypto"]},
      "last4": {"type": "string", "description": "Required if type=card"},
      "iban": {"type": "string", "description": "Required if type=bank_transfer"},
      "wallet_address": {"type": "string", "description": "Required if type=crypto"},
    },
    "required": ["type"]
  }
}
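Because the schema above expresses "Required if type=..." only in descriptions, nothing in the schema itself rejects a card payment without `last4`. A small post-generation check can close that gap. This is a sketch, assuming the three payment types and field names from the schema; `REQUIRED_BY_TYPE` and `check_payment` are hypothetical names introduced for illustration.

```python
# Which field each discriminator value requires, taken from the schema descriptions.
REQUIRED_BY_TYPE = {
    "card": "last4",
    "bank_transfer": "iban",
    "crypto": "wallet_address",
}


def check_payment(payment: dict) -> list[str]:
    """Return a list of problems with a payment dict; empty means it is consistent."""
    kind = payment.get("type")
    if kind not in REQUIRED_BY_TYPE:
        return [f"unknown payment type: {kind!r}"]

    errors = []
    needed = REQUIRED_BY_TYPE[kind]
    if not payment.get(needed):
        errors.append(f"'{needed}' is required when type={kind}")
    # Fields belonging to other shapes should not be populated.
    for other_kind, field in REQUIRED_BY_TYPE.items():
        if other_kind != kind and payment.get(field) is not None:
            errors.append(f"'{field}' should be absent when type={kind}")
    return errors


problems = check_payment({"type": "card", "iban": "DE89"})
```

The returned messages are short enough to feed straight back to the model in a retry turn.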
Optional vs required#
"required": ["invoice_number", "date", "total_amount"] # always present
# email, phone, address are NOT in required → may be omitted or null
[!TIP] If null is a valid value for an optional field, document it explicitly: "description": "Email if provided, else null", and allow it in the type, e.g. "type": ["string", "null"]. Claude will use null over making something up.
Open-weights constrained decoding#
For models you run yourself (Llama, Qwen, DeepSeek), libraries like Outlines, Guidance, and LM Format Enforcer constrain generation at the token level — the model literally cannot emit a token that would break the schema. This gives 100% format conformance at the cost of slightly slower decoding.
# Outlines example (open-weights only; uses the pre-1.0 outlines.generate API)
from outlines import models, generate
from pydantic import BaseModel
from typing import Literal

class Ticket(BaseModel):
    category: Literal["billing", "access", "bug", "feature", "other"]
    priority: Literal["P1", "P2", "P3", "P4"]
    summary: str
    needs_human: bool

model = models.transformers("meta-llama/Meta-Llama-3-8B-Instruct")
generator = generate.json(model, Ticket)  # decoding is constrained to the Ticket schema
result = generator("Customer reports: 'Page took 30 seconds to load.'")
print(result)
Output:
Ticket(category='bug', priority='P3', summary='Page load latency 30s', needs_human=False)
[!TIP] Constrained decoding does NOT replace prompt engineering — a poorly described task still produces semantically wrong outputs, just well-formatted ones. Use it as a guarantee on shape, not on correctness.
Nested schemas#
Deeply nested schemas are harder for the model to populate correctly. Two mitigations: keep depth ≤ 3 levels where possible, and break large schemas into smaller calls when you can. A schema with 30+ leaf fields tends to produce noticeably worse extraction than the same 30 fields split across 3 sequential calls, at the price of extra requests.
# Heavy schema: works but error-prone
{
    "company": {
        "name": "...",
        "address": {
            "street": "...",
            "city": "...",
            "country": "...",
            "geo": {"lat": 0.0, "lng": 0.0}
        },
        "contacts": [{"name": "...", "role": "...", "email": "..."}]
    },
    "invoice": {...},
    "line_items": [...],
}

# Better: three sequential calls, each with a focused schema
company = extract_with_schema(text, CompanySchema)
invoice = extract_with_schema(text, InvoiceSchema)
items = extract_with_schema(text, LineItemsSchema)
Handling lists of unknown length#
When the output is a list, include guidance on size bounds and what to do when nothing is found. Without it, models sometimes invent entries to fill perceived expectations.
"properties": {
  "tags": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Topic tags from the article. Return empty array if no clear tags. Max 10."
  }
}
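A description such as "Max 10" is guidance rather than a hard limit, so it can be worth enforcing the bound after generation. A minimal sketch, assuming tags should be de-duplicated case-insensitively; `normalize_tags` is a hypothetical helper, and whether to truncate or reject oversized lists is a design choice left to the application.

```python
def normalize_tags(tags, max_items: int = 10) -> list[str]:
    """Trim, de-duplicate (case-insensitively), and cap a list of tags."""
    seen = set()
    cleaned = []
    for tag in tags or []:          # treat a missing list like an empty one
        if not isinstance(tag, str):
            continue                # skip anything the schema should not allow
        value = tag.strip()
        key = value.lower()
        if not value or key in seen:
            continue
        seen.add(key)
        cleaned.append(value)
    return cleaned[:max_items]


tags = normalize_tags(["AI", " ai ", "", "ML"])
```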
Error patterns and fixes#
The repeated mistakes that show up when wiring up structured output, with the fix for each.
| Symptom | Root cause | Fix |
|---|---|---|
| JSONDecodeError: Expecting value | Model wrapped JSON in ``` fences | Prefill { + stop sequence on ``` |
| Field is string when schema says number | Schema description vague | Add "type": "number", "description": "Numeric. No commas, no $." |
| Empty array when items exist | Schema didn’t say to extract them | Description: “Extract every line item; do not skip” |
| Extra fields not in schema | LLM padded with extras | Pydantic model_config = ConfigDict(extra="forbid") |
| Required field missing | Long input, model lost it | Re-prompt with retry loop OR shorten input |
| Hallucinated enum value | Free-text field where enum would do | Convert to enum |
| Numbers come back as strings | Auto-formatting | Pydantic does coercion; or strict mode + retry |
| Date format inconsistent | No format hint | Add "description": "ISO 8601 YYYY-MM-DD" |
| Nested object missing levels | Schema too deep | Flatten or split into sequential calls |
| Different field order each run | Insertion-order varies | Order shouldn’t matter — use dict access, not list index |
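For the "numbers come back as strings" rows, a small coercion step in front of a strict validator is one option. The sketch below is an illustration with a hypothetical `coerce_number` helper. It assumes commas are thousands separators (US-style formatting) and strips a few common currency symbols, so it would misread values like "1.234,50".

```python
import re

# Plain decimal number with an optional leading minus sign.
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")


def coerce_number(value) -> float:
    """Convert ints, floats, or strings like '$1,234.50' to float; raise ValueError otherwise."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as a type error here.
        raise ValueError("booleans are not accepted as numbers")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        cleaned = re.sub(r"[,$€£\s]", "", value)  # drop separators and currency symbols
        if _NUMERIC.match(cleaned):
            return float(cleaned)
    raise ValueError(f"cannot interpret {value!r} as a number")


amount = coerce_number("$1,234.50")
```

Raising `ValueError` for anything ambiguous keeps the failure visible, so it can be handled by the retry loop instead of being silently coerced.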
Common pitfalls#
| Pitfall | Why it bites | Fix |
|---|---|---|
| Trusting raw model output without validation | Field types drift silently | Always pipe through Pydantic / Zod |
| Validator and prompt schema out of sync | Edits in one place, not the other | Generate tool schema FROM the model class |
| No retry on validation failure | One bad parse breaks the pipeline | One retry with the error fed back is cheap insurance |
| extra="allow" in Pydantic | LLM-added fields leak through | extra="forbid" in production |
| Schema hidden in prompt as prose | LLM has to re-derive structure each time | Use tool input_schema; cheaper, more reliable |
| Streaming partial JSON to user | Mid-stream parse errors | Buffer until stop event, then parse-and-render |
| Same schema for every model size | Smaller models drop fields | Validate against the WEAKEST model you support |
| Ignoring is_error flag in tool_result | Retries don’t carry error signal | Set is_error: true so model knows to fix |
Real-world recipes#
Compact end-to-end patterns for the highest-leverage structured-output use cases.
CRUD payload generation#
class UserUpdate(BaseModel):
    email: str | None = None
    name: str | None = None
    plan: Literal["free", "pro", "enterprise"] | None = None
    notify_billing: bool | None = None

update_tool = pydantic_to_tool(
    UserUpdate,
    tool_name="propose_user_update",
    description="Generate a partial user-update payload from a natural-language command.",
)

def parse_user_command(text: str) -> UserUpdate:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=600,
        tools=[update_tool],
        tool_choice={"type": "tool", "name": "propose_user_update"},
        messages=[{
            "role": "user",
            "content": (
                f"Convert this admin command into a user-update payload. "
                f"Set fields to null when not mentioned.\n\n{text}"
            ),
        }],
    )
    return UserUpdate.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )

payload = parse_user_command("Upgrade alice@example.com to Pro and turn off billing emails.")
# UserUpdate(email='alice@example.com', name=None, plan='pro', notify_billing=False)
Multi-entity extraction#
class Person(BaseModel):
    name: str
    role: str | None = None
    affiliation: str | None = None

class Event(BaseModel):
    name: str
    date: str | None = None
    location: str | None = None

class Extraction(BaseModel):
    people: list[Person] = Field(default_factory=list)
    events: list[Event] = Field(default_factory=list)
    organizations: list[str] = Field(default_factory=list)

extract_tool = pydantic_to_tool(
    Extraction,
    tool_name="record_entities",
    description="Record all named entities found in the document.",
)

def extract_entities(text: str) -> Extraction:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "record_entities"},
        messages=[{"role": "user", "content": text}],
    )
    return Extraction.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )
Function-call dispatch (router)#
from typing import Any

class WeatherArgs(BaseModel):
    location: str
    units: Literal["celsius", "fahrenheit"] = "celsius"

class StockArgs(BaseModel):
    ticker: str

class CalendarArgs(BaseModel):
    person: str
    day: str

# get_weather, get_stock_price, and check_calendar are your own functions, defined elsewhere.
ROUTES = {
    "get_weather": (WeatherArgs, get_weather),
    "get_stock_price": (StockArgs, get_stock_price),
    "check_calendar": (CalendarArgs, check_calendar),
}

def route(user_msg: str) -> Any:
    tools = [
        {"name": name, "description": fn.__doc__, "input_schema": cls.model_json_schema()}
        for name, (cls, fn) in ROUTES.items()
    ]
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        tools=tools,
        # "any" forces a tool call; disabling parallel tool use limits it to exactly one
        tool_choice={"type": "any", "disable_parallel_tool_use": True},
        messages=[{"role": "user", "content": user_msg}],
    )
    tool_use = next(b for b in resp.content if b.type == "tool_use")
    cls, fn = ROUTES[tool_use.name]
    args = cls.model_validate(tool_use.input)  # validate before dispatching
    return fn(**args.model_dump())
Form-fill from messy text#
class JobApplication(BaseModel):
    full_name: str
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    years_experience: int = Field(ge=0, le=80)
    desired_salary_usd: int | None = None
    open_to_remote: bool

application_tool = pydantic_to_tool(
    JobApplication,
    tool_name="submit_application",
    description="Submit the parsed job application after extracting fields from the resume.",
)

def parse_application(resume_text: str) -> JobApplication | None:
    # Returns None when validation still fails after the retries are exhausted
    return extract_with_validation(resume_text, JobApplication, application_tool)
Quick reference#
Pick the right tool for the job.
| Need | First technique |
|---|---|
| Hard schema guarantee, Claude | Tool use + tool_choice={"type": "tool"} |
| Cross-model portability | Prompt + prefill { + stop sequence |
| Open-weights model, must conform | Outlines / Guidance / LM Format Enforcer |
| Lazy/optional fields | Optional in Pydantic, NOT in required |
| Closed value set | Enum in schema |
| Mutually exclusive shapes | Discriminated union (type field) |
| Inconsistent dates | description: ISO 8601 YYYY-MM-DD |
| Catch bad outputs after generation | Pydantic / Zod + retry loop |
| Schema lives next to types | pydantic_to_tool(MyModel, ...) |
| Streaming for UX | input_json_delta events + buffered parse |
| Pick exactly one of many actions | Tools + tool_choice={"type": "any", "disable_parallel_tool_use": True} |
| Extract many entities at once | Single tool with list[T] fields |