Overview
This guide covers best practices for designing JSON schemas used with the /schema endpoint (recommended) or the legacy structured_output parameter on /extract.
Schema Format
The structured_output.schema field uses the JSON Schema specification (OpenAPI 3.1 compatible). This is the same schema format used by OpenAI’s structured outputs and other LLM providers.
Key JSON Schema Properties
| Property | Description | Example |
|---|---|---|
| type | Data type of the field | "string", "number", "boolean", "object", "array" |
| properties | Define fields for an object | {"name": {"type": "string"}} |
| items | Define schema for array elements | {"type": "object", "properties": {...}} |
| required | List of required field names | ["name", "email"] |
| description | Human-readable description to guide extraction | "Customer's full name" |
| format | Hint for string formatting | "date", "email", "uri" |
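As a minimal sketch of how these properties combine, the schema below is written as a Python dictionary and serialized to JSON. The field names (name, email, signup_date, orders) are hypothetical and chosen only to exercise each property from the table.

```python
import json

# Illustrative schema only: every field name here is a made-up example.
customer_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Customer's full name"},
        "email": {"type": "string", "format": "email"},
        "signup_date": {"type": "string", "format": "date"},
        "orders": {
            "type": "array",
            # "items" describes the shape of every element in the array.
            "items": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["order_id"],
            },
        },
    },
    # "required" lists property names that must be present on the object.
    "required": ["name", "email"],
}

# Serialize to the JSON text form of the schema.
schema_json = json.dumps(customer_schema, indent=2)
print(schema_json)
```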
Schema Editor (Recommended)
The Schema Editor provides two powerful ways to create schemas:
1. Generate from Prompt
Describe what you want to extract in natural language, and the editor will generate a properly formatted JSON Schema for you.
“Extract the account holder name, account number, statement period, opening and closing balances, and all transactions with date, description, and amount.”
2. Interactive Editor
- Visually add, remove, and reorder fields
- Set field types and descriptions
- Mark fields as required
- Preview the generated schema in real-time
- Test against sample documents
Using the /schema Endpoint (Recommended)
The recommended approach is a two-step flow:
- Extract the document via /extract to get an extraction_id
- Apply a schema via /schema using the extraction_id
The schema_config object contains:
| Field | Type | Description |
|---|---|---|
| input_schema | object | JSON schema defining the structure of data to extract |
| schema_prompt | string | Natural language instructions to guide extraction |
| effort | boolean | Enable extended reasoning for complex documents |
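The sketch below assembles a request body for the second step. The schema_config field names come from the table above; the helper name build_schema_request, the placement of extraction_id at the top level of the body, and the placeholder ID are assumptions for illustration, so check the endpoint reference for the exact request shape.

```python
import json
from typing import Any, Optional


def build_schema_request(
    extraction_id: str,
    input_schema: dict[str, Any],
    schema_prompt: Optional[str] = None,
    effort: bool = False,
) -> dict[str, Any]:
    """Assemble a hypothetical /schema request body (shape is assumed)."""
    schema_config: dict[str, Any] = {
        "input_schema": input_schema,
        "effort": effort,
    }
    # Only include schema_prompt when the caller supplied one.
    if schema_prompt:
        schema_config["schema_prompt"] = schema_prompt
    # Assumption: extraction_id sits next to schema_config at the top level.
    return {"extraction_id": extraction_id, "schema_config": schema_config}


body = build_schema_request(
    "ext_123",  # placeholder; use the extraction_id returned by /extract
    {"type": "object", "properties": {"account_number": {"type": "string"}}},
    schema_prompt="Return the account number exactly as printed.",
)
print(json.dumps(body, indent=2))
```

Building the body in one helper keeps the optional fields (schema_prompt, effort) in a single place, which makes it easier to reuse the same extraction_id with several schemas.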
Bank Statement Example
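Here’s an example extracting key fields from a bank statement. Step 1 extracts the document via /extract; step 2 applies the schema via /schema, which returns the structured result (schema_output).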
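A minimal sketch of a schema for this bank statement case, written in Python so it can be serialized to JSON. The fields mirror the ones named in the Schema Editor prompt earlier in this guide; the exact field names and the choice of required fields are assumptions.

```python
import json

# Hypothetical field names, based on the fields listed in the editor prompt.
bank_statement_schema = {
    "type": "object",
    "properties": {
        "account_holder_name": {
            "type": "string",
            "description": "Name of the account holder",
        },
        "account_number": {
            "type": "string",
            "description": "Account number as printed on the statement",
        },
        "statement_period": {
            "type": "object",
            "properties": {
                "start_date": {"type": "string", "format": "date"},
                "end_date": {"type": "string", "format": "date"},
            },
        },
        "opening_balance": {"type": "number"},
        "closing_balance": {"type": "number"},
        "transactions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "date": {"type": "string", "format": "date"},
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["date", "amount"],
            },
        },
    },
    "required": ["account_holder_name", "account_number"],
}

print(json.dumps(bank_statement_schema, indent=2))
```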
SDK Examples
Schema Format
Schemas follow the JSON Schema specification. Each field is defined with:
| Property | Description |
|---|---|
| type | Data type: string, number, boolean, object, array |
| description | Human-readable description to guide extraction |
| format | Optional format hint (e.g., date, email, uri) |
| required | Array of required field names (for objects) |
| items | Schema for array elements |
| properties | Nested field definitions (for objects) |
Data Types
| Type | Description | Example Value |
|---|---|---|
| string | Text values | "John Doe" |
| number | Numeric values (integer or decimal) | 99.99 |
| boolean | True/false values | true |
| object | Nested structures with properties | {"name": {"type": "string"}} |
| array | Lists with items defining element schema | {"type": "array", "items": {...}} |
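To make the mapping concrete, the sketch below checks whether a Python value (for example, one parsed from an extraction result) corresponds to one of the five types in the table. The helper matches_type is hypothetical and not part of any SDK.

```python
from typing import Any


def matches_type(value: Any, schema_type: str) -> bool:
    """Return True if a Python value corresponds to the given schema type."""
    if schema_type == "string":
        return isinstance(value, str)
    if schema_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema_type == "boolean":
        return isinstance(value, bool)
    if schema_type == "object":
        return isinstance(value, dict)
    if schema_type == "array":
        return isinstance(value, list)
    # Any other type name is outside the five types listed in the table.
    return False


print(matches_type("John Doe", "string"))
print(matches_type(99.99, "number"))
print(matches_type(True, "number"))
```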
Schema Design Principles
1. Start Simple
Begin with basic fields and gradually add complexity:
2. Use Descriptions
Add description fields to guide extraction:
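A minimal sketch of both principles together: start from a two-field schema, then add descriptions to the same fields. The field names and description texts are hypothetical.

```python
import copy

# Start simple: two basic fields, no extra guidance.
simple_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
}

# Iterate: copy the schema and add descriptions to steer extraction.
described_schema = copy.deepcopy(simple_schema)
described_schema["properties"]["invoice_number"]["description"] = (
    "Invoice identifier as printed near the top of the document"
)
described_schema["properties"]["total"]["description"] = (
    "Final amount due, including tax"
)

print(described_schema)
```

Working on a deep copy keeps the simple version available for comparison while the schema grows.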
3. Use schema_prompt for Context
The schema_prompt field provides natural language guidance to help the model understand nuances:
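As an illustration, the schema_config below pairs a small schema with a schema_prompt that states formatting rules the schema structure cannot express on its own. The field names and the prompt wording are assumptions.

```python
import json

# Hypothetical schema_config: the prompt carries rules about date and
# sign conventions that "type" and "format" alone cannot convey.
schema_config = {
    "input_schema": {
        "type": "object",
        "properties": {
            "payment_date": {"type": "string", "format": "date"},
            "amount": {"type": "number"},
        },
        "required": ["payment_date"],
    },
    "schema_prompt": (
        "Dates in the document are written as DD/MM/YYYY; return them as "
        "YYYY-MM-DD. Amounts shown in parentheses are negative."
    ),
}

print(json.dumps(schema_config, indent=2))
```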
Common Schema Patterns
Invoice / Financial Documents
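A possible starting point for invoices, sketched as a Python dictionary. All field names are hypothetical; adjust them to the terminology your documents use.

```python
import json

# Illustrative invoice schema; field names are assumptions.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "format": "date"},
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number", "description": "Total including tax"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}

print(json.dumps(invoice_schema, indent=2))
```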
Legal Documents
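A comparable sketch for contracts and similar legal documents. The field names are hypothetical examples, not a prescribed layout.

```python
import json

# Illustrative legal-document schema; field names are assumptions.
legal_schema = {
    "type": "object",
    "properties": {
        "document_title": {"type": "string"},
        "effective_date": {"type": "string", "format": "date"},
        "parties": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {
                        "type": "string",
                        "description": "Role in the agreement, e.g. lessor or lessee",
                    },
                },
            },
        },
        "governing_law": {"type": "string"},
    },
    "required": ["document_title", "parties"],
}

print(json.dumps(legal_schema, indent=2))
```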
Medical Records
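A similar sketch for medical records, again with hypothetical field names chosen only for illustration.

```python
import json

# Illustrative medical-record schema; field names are assumptions.
medical_schema = {
    "type": "object",
    "properties": {
        "patient_name": {"type": "string"},
        "date_of_birth": {"type": "string", "format": "date"},
        "diagnoses": {"type": "array", "items": {"type": "string"}},
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {
                        "type": "string",
                        "description": "Dose and frequency as written",
                    },
                },
                "required": ["name"],
            },
        },
    },
    "required": ["patient_name"],
}

print(json.dumps(medical_schema, indent=2))
```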
Advanced Techniques
Conditional Extraction
Use schema_prompt to guide conditional extraction:
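One way to sketch this: keep the conditional fields out of required and state the condition in schema_prompt. The document types, field names, and prompt wording below are assumptions.

```python
import json

# Hypothetical config: due_date and credit_reason are optional in the
# schema, and the prompt says when each one applies.
conditional_config = {
    "input_schema": {
        "type": "object",
        "properties": {
            "document_type": {
                "type": "string",
                "description": "Either 'invoice' or 'credit_note'",
            },
            "due_date": {"type": "string", "format": "date"},
            "credit_reason": {"type": "string"},
        },
        # Only the field that always applies is required.
        "required": ["document_type"],
    },
    "schema_prompt": (
        "Fill due_date only when the document is an invoice. "
        "Fill credit_reason only when the document is a credit note. "
        "Omit the field that does not apply."
    ),
}

print(json.dumps(conditional_config, indent=2))
```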
Hierarchical Data
For documents with deeply nested structures:
Performance Tips
Keep Schemas Focused
Extract only what you need. Avoid extracting entire documents as single fields.
Use Descriptions
Add description fields to guide the model on ambiguous fields or specific formats.
Leverage schema_prompt
Use schema_prompt to provide context that can’t be expressed in the schema structure alone.
Migration from Legacy Parameters
Before (Deprecated — top-level schema on /extract)
Before (Deprecated — structured_output on /extract)
After (Recommended — /extract → /schema)
structured_output on /extract remains supported for backward compatibility, but all new integrations should use the /schema endpoint.
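A rough sketch of the migration on the client side: pull the schema out of a legacy parameter set and wrap it in a schema_config. The helper to_schema_config is hypothetical, and the assumed legacy shapes (a top-level schema key, or structured_output with a nested schema key) should be checked against your existing requests.

```python
from typing import Any


def to_schema_config(legacy_params: dict[str, Any]) -> dict[str, Any]:
    """Convert assumed legacy /extract parameters into a schema_config."""
    if "structured_output" in legacy_params:
        # Assumed legacy shape: {"structured_output": {"schema": {...}}}
        schema = legacy_params["structured_output"].get("schema")
    else:
        # Assumed legacy shape: {"schema": {...}} at the top level
        schema = legacy_params.get("schema")
    if schema is None:
        raise ValueError("no schema found in legacy parameters")
    return {"input_schema": schema}


legacy = {"structured_output": {"schema": {"type": "object", "properties": {}}}}
print(to_schema_config(legacy))
```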
Error Handling
Common Schema Errors
| Error | Cause | Solution |
|---|---|---|
| Invalid JSON | Syntax error in schema | Validate JSON syntax |
| Unknown type | Using unsupported data type | Use string, number, boolean, or nested objects/arrays |
| Too complex | Deeply nested structure | Simplify schema, flatten where possible |
| No matches | Fields don’t match document | Adjust field names, use schema_prompt for guidance |
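Several of these errors can be caught before a request is sent. The sketch below is a hypothetical local check, not part of any SDK: it parses the schema text, flags types outside the five listed in this guide, and warns about deep nesting. MAX_DEPTH is an assumed value for illustration, since no service limit is stated here.

```python
import json
from typing import Any

SUPPORTED_TYPES = {"string", "number", "boolean", "object", "array"}
MAX_DEPTH = 5  # assumed threshold for illustration only


def check_schema(schema_text: str) -> list[str]:
    """Return a list of problems found in a schema given as JSON text."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON: {exc.msg} at line {exc.lineno}"]

    problems: list[str] = []

    def walk(node: Any, path: str, depth: int) -> None:
        if not isinstance(node, dict):
            problems.append(f"{path}: schema node must be an object")
            return
        node_type = node.get("type")
        if node_type not in SUPPORTED_TYPES:
            problems.append(f"{path}: unknown type {node_type!r}")
        if depth > MAX_DEPTH:
            problems.append(f"{path}: nesting deeper than {MAX_DEPTH} levels")
            return
        props = node.get("properties", {})
        if isinstance(props, dict):
            for name, child in props.items():
                walk(child, f"{path}.{name}", depth + 1)
        if "items" in node:
            walk(node["items"], f"{path}[]", depth + 1)

    walk(schema, "$", 1)
    return problems


print(check_schema('{"type": "object", "properties": {"a": {"type": "integer"}}}'))
```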
Debugging Tips
- Start with a minimal schema and add fields incrementally
- Use schema_prompt to provide context and clarify ambiguous fields
- Check extracted markdown without schema first to see available content
- Verify field names match document terminology
Best Practices Summary
DO
- Use the /schema endpoint for all new integrations
- Provide descriptive schema_prompt instructions
- Use descriptive field names matching document terminology
- Start simple and iterate
- Test with real documents
- Use appropriate data types (number for numeric values)
DON'T
- Use structured_output on /extract (deprecated; use /schema instead)
- Use deprecated schema top-level parameter
- Create overly complex nested structures
- Use generic field names
- Extract entire documents as single fields
- Assume all fields will always exist
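On the last point, code that consumes results can read fields defensively. The helper below is a hypothetical sketch; the result dictionary stands in for whatever structured output your integration receives.

```python
from typing import Any


def read_field(result: dict[str, Any], *path: str, default: Any = None) -> Any:
    """Walk nested keys and return default if any step is missing or null."""
    node: Any = result
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return default if node is None else node


# Stand-in result with one field present and one absent.
result = {"statement_period": {"start_date": "2024-01-01"}}
print(read_field(result, "statement_period", "start_date"))
print(read_field(result, "statement_period", "end_date", default="unknown"))
```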
Next Steps
Quickstart Guide
See more examples
Schema Endpoint
Apply schemas to extracted documents
