Overview
Schema extraction lets you define the exact structure of the data you want to extract from documents. Instead of raw text, you receive structured JSON that matches your schema, ready for automated workflows and database integration.
How It Works
1. Define Schema: Create a JSON schema that describes your desired output structure.
2. Send Request: Include the schema with your extraction request.
3. AI Processing: Our AI analyzes the document and maps its content to your schema.
4. Receive Structured Data: Get back clean, structured JSON that matches your schema.
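As a rough sketch of steps 1 and 2, the Python below defines a small schema and serializes it into a request body. The JSON Schema-style layout (`type`, `properties`) and the body keys `document_url` and `schema` are assumptions for illustration only. They are not the documented API, so check the API reference for the real parameter names.

```python
import json

# Step 1: define the output structure.
# Assumption: a JSON Schema-style layout using the type names from this page.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "page_count": {"type": "integer"},
    },
}

# Step 2: include the schema with the extraction request.
# "document_url" and "schema" are hypothetical key names, not the real API.
request_body = {
    "document_url": "https://example.com/report.pdf",
    "schema": schema,
}

payload = json.dumps(request_body, indent=2)
print(payload)
```

The HTTP call itself is omitted because this page does not show the endpoint.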
Basic Schema Format
Define your schema as a JSON object that maps field names to data types.
Supported Data Types
- string: Text values of any length
- integer: Whole numbers without decimals
- float: Numbers with decimal places
- date: Date values in various formats
- boolean: True/false values
- array: Lists of items
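The following is a minimal schema that uses each supported type once, together with a local check that flags any top-level field whose type name is not on the list. The field names are hypothetical. The JSON Schema-style `type`/`properties`/`items` layout is an assumption, not a confirmed format.

```python
# Supported type names, as listed on this page.
SUPPORTED_TYPES = {"string", "integer", "float", "date", "boolean", "array"}

# Assumption: JSON Schema-style layout; "items" describes each array element.
schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "order_count": {"type": "integer"},
        "total_spent": {"type": "float"},
        "signup_date": {"type": "date"},
        "is_active": {"type": "boolean"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}


def unsupported_types(properties):
    """Return the names of fields whose type is not in SUPPORTED_TYPES."""
    return [name for name, spec in properties.items() if spec["type"] not in SUPPORTED_TYPES]


print(unsupported_types(schema["properties"]))  # → []
```

A check like this catches typos such as "decimal" or "int" before a request is sent.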
Complex Schema Examples
Invoice Processing
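An invoice schema might combine scalar fields with an array of line-item objects. This is a hypothetical sketch. The field names are illustrative, and the `object`/`properties`/`items` layout is an assumption carried over from JSON Schema (the type list above does not name `object` explicitly).

```python
import json

# Illustrative invoice schema; all field names are hypothetical examples.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "date"},
        "due_date": {"type": "date"},
        "vendor_name": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "float"},
                    "amount": {"type": "float"},
                },
            },
        },
        "subtotal": {"type": "float"},
        "tax": {"type": "float"},
        "total": {"type": "float"},
    },
}

print(json.dumps(invoice_schema, indent=2))
```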
Extract detailed invoice information.
Contract Analysis
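A contract schema could capture parties, dates, and commercial terms. The sketch below is illustrative only. Every field name is hypothetical, and the JSON Schema-style layout is an assumption.

```python
import json

# Illustrative contract schema; every field name here is a hypothetical example.
contract_schema = {
    "type": "object",
    "properties": {
        "contract_title": {"type": "string"},
        "parties": {"type": "array", "items": {"type": "string"}},
        "effective_date": {"type": "date"},
        "termination_date": {"type": "date"},
        "contract_value": {"type": "float"},
        "auto_renewal": {"type": "boolean"},
        "notice_period_days": {"type": "integer"},
        "governing_law": {"type": "string"},
    },
}

print(json.dumps(contract_schema, indent=2))
```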
Extract key contract terms.
Medical Records
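A patient-record schema often mixes scalar fields with lists. This hypothetical sketch defines one and adds a small helper, `leaf_paths`, that lists the dotted path to every scalar field. Such paths can be handy when mapping results to database columns. The field names, the JSON Schema-style layout, and the `[]` notation for array elements are all assumptions.

```python
# Illustrative patient schema; field names are hypothetical examples.
medical_schema = {
    "type": "object",
    "properties": {
        "patient_name": {"type": "string"},
        "date_of_birth": {"type": "date"},
        "visit_date": {"type": "date"},
        "allergies": {"type": "array", "items": {"type": "string"}},
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {"type": "string"},
                },
            },
        },
    },
}


def leaf_paths(schema, prefix=""):
    """List dotted paths to every scalar field; "[]" marks an array element."""
    kind = schema["type"]
    if kind == "object":
        paths = []
        for name, child in schema["properties"].items():
            paths.extend(leaf_paths(child, f"{prefix}.{name}" if prefix else name))
        return paths
    if kind == "array":
        return leaf_paths(schema["items"], prefix + "[]")
    return [prefix]


print(leaf_paths(medical_schema))
```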
Extract patient information.
Advanced Features
Optional Fields
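Marking a field optional with null as an alternative type could look like the sketch below. It assumes the JSON Schema convention of a type list such as `["string", "null"]`, which this page implies but does not spell out. The helper then reports required fields that came back empty. The field names and sample data are hypothetical.

```python
# Assumption: a field is optional when its "type" is a list that includes "null".
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "po_number": {"type": ["string", "null"]},
        "discount": {"type": ["float", "null"]},
    },
}


def is_nullable(spec):
    """True when the field's type list includes "null"."""
    kind = spec["type"]
    return isinstance(kind, list) and "null" in kind


def missing_required(data, properties):
    """Names of non-nullable fields that are absent or None in the extracted data."""
    return [
        name
        for name, spec in properties.items()
        if not is_nullable(spec) and data.get(name) is None
    ]


# Illustrative extracted result with two empty fields.
extracted = {"invoice_number": None, "po_number": None, "discount": 5.0}
print(missing_required(extracted, schema["properties"]))  # → ['invoice_number']
```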
Make fields optional by using null as an alternative type.
Schema Prompts
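This page does not show how a schema prompt is attached to a request. The sketch below assumes a hypothetical top-level `prompt` string sent alongside the schema. Both the key name and its placement are guesses to verify against the API reference. The prompt text shows the kind of context that disambiguates fields.

```python
import json

schema = {
    "type": "object",
    "properties": {
        "total": {"type": "float"},
        "currency": {"type": "string"},
    },
}

# Hypothetical request body: "schema" and "prompt" are assumed key names.
request_body = {
    "schema": schema,
    "prompt": (
        "This is a supplier invoice. 'total' is the final amount due after tax; "
        "'currency' is the three-letter ISO 4217 code, e.g. EUR."
    ),
}

print(json.dumps(request_body, indent=2))
```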
Provide additional context to guide the extraction.
Nested Objects
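The following shows a two-level nested schema, assuming nested objects use `{"type": "object", "properties": {...}}`. It also includes a `flatten` helper that turns nested results into one row of dotted keys, which is useful when the target is a flat database table. The field names and sample values are hypothetical.

```python
# Assumption: nested objects use {"type": "object", "properties": {...}}.
schema = {
    "type": "object",
    "properties": {
        "customer": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {
                        "street": {"type": "string"},
                        "city": {"type": "string"},
                        "postal_code": {"type": "string"},
                    },
                },
            },
        },
    },
}


def flatten(data, prefix=""):
    """Flatten nested dicts into one level with dotted keys."""
    row = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, path))
        else:
            row[path] = value
    return row


# Illustrative extracted data shaped like the schema above.
extracted = {
    "customer": {
        "name": "Acme Ltd",
        "address": {"street": "1 Main St", "city": "Springfield", "postal_code": "12345"},
    }
}
print(flatten(extracted))
```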
Create nested structures for hierarchical data.
Dynamic Arrays
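A variable-length list can be described once through its element schema. This sketch assumes a JSON Schema-style `items` key. It converts whatever number of elements comes back into uniform rows, filling absent keys with `None`. The field names and sample results are hypothetical.

```python
# Assumption: "items" describes the shape of every element, however many there are.
schema = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "float"},
                },
            },
        },
    },
}


def to_rows(items, fields):
    """Turn a variable-length list of dicts into uniform tuples; absent keys become None."""
    return [tuple(item.get(field) for field in fields) for item in items]


fields = list(schema["properties"]["line_items"]["items"]["properties"])

# Illustrative results: one document yields two items, another yields none.
print(to_rows([{"description": "Widget", "amount": 9.5}, {"description": "Shipping"}], fields))
print(to_rows([], fields))
```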
Extract variable-length lists.
Best Practices
Keep Schemas Simple
- Start with basic schemas and add complexity gradually
- Use clear, descriptive field names
- Avoid deeply nested structures when possible
- Test with sample documents first
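One way to keep nesting in check is to measure it before sending a schema. This hypothetical helper counts object and array levels, assuming the JSON Schema-style layout used in the sketches on this page. `MAX_DEPTH` is an arbitrary illustrative threshold, not a documented limit.

```python
# Hypothetical threshold; tune it for your own documents.
MAX_DEPTH = 2


def schema_depth(schema):
    """Count nesting levels of objects and arrays; a scalar field has depth 0."""
    kind = schema.get("type")
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    if kind == "array":
        return 1 + schema_depth(schema.get("items", {}))
    return 0


flat = {"type": "object", "properties": {"name": {"type": "string"}, "total": {"type": "float"}}}
nested = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"amount": {"type": "float"}}},
        }
    },
}

for label, candidate in [("flat", flat), ("nested", nested)]:
    depth = schema_depth(candidate)
    status = "ok" if depth <= MAX_DEPTH else "consider flattening"
    print(label, depth, status)
```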
Handle Missing Data
- Make optional fields nullable
- Provide default values in your application
- Use validation after extraction
- Consider partial extraction strategies
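Application-side defaults and post-extraction validation can be combined in one small step. This sketch fills hypothetical defaults for missing values and then type-checks each field against the type names on this page. It assumes dates arrive as ISO 8601 strings (YYYY-MM-DD), which this page does not guarantee.

```python
from datetime import date

# Application-side defaults for fields the extraction may leave empty (hypothetical values).
DEFAULTS = {"currency": "USD", "discount": 0.0}


def check_value(kind, value):
    """Check a value against one of this page's type names (dates assumed ISO 8601)."""
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "array":
        return isinstance(value, list)
    if kind == "date":
        try:
            date.fromisoformat(value)
            return True
        except (TypeError, ValueError):
            return False
    return False


def clean(data, types):
    """Fill defaults for missing values, then list fields that fail the type check."""
    result = {**DEFAULTS, **{k: v for k, v in data.items() if v is not None}}
    errors = [name for name, kind in types.items() if name in result and not check_value(kind, result[name])]
    return result, errors


types = {"invoice_date": "date", "total": "float", "currency": "string", "discount": "float"}
extracted = {"invoice_date": "2024-13-01", "total": 120.0, "currency": None}
result, errors = clean(extracted, types)
print(result)  # → {'currency': 'USD', 'discount': 0.0, 'invoice_date': '2024-13-01', 'total': 120.0}
print(errors)  # → ['invoice_date']
```

Month 13 is rejected, so the record can be routed for review instead of being written to the database.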
Optimize for Performance
- Extract only needed fields
- Use specific data types (not just “string”)
- Process pages selectively with the pages parameter
- Consider chunking for very large schemas
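This page does not describe chunking further. One reading is to split a very large schema into several smaller schemas, extract each separately, and merge the results. The helpers below are a hypothetical sketch of that approach under the JSON Schema-style layout assumed elsewhere on this page. Sending each chunk as its own request is left out because the request format is not shown here.

```python
def chunk_schema(schema, size):
    """Split an object schema's top-level fields into smaller object schemas."""
    names = list(schema["properties"])
    return [
        {"type": "object", "properties": {n: schema["properties"][n] for n in names[i:i + size]}}
        for i in range(0, len(names), size)
    ]


def merge_results(results):
    """Combine per-chunk JSON results back into one record."""
    merged = {}
    for part in results:
        merged.update(part)
    return merged


large = {"type": "object", "properties": {f"field_{i}": {"type": "string"} for i in range(5)}}
chunks = chunk_schema(large, 2)
print([list(c["properties"]) for c in chunks])
# → [['field_0', 'field_1'], ['field_2', 'field_3'], ['field_4']]
```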
Improve Accuracy
- Use descriptive field names that match document terminology
- Provide context with schema prompts
- Validate critical data points
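Validating critical data points can be as simple as a cross-field check. This hypothetical sketch assumes an invoice result with `line_items`, `tax`, and `total` fields. It confirms that the amounts add up, within a small tolerance, before the record is accepted.

```python
import math


def totals_consistent(invoice, tolerance=0.01):
    """Check that line-item amounts plus tax match the extracted total."""
    items_sum = sum(item["amount"] for item in invoice["line_items"])
    expected = items_sum + invoice.get("tax", 0.0)
    return math.isclose(expected, invoice["total"], abs_tol=tolerance)


# Illustrative extracted invoice (values are made up).
invoice = {"line_items": [{"amount": 40.0}, {"amount": 60.0}], "tax": 8.0, "total": 108.0}
print(totals_consistent(invoice))                      # → True
print(totals_consistent({**invoice, "total": 180.0}))  # → False
```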