Overview
The structured_output parameter lets you extract structured data from documents in a consistent format. This guide explains how to design effective schemas for it.
Migration Notice: The structured_output parameter replaces the deprecated schema and schema_prompt top-level parameters. See Migration from Legacy Parameters for details.
The structured_output.schema field uses the JSON Schema specification (OpenAPI 3.1 compatible). This is the same schema format used by OpenAI’s structured outputs and other LLM providers.
Key JSON Schema Properties
| Property | Description | Example |
| --- | --- | --- |
| type | Data type of the field | "string", "number", "boolean", "object", "array" |
| properties | Define fields for an object | {"name": {"type": "string"}} |
| items | Define schema for array elements | {"type": "object", "properties": {...}} |
| required | List of required field names | ["name", "email"] |
| description | Human-readable description to guide extraction | "Customer's full name" |
| format | Hint for string formatting | "date", "email", "uri" |
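As a rough illustration of how these properties fit together, the sketch below assembles a small schema as a Python dictionary and serializes it to JSON. The field names (name, email, tags) are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical customer schema combining the properties listed above.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Customer's full name"},
        "email": {"type": "string", "format": "email"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email"],
}

# Serialize to the JSON text that would go into a request.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```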
Schema Editor (Recommended)
Don’t write schemas by hand! Use the Schema Editor in the Pulse Platform to generate and refine schemas interactively.
The Schema Editor provides two powerful ways to create schemas:
1. Generate from Prompt
Describe what you want to extract in natural language, and the editor will generate a properly formatted JSON Schema for you.
“Extract the account holder name, account number, statement period, opening and closing balances, and all transactions with date, description, and amount.”
2. Interactive Editor
Visually add, remove, and reorder fields
Set field types and descriptions
Mark fields as required
Preview the generated schema in real-time
Test against sample documents
Once you’re happy with your schema, copy it directly into your API requests.
Using structured_output
The structured_output parameter is an object containing:
| Field | Type | Description |
| --- | --- | --- |
| schema | object | JSON schema defining the structure of data to extract |
| schema_prompt | string | Natural language instructions to guide extraction |
Bank Statement Example
Here’s an example extracting key fields from a bank statement:
Request Schema:
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "account_holder": {
          "type": "string",
          "description": "Name of the account holder"
        },
        "account_number": {
          "type": "string",
          "description": "Bank account number"
        },
        "opening_balance": {
          "type": "number",
          "description": "Balance at the start of the statement period"
        },
        "closing_balance": {
          "type": "number",
          "description": "Balance at the end of the statement period"
        }
      },
      "required": ["account_holder", "account_number", "opening_balance", "closing_balance"]
    }
  }
}
Response (structured_output field):
When you include a structured_output schema in your request, the API response will include a structured_output field containing the extracted data matching your schema:
{
  "markdown": "SAMPLE \n Statement of Account \n 12345678 \n\n JAMES C. MORRISON \n ...",
  "page_count": 2,
  "structured_output": {
    "account_holder": "JAMES C. MORRISON",
    "account_number": "12345678",
    "opening_balance": 69.96,
    "closing_balance": 586.71
  },
  "bounding_boxes": { ... },
  "plan-info": { ... }
}
The structured_output object will always match the structure defined in your structured_output.schema request parameter.
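Even so, client code may want a defensive check before using the extracted values. The sketch below is a minimal, hand-written check of top-level required fields, not a full JSON Schema validator; the helper name missing_required is hypothetical, and the response dictionary reuses the sample values shown above.

```python
def missing_required(schema, data):
    """Return required top-level field names that are absent from data."""
    return [name for name in schema.get("required", []) if name not in data]

schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "account_number": {"type": "string"},
        "opening_balance": {"type": "number"},
        "closing_balance": {"type": "number"},
    },
    "required": ["account_holder", "account_number", "opening_balance", "closing_balance"],
}

# Decoded API response, trimmed to the fields used here.
response = {
    "page_count": 2,
    "structured_output": {
        "account_holder": "JAMES C. MORRISON",
        "account_number": "12345678",
        "opening_balance": 69.96,
        "closing_balance": 586.71,
    },
}

extracted = response["structured_output"]
print(missing_required(schema, extracted))  # prints []
```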
curl Example
curl -X POST https://dev.api.runpulse.com/extract \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@bank_statement.pdf" \
-F 'structured_output={"schema": {"type": "object", "properties": {"account_holder": {"type": "string"}, "total": {"type": "number"}}, "required": ["account_holder"]}}'
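A possible Python equivalent of the curl call is sketched below. It assumes the third-party requests library is installed and that the endpoint accepts structured_output as a JSON-encoded form field, as the curl example suggests. The helper names build_form_data and extract are hypothetical.

```python
import json

API_URL = "https://dev.api.runpulse.com/extract"  # endpoint taken from the curl example

def build_form_data(schema, schema_prompt=None):
    """Serialize structured_output into a single JSON-encoded form field."""
    structured_output = {"schema": schema}
    if schema_prompt is not None:
        structured_output["schema_prompt"] = schema_prompt
    return {"structured_output": json.dumps(structured_output)}

def extract(path, api_key, schema):
    """Send the document as multipart form data, mirroring the curl call."""
    import requests  # third-party; imported lazily so build_form_data has no dependencies

    with open(path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"x-api-key": api_key},
            files={"file": f},
            data=build_form_data(schema),
        )
    response.raise_for_status()
    return response.json()

schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["account_holder"],
}
form = build_form_data(schema)
```

Calling extract("bank_statement.pdf", "YOUR_API_KEY", schema) would then return the decoded JSON response.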
Schemas follow the JSON Schema specification. Each field is defined with:
| Property | Description |
| --- | --- |
| type | Data type: string, number, boolean, object, array |
| description | Human-readable description to guide extraction |
| format | Optional format hint (e.g., date, email, uri) |
| required | Array of required field names (for objects) |
| items | Schema for array elements |
| properties | Nested field definitions (for objects) |
Data Types
| Type | Description | Example Value |
| --- | --- | --- |
| string | Text values | "John Doe" |
| number | Numeric values (integer or decimal) | 99.99 |
| boolean | True/false values | true |
| object | Nested structures with properties | {"name": {"type": "string"}} |
| array | Lists with items defining element schema | {"type": "array", "items": {...}} |
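When the response is decoded in Python, these types arrive as native Python values. The sketch below shows one way to check a decoded value against a schema type name; the mapping and the helper matches_type are assumptions for illustration, not part of the API.

```python
# Assumed mapping from JSON Schema type names to Python types after json decoding.
PYTHON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def matches_type(value, type_name):
    """Check a decoded JSON value against a JSON Schema type name."""
    # bool is a subclass of int in Python, so exclude it from "number".
    if type_name == "number" and isinstance(value, bool):
        return False
    return isinstance(value, PYTHON_TYPES[type_name])
```

The explicit bool check matters because isinstance(True, int) is true in Python, which would otherwise let a boolean pass as a number.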
Schema Design Principles
1. Start Simple
Begin with basic fields and gradually add complexity:
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    }
  }
}
Then expand with nested objects and arrays:
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string", "description": "Invoice ID"},
        "date": {"type": "string", "format": "date"},
        "vendor": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"}
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "amount": {"type": "number"}
            }
          }
        },
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract all invoice details including vendor information and itemized charges."
  }
}
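If schemas are built in code, the same incremental approach can be sketched as a small helper that copies a base schema and merges in new fields. The helper name with_fields is hypothetical.

```python
import copy

base_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

def with_fields(schema, new_properties):
    """Return a copy of schema with extra properties merged in."""
    extended = copy.deepcopy(schema)  # leave the original schema untouched
    extended["properties"].update(new_properties)
    return extended

expanded = with_fields(base_schema, {
    "date": {"type": "string", "format": "date"},
    "vendor": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
        },
    },
})
```

Copying first keeps the simple schema available for comparison while the expanded version is tested against the same documents.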
2. Use Descriptions
Add description fields to guide extraction:
{
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice identifier, usually at the top of the document"
    },
    "bill_to": {
      "type": "string",
      "description": "Customer billing address"
    },
    "remit_to": {
      "type": "string",
      "description": "Payment remittance address"
    }
  }
}
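To find fields that still lack a description, a schema can be walked recursively. The following is a minimal sketch under the assumption that nesting only occurs through properties and items; the helper name and the sample schema are hypothetical.

```python
def fields_without_description(schema, path=""):
    """Recursively list property paths that have no description."""
    missing = []
    for name, sub in schema.get("properties", {}).items():
        full = f"{path}.{name}" if path else name
        if "description" not in sub:
            missing.append(full)
        # Descend into nested objects.
        missing.extend(fields_without_description(sub, full))
        # Descend into array element schemas.
        if sub.get("type") == "array":
            missing.extend(fields_without_description(sub.get("items", {}), full + "[]"))
    return missing

sample = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The unique invoice identifier"},
        "bill_to": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"amount": {"type": "number"}}},
        },
    },
}
undocumented = fields_without_description(sample)
```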
3. Use schema_prompt for Context
The schema_prompt field provides natural language guidance to help the model understand nuances:
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "contract_type": {"type": "string"},
        "effective_date": {"type": "string", "format": "date"},
        "parties": {"type": "array", "items": {"type": "string"}},
        "key_terms": {"type": "array", "items": {"type": "string"}}
      }
    },
    "schema_prompt": "Extract contract details. For key_terms, focus on payment terms, termination clauses, and liability limitations. Format dates as YYYY-MM-DD."
  }
}
Common Schema Patterns
Invoice / Financial Documents
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "document_type": {"type": "string"},
        "document_number": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "due_date": {"type": "string", "format": "date"},
        "vendor": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
            "tax_id": {"type": "string"}
          }
        },
        "customer": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "account_number": {"type": "string"}
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "quantity": {"type": "number"},
              "unit_price": {"type": "number"},
              "amount": {"type": "number"}
            },
            "required": ["description", "amount"]
          }
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"}
      },
      "required": ["document_number", "total"]
    },
    "schema_prompt": "Extract all invoice details. Include all line items. Format currency as numbers without symbols."
  }
}
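Because amounts are extracted as numbers, the result of this schema lends itself to simple consistency checks. The sketch below compares the sum of line item amounts with the subtotal. The helper name, the tolerance, and the sample values are hypothetical, and only amount is assumed to be present on each line item, since the schema marks it as required.

```python
import math

def line_items_match_subtotal(invoice, tolerance=0.01):
    """Return True/False if the check can run, or None if data is missing."""
    items = invoice.get("line_items")
    subtotal = invoice.get("subtotal")
    if not items or subtotal is None:
        return None  # optional fields were not extracted
    total = sum(item["amount"] for item in items)
    return math.isclose(total, subtotal, abs_tol=tolerance)

# Hypothetical extraction result for illustration.
invoice = {
    "document_number": "INV-001",
    "line_items": [
        {"description": "Widget", "amount": 10.0},
        {"description": "Shipping", "amount": 5.5},
    ],
    "subtotal": 15.5,
    "total": 16.74,
}
```

Returning None when subtotal or line_items is absent avoids treating an optional field as an error.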
Legal Documents
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "document_title": {"type": "string"},
        "case_number": {"type": "string"},
        "parties": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "role": {"type": "string"},
              "representation": {"type": "string"}
            },
            "required": ["name", "role"]
          }
        },
        "dates": {
          "type": "object",
          "properties": {
            "filed": {"type": "string", "format": "date"},
            "effective": {"type": "string", "format": "date"},
            "expiration": {"type": "string", "format": "date"}
          }
        },
        "signatures": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "title": {"type": "string"},
              "date": {"type": "string", "format": "date"}
            }
          }
        }
      },
      "required": ["document_title"]
    },
    "schema_prompt": "Extract legal document details. Include all parties and their roles."
  }
}
Medical Records
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "patient": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "dob": {"type": "string", "format": "date", "description": "Date of birth"},
            "mrn": {"type": "string", "description": "Medical record number"}
          },
          "required": ["name", "mrn"]
        },
        "encounter": {
          "type": "object",
          "properties": {
            "date": {"type": "string", "format": "date"},
            "provider": {"type": "string"},
            "location": {"type": "string"}
          }
        },
        "chief_complaint": {"type": "string"},
        "medications": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "dosage": {"type": "string"},
              "frequency": {"type": "string"}
            }
          }
        },
        "diagnoses": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "code": {"type": "string", "description": "ICD-10 code"},
              "description": {"type": "string"}
            }
          }
        },
        "plan": {"type": "string"}
      },
      "required": ["patient", "encounter"]
    },
    "schema_prompt": "Extract patient encounter details. Include all medications and diagnoses with their codes."
  }
}
Advanced Techniques
Use schema_prompt to guide conditional extraction:
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "contract_type": {"type": "string", "description": "Type of contract: lease, purchase, service, etc."},
        "terms": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": {"type": "string"},
              "value": {"type": "string"}
            }
          }
        }
      },
      "required": ["contract_type"]
    },
    "schema_prompt": "First identify the contract_type. If it's a 'lease', extract rental amount and duration as terms. If it's a 'purchase', extract price and closing date as terms."
  }
}
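When there are many contract types, a conditional prompt like the one above could be generated from a table of rules rather than written by hand. This is only a sketch: the RULES dictionary and the conditional_prompt helper are hypothetical, and the wording simply reproduces the prompt from the example.

```python
# Hypothetical rule table: contract type -> what to extract as terms.
RULES = {
    "lease": "rental amount and duration",
    "purchase": "price and closing date",
}

def conditional_prompt(rules):
    """Build a schema_prompt with one conditional sentence per contract type."""
    parts = ["First identify the contract_type."]
    for contract_type, fields in rules.items():
        parts.append(f"If it's a '{contract_type}', extract {fields} as terms.")
    return " ".join(parts)

prompt = conditional_prompt(RULES)
print(prompt)
```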
Hierarchical Data
For documents with deeply nested structures:
{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "organization": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "departments": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "head": {"type": "string"},
                  "employee_count": {"type": "number"}
                }
              }
            }
          }
        }
      }
    },
    "schema_prompt": "Extract the organizational hierarchy including all departments."
  }
}
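Since deep nesting is one possible source of extraction problems, it can help to measure how deep a schema goes before sending it. The sketch below counts nesting levels through properties and items; the helper schema_depth is hypothetical, and this guide does not state a depth limit, so any threshold would be your own choice.

```python
def schema_depth(schema):
    """Return how many levels of properties/items lie above the deepest leaf."""
    children = list(schema.get("properties", {}).values())
    if "items" in schema:
        children.append(schema["items"])
    if not children:
        return 0  # leaf field such as a string or number
    return 1 + max(schema_depth(child) for child in children)

# The organization schema from the example above.
org_schema = {
    "type": "object",
    "properties": {
        "organization": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "departments": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "head": {"type": "string"},
                            "employee_count": {"type": "number"},
                        },
                    },
                },
            },
        },
    },
}
print(schema_depth(org_schema))  # prints 4
```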
Keep Schemas Focused
Extract only what you need. Avoid extracting entire documents as single fields.
Use Descriptions
Add description fields to guide the model on ambiguous fields or specific formats.
Leverage schema_prompt
Use schema_prompt to provide context that can’t be expressed in the schema structure alone.
Migration from Legacy Parameters
The schema and schema_prompt top-level parameters are deprecated and will be removed in a future version.
Before (Deprecated)
{
  "file": "@document.pdf",
  "schema": {"invoice_number": "string", "total": "number"},
  "schema_prompt": "Extract invoice details"
}
After (Recommended)
{
  "file": "@document.pdf",
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "The unique invoice identifier"
        },
        "total": {
          "type": "number",
          "description": "Total invoice amount"
        }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract invoice details"
  }
}
The API currently supports both formats for backward compatibility, but new integrations should use structured_output.
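For integrations with many legacy schemas, the conversion can be sketched as a small helper. It assumes the legacy schema is a flat mapping of field name to type name, as in the Before example. The helper name migrate_legacy is hypothetical, and descriptions and required lists still have to be added by hand because the legacy format does not carry them.

```python
def migrate_legacy(schema, schema_prompt=None):
    """Wrap a legacy flat {field: type} schema in a structured_output object."""
    properties = {name: {"type": type_name} for name, type_name in schema.items()}
    structured_output = {"schema": {"type": "object", "properties": properties}}
    if schema_prompt:
        structured_output["schema_prompt"] = schema_prompt
    return {"structured_output": structured_output}

migrated = migrate_legacy(
    {"invoice_number": "string", "total": "number"},
    "Extract invoice details",
)
```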
Error Handling
Common Schema Errors
| Error | Cause | Solution |
| --- | --- | --- |
| Invalid JSON | Syntax error in schema | Validate JSON syntax |
| Unknown type | Using unsupported data type | Use string, number, boolean, or nested objects/arrays |
| Too complex | Deeply nested structure | Simplify schema, flatten where possible |
| No matches | Fields don’t match document | Adjust field names, use schema_prompt for guidance |
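The first two errors can often be caught locally before a request is sent. The sketch below is a hypothetical pre-flight check, not the API's own validation: it parses the schema text and flags any type outside the five listed in this guide.

```python
import json

SUPPORTED_TYPES = {"string", "number", "boolean", "object", "array"}

def check_schema(schema_text):
    """Return a list of problems found in a schema given as JSON text."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(schema, dict):
        return ["Schema must be a JSON object"]

    problems = []

    def walk(node, path):
        # Flag missing or unsupported type names, then descend.
        type_name = node.get("type")
        if type_name not in SUPPORTED_TYPES:
            problems.append(f"Unknown type {type_name!r} at {path}")
        for name, child in node.get("properties", {}).items():
            walk(child, f"{path}.{name}")
        if "items" in node:
            walk(node["items"], f"{path}[]")

    walk(schema, "$")
    return problems
```

For example, check_schema('{"type": "object", "properties": {"total": {"type": "money"}}}') reports the unsupported money type at $.total.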
Debugging Tips
Start with a minimal schema and add fields incrementally
Use schema_prompt to provide context and clarify ambiguous fields
Check extracted markdown without schema first to see available content
Verify field names match document terminology
Best Practices Summary
Do:
Use structured_output for all new integrations
Provide descriptive schema_prompt instructions
Use descriptive field names matching document terminology
Start simple and iterate
Test with real documents
Use appropriate data types (number for numeric values)
Don't:
Use the deprecated top-level schema parameter
Create overly complex nested structures
Use generic field names
Extract entire documents as single fields
Assume all fields will always exist