Deprecation Notice: The structured_output parameter on /extract is deprecated. Use the /schema endpoint after extraction instead. The schema format and design principles on this page still apply; just pass your schema to /schema via schema_config instead of to /extract via structured_output.
Overview
This guide covers best practices for designing JSON schemas used with the /schema endpoint (recommended) or the legacy structured_output parameter on /extract.
The schema_config.input_schema field (and the legacy structured_output.schema field) uses the JSON Schema specification (OpenAPI 3.1 compatible). This is the same schema format used by OpenAI’s structured outputs and other LLM providers.
Key JSON Schema Properties
| Property | Description | Example |
| --- | --- | --- |
| type | Data type of the field | "string", "number", "boolean", "object", "array" |
| properties | Define fields for an object | {"name": {"type": "string"}} |
| items | Define schema for array elements | {"type": "object", "properties": {...}} |
| required | List of required field names | ["name", "email"] |
| description | Human-readable description to guide extraction | "Customer's full name" |
| format | Hint for string formatting | "date", "email", "uri" |
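One consistency check follows directly from the table: every name listed in required should also be declared under properties. The sketch below is a hypothetical helper (undeclared_required is an illustrative name, not part of any Pulse SDK) that walks a schema dict and reports required names with no matching property.

```python
from typing import Any


def undeclared_required(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return paths of `required` names that are missing from `properties`.

    Hypothetical helper for illustration; not part of the Pulse SDK.
    """
    problems: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f"{path}.{name}")
    # Recurse into nested objects and array element schemas.
    for name, child in properties.items():
        problems += undeclared_required(child, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):
        problems += undeclared_required(schema["items"], f"{path}[]")
    return problems


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name", "email"],  # "email" is never declared
}
print(undeclared_required(schema))
```

Running a check like this before sending a schema can catch typos in required early.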
Schema Editor (Recommended)
Don’t write schemas by hand! Use the Schema Editor in the Pulse Platform to generate and refine schemas interactively.
The Schema Editor provides two powerful ways to create schemas:
1. Generate from Prompt
Describe what you want to extract in natural language, and the editor will generate a properly formatted JSON Schema for you.
“Extract the account holder name, account number, statement period, opening and closing balances, and all transactions with date, description, and amount.”
2. Interactive Editor
Visually add, remove, and reorder fields
Set field types and descriptions
Mark fields as required
Preview the generated schema in real-time
Test against sample documents
Once you’re happy with your schema, copy it directly into your API requests.
Using the /schema Endpoint (Recommended)
The recommended approach is a two-step flow:
1. Extract the document via /extract to get an extraction_id
2. Apply a schema via /schema using the extraction_id
The schema_config object contains:
| Field | Type | Description |
| --- | --- | --- |
| input_schema | object | JSON schema defining the structure of data to extract |
| schema_prompt | string | Natural language instructions to guide extraction |
| effort | boolean | Enable extended reasoning for complex documents |
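As a minimal sketch, the request body for /schema can be assembled from these three fields with plain Python. The helper name build_schema_request is hypothetical, and leaving out schema_prompt and effort when they are not set is an assumption about which fields are optional, not something this page confirms.

```python
import json
from typing import Any, Optional


def build_schema_request(
    extraction_id: str,
    input_schema: dict[str, Any],
    schema_prompt: Optional[str] = None,
    effort: bool = False,
) -> dict[str, Any]:
    """Assemble a /schema request body (hypothetical helper, not an SDK call)."""
    schema_config: dict[str, Any] = {"input_schema": input_schema}
    # Assumption: schema_prompt and effort may be omitted when unused.
    if schema_prompt:
        schema_config["schema_prompt"] = schema_prompt
    if effort:
        schema_config["effort"] = True
    return {"extraction_id": extraction_id, "schema_config": schema_config}


body = build_schema_request(
    "abc123-...",  # placeholder ID, as used elsewhere on this page
    {"type": "object", "properties": {"total": {"type": "number"}}},
    schema_prompt="Extract the invoice total",
)
print(json.dumps(body, indent=2))
```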
Bank Statement Example
Here’s an example extracting key fields from a bank statement:
Step 1: Extract
POST /extract
{ "file_url" : "https://...bank_statement.pdf"}
# → {"extraction_id": "abc123-...", "markdown": "...", ...}
Step 2: Apply Schema
POST /schema
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "account_holder": {
          "type": "string",
          "description": "Name of the account holder"
        },
        "account_number": {
          "type": "string",
          "description": "Bank account number"
        },
        "opening_balance": {
          "type": "number",
          "description": "Balance at the start of the statement period"
        },
        "closing_balance": {
          "type": "number",
          "description": "Balance at the end of the statement period"
        }
      },
      "required": ["account_holder", "account_number", "opening_balance", "closing_balance"]
    }
  }
}
Response (schema_output):
{
  "schema_id": "schema-uuid-456",
  "version": 1,
  "schema_output": {
    "values": {
      "account_holder": "JAMES C. MORRISON",
      "account_number": "12345678",
      "opening_balance": 69.96,
      "closing_balance": 586.71
    },
    "citations": {
      "account_holder": { "page": 1, "bbox": [100, 50, 300, 70] }
    }
  }
}
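In the sample response only account_holder carries a citation, so it seems safest to treat citations as optional per field. The following sketch pairs each value with its citation when one exists. value_with_citation is a hypothetical helper, and the bbox coordinate convention is not specified on this page, so the code only passes it through.

```python
from typing import Any, Optional

# Response body copied from the example above.
response = {
    "schema_id": "schema-uuid-456",
    "version": 1,
    "schema_output": {
        "values": {
            "account_holder": "JAMES C. MORRISON",
            "account_number": "12345678",
            "opening_balance": 69.96,
            "closing_balance": 586.71,
        },
        "citations": {
            "account_holder": {"page": 1, "bbox": [100, 50, 300, 70]},
        },
    },
}


def value_with_citation(
    schema_output: dict[str, Any], field: str
) -> tuple[Any, Optional[dict[str, Any]]]:
    """Return (value, citation); either may be None if absent."""
    value = schema_output.get("values", {}).get(field)
    citation = schema_output.get("citations", {}).get(field)
    return value, citation


for field in response["schema_output"]["values"]:
    value, citation = value_with_citation(response["schema_output"], field)
    page = citation["page"] if citation else "n/a"
    print(f"{field}: {value!r} (page {page})")
```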
SDK Examples
from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# Step 1: Extract
response = client.extract(
    file_url="https://www.impact-bank.com/user/file/dummy_statement.pdf"
)

# Step 2: Apply schema
schema = {
    "type": "object",
    "properties": {
        "account_holder": {
            "type": "string",
            "description": "Name of the account holder"
        },
        "account_number": {
            "type": "string",
            "description": "Bank account number"
        },
        "opening_balance": {"type": "number"},
        "closing_balance": {"type": "number"}
    },
    "required": ["account_holder", "account_number"]
}

schema_result = client.schema(
    extraction_id=response.extraction_id,
    schema_config={
        "input_schema": schema,
        "schema_prompt": "Extract bank statement details"
    }
)

# Extracted values live under schema_output["values"]
print(f"Account Holder: {schema_result.schema_output['values']['account_holder']}")
print(f"Balance: {schema_result.schema_output['values']['closing_balance']}")
Schemas follow the JSON Schema specification. Each field is defined with:
| Property | Description |
| --- | --- |
| type | Data type: string, number, boolean, object, array |
| description | Human-readable description to guide extraction |
| format | Optional format hint (e.g., date, email, uri) |
| required | Array of required field names (for objects) |
| items | Schema for array elements |
| properties | Nested field definitions (for objects) |
Data Types
| Type | Description | Example |
| --- | --- | --- |
| string | Text values | "John Doe" |
| number | Numeric values (integer or decimal) | 99.99 |
| boolean | True/false values | true |
| object | Nested structures with properties | {"name": {"type": "string"}} |
| array | Lists with items defining element schema | {"type": "array", "items": {...}} |
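To make the type table concrete, here is a small sketch that checks already-parsed values against the declared type of each top-level field. matches_type and type_mismatches are hypothetical helpers, not SDK functions. Note that Python treats bool as a subclass of int, so the number check excludes booleans explicitly.

```python
from typing import Any


def matches_type(value: Any, json_type: str) -> bool:
    """Check a parsed JSON value against one of the five types in the table."""
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "number":
        # bool is a subclass of int in Python, so rule it out explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "object":
        return isinstance(value, dict)
    if json_type == "array":
        return isinstance(value, list)
    return False


def type_mismatches(schema: dict[str, Any], values: dict[str, Any]) -> list[str]:
    """Names of top-level fields whose value does not match the declared type."""
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if name in values and not matches_type(values[name], spec.get("type", ""))
    ]


schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "opening_balance": {"type": "number"},
    },
}
values = {"account_holder": "JAMES C. MORRISON", "opening_balance": "69.96"}
print(type_mismatches(schema, values))
```

Here opening_balance is flagged because the value is a string rather than a number.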
Schema Design Principles
1. Start Simple
Begin with basic fields and gradually add complexity:
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "total": { "type": "number" }
      },
      "required": ["invoice_number", "total"]
    }
  }
}
Then expand with nested objects and arrays:
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string", "description": "Invoice ID" },
        "date": { "type": "string", "format": "date" },
        "vendor": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "address": { "type": "string" }
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "amount": { "type": "number" }
            }
          }
        },
        "total": { "type": "number" }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract all invoice details including vendor information and itemized charges."
  }
}
2. Use Descriptions
Add description fields to guide extraction:
{
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice identifier, usually at the top of the document"
    },
    "bill_to": {
      "type": "string",
      "description": "Customer billing address"
    },
    "remit_to": {
      "type": "string",
      "description": "Payment remittance address"
    }
  }
}
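A quick way to apply this principle across a larger schema is to list the fields that still lack a description. The sketch below is a hypothetical lint (fields_missing_description is not part of the Pulse SDK) that recurses through nested objects and array items and returns dotted paths.

```python
from typing import Any


def fields_missing_description(schema: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of fields that have no non-empty description."""
    missing: list[str] = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not spec.get("description"):
            missing.append(path)
        if spec.get("type") == "object":
            missing += fields_missing_description(spec, f"{path}.")
        elif spec.get("type") == "array":
            items = spec.get("items", {})
            if items.get("type") == "object":
                missing += fields_missing_description(items, f"{path}[].")
    return missing


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The unique invoice identifier"},
        "vendor": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
        "line_items": {
            "type": "array",
            "description": "Itemized charges",
            "items": {
                "type": "object",
                "properties": {"amount": {"type": "number"}},
            },
        },
    },
}
print(fields_missing_description(schema))
```

Not every field needs a description; the list is a prompt to review ambiguous ones such as bill_to versus remit_to.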
3. Use schema_prompt for Context
The schema_prompt field provides natural language guidance to help the model understand nuances:
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "contract_type": { "type": "string" },
        "effective_date": { "type": "string", "format": "date" },
        "parties": { "type": "array", "items": { "type": "string" } },
        "key_terms": { "type": "array", "items": { "type": "string" } }
      }
    },
    "schema_prompt": "Extract contract details. For key_terms, focus on payment terms, termination clauses, and liability limitations. Format dates as YYYY-MM-DD."
  }
}
Common Schema Patterns
Invoice / Financial Documents
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "document_type": { "type": "string" },
        "document_number": { "type": "string" },
        "date": { "type": "string", "format": "date" },
        "due_date": { "type": "string", "format": "date" },
        "vendor": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "address": { "type": "string" },
            "tax_id": { "type": "string" }
          }
        },
        "customer": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "account_number": { "type": "string" }
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "unit_price": { "type": "number" },
              "amount": { "type": "number" }
            },
            "required": ["description", "amount"]
          }
        },
        "subtotal": { "type": "number" },
        "tax": { "type": "number" },
        "total": { "type": "number" }
      },
      "required": ["document_number", "total"]
    },
    "schema_prompt": "Extract all invoice details. Include all line items. Format currency as numbers without symbols."
  }
}
Legal Documents
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "document_title": { "type": "string" },
        "case_number": { "type": "string" },
        "parties": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "role": { "type": "string" },
              "representation": { "type": "string" }
            },
            "required": ["name", "role"]
          }
        },
        "dates": {
          "type": "object",
          "properties": {
            "filed": { "type": "string", "format": "date" },
            "effective": { "type": "string", "format": "date" },
            "expiration": { "type": "string", "format": "date" }
          }
        },
        "signatures": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "title": { "type": "string" },
              "date": { "type": "string", "format": "date" }
            }
          }
        }
      },
      "required": ["document_title"]
    },
    "schema_prompt": "Extract legal document details. Include all parties and their roles."
  }
}
Medical Records
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "patient": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "dob": { "type": "string", "format": "date", "description": "Date of birth" },
            "mrn": { "type": "string", "description": "Medical record number" }
          },
          "required": ["name", "mrn"]
        },
        "encounter": {
          "type": "object",
          "properties": {
            "date": { "type": "string", "format": "date" },
            "provider": { "type": "string" },
            "location": { "type": "string" }
          }
        },
        "chief_complaint": { "type": "string" },
        "medications": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "dosage": { "type": "string" },
              "frequency": { "type": "string" }
            }
          }
        },
        "diagnoses": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "code": { "type": "string", "description": "ICD-10 code" },
              "description": { "type": "string" }
            }
          }
        },
        "plan": { "type": "string" }
      },
      "required": ["patient", "encounter"]
    },
    "schema_prompt": "Extract patient encounter details. Include all medications and diagnoses with their codes."
  }
}
Advanced Techniques
Use schema_prompt to guide conditional extraction:
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "contract_type": { "type": "string", "description": "Type of contract: lease, purchase, service, etc." },
        "terms": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": { "type": "string" },
              "value": { "type": "string" }
            }
          }
        }
      },
      "required": ["contract_type"]
    },
    "schema_prompt": "First identify the contract_type. If it's a 'lease', extract rental amount and duration as terms. If it's a 'purchase', extract price and closing date as terms."
  }
}
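When the set of conditional rules grows, the prompt text can be generated from a small mapping instead of being edited by hand. This is only a sketch: conditional_prompt and the rules dict are hypothetical, and the wording simply mirrors the schema_prompt in the example above.

```python
def conditional_prompt(field: str, rules: dict[str, list[str]]) -> str:
    """Build a conditional schema_prompt from {discriminator value: things to extract}."""
    parts = [f"First identify the {field}."]
    for value, targets in rules.items():
        parts.append(f"If it's a '{value}', extract {' and '.join(targets)} as terms.")
    return " ".join(parts)


rules = {
    "lease": ["rental amount", "duration"],
    "purchase": ["price", "closing date"],
}
prompt = conditional_prompt("contract_type", rules)
print(prompt)
```

Keeping the rules in data makes it easier to add a new contract type without rewording the whole prompt.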
Hierarchical Data
For documents with deeply nested structures:
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "organization": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "departments": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": { "type": "string" },
                  "head": { "type": "string" },
                  "employee_count": { "type": "number" }
                }
              }
            }
          }
        }
      }
    },
    "schema_prompt": "Extract the organizational hierarchy including all departments."
  }
}
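Since deep nesting is also listed as a source of "Too complex" errors, it can help to measure how deep a schema goes before sending it. The sketch below counts nested object levels. schema_depth is a hypothetical helper, and this page does not state a depth limit, so any threshold you compare against is your own choice.

```python
from typing import Any


def schema_depth(schema: dict[str, Any]) -> int:
    """Count nested object levels; arrays are transparent, scalars count as 0."""
    kind = schema.get("type")
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    if kind == "array":
        return schema_depth(schema.get("items", {}))
    return 0


# input_schema from the organizational hierarchy example above.
org_schema = {
    "type": "object",
    "properties": {
        "organization": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "departments": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "head": {"type": "string"},
                            "employee_count": {"type": "number"},
                        },
                    },
                },
            },
        }
    },
}
print(schema_depth(org_schema))  # root -> organization -> department item
```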
Keep Schemas Focused
Extract only what you need. Avoid extracting entire documents as single fields.
Use Descriptions
Add description fields to guide the model on ambiguous fields or specific formats.
Leverage schema_prompt
Use schema_prompt to provide context that can’t be expressed in the schema structure alone.
Migration from Legacy Parameters
Both the schema / schema_prompt top-level parameters and the structured_output parameter on /extract are deprecated. Use the /schema endpoint after extraction instead.
Legacy: top-level schema / schema_prompt on /extract (deprecated)
{
  "file": "@document.pdf",
  "schema": { "invoice_number": "string", "total": "number" },
  "schema_prompt": "Extract invoice details"
}
Legacy: structured_output on /extract (deprecated)
{
  "file": "@document.pdf",
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "total": { "type": "number" }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract invoice details"
  }
}
Recommended: /schema endpoint
# Step 1: Extract
POST /extract {"file_url": "https://..."}
# → {"extraction_id": "abc123-...", "markdown": "...", ...}
# Step 2: Apply schema
POST /schema
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "The unique invoice identifier"
        },
        "total": {
          "type": "number",
          "description": "Total invoice amount"
        }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract invoice details"
  }
}
The API still supports structured_output on /extract for backward compatibility, but all new integrations should use the /schema endpoint.
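For existing integrations, the mapping from a structured_output body to a /schema body is mechanical: structured_output.schema becomes schema_config.input_schema, and schema_prompt moves alongside it. A minimal sketch, assuming you already have the extraction_id from a prior /extract call (migrate_structured_output is a hypothetical helper):

```python
from typing import Any


def migrate_structured_output(legacy_body: dict[str, Any], extraction_id: str) -> dict[str, Any]:
    """Convert a legacy /extract body with structured_output into a /schema body."""
    legacy = legacy_body["structured_output"]
    schema_config: dict[str, Any] = {"input_schema": legacy["schema"]}
    if legacy.get("schema_prompt"):
        schema_config["schema_prompt"] = legacy["schema_prompt"]
    # The file reference is dropped: /schema works from the extraction_id.
    return {"extraction_id": extraction_id, "schema_config": schema_config}


legacy_body = {
    "file": "@document.pdf",
    "structured_output": {
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total": {"type": "number"},
            },
            "required": ["invoice_number", "total"],
        },
        "schema_prompt": "Extract invoice details",
    },
}
new_body = migrate_structured_output(legacy_body, "abc123-...")
print(new_body["schema_config"]["schema_prompt"])
```

The older top-level schema form ({"invoice_number": "string", ...}) is not standard JSON Schema, so it would need to be rewritten by hand rather than converted this way.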
Error Handling
Common Schema Errors
| Error | Cause | Solution |
| --- | --- | --- |
| Invalid JSON | Syntax error in schema | Validate JSON syntax |
| Unknown type | Using unsupported data type | Use string, number, boolean, or nested objects/arrays |
| Too complex | Deeply nested structure | Simplify schema, flatten where possible |
| No matches | Fields don’t match document | Adjust field names, use schema_prompt for guidance |
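The first two rows of the table can be checked locally before a request is sent. The sketch below is a hypothetical preflight (the function name and message formats are illustrative, not the API's actual error output) that parses the schema text and flags any type outside the five listed on this page.

```python
import json

ALLOWED_TYPES = {"string", "number", "boolean", "object", "array"}


def preflight(schema_text: str) -> list[str]:
    """Return a list of problems found in a schema given as JSON text."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON: {exc.msg} (line {exc.lineno})"]

    problems: list[str] = []

    def walk(node, path: str) -> None:
        if not isinstance(node, dict):
            problems.append(f"{path}: expected an object")
            return
        kind = node.get("type")
        if kind not in ALLOWED_TYPES:
            problems.append(f"{path}: unknown type {kind!r}")
        for name, child in node.get("properties", {}).items():
            walk(child, f"{path}.{name}")
        if "items" in node:
            walk(node["items"], f"{path}[]")

    walk(schema, "$")
    return problems


print(preflight('{"type": "object", "properties": {"total": {"type": "decimal"}}}'))
print(preflight('{"type": '))
```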
Debugging Tips
Start with a minimal schema and add fields incrementally
Use schema_prompt to provide context and clarify ambiguous fields
Check extracted markdown without schema first to see available content
Verify field names match document terminology
Best Practices Summary
Do
Use the /schema endpoint for all new integrations
Provide descriptive schema_prompt instructions
Use descriptive field names matching document terminology
Start simple and iterate
Test with real documents
Use appropriate data types (number for numeric values)
Don’t
Use structured_output on /extract (deprecated; use /schema instead)
Use the deprecated top-level schema parameter
Create overly complex nested structures
Use generic field names
Extract entire documents as single fields
Assume all fields will always exist
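Because a given document may not contain every field, downstream code is safer if it checks for absent values instead of indexing directly. This page does not say whether a missing field is omitted or returned as null, so the hypothetical helper below treats both cases the same way.

```python
from typing import Any


def missing_required(schema: dict[str, Any], values: dict[str, Any]) -> list[str]:
    """Required top-level fields that are absent or null in the extracted values."""
    return [name for name in schema.get("required", []) if values.get(name) is None]


schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "account_number": {"type": "string"},
        "closing_balance": {"type": "number"},
    },
    "required": ["account_holder", "account_number"],
}
values = {"account_holder": "JAMES C. MORRISON", "account_number": None}

print(missing_required(schema, values))
# Optional fields: use .get() so a missing key does not raise KeyError.
closing_balance = values.get("closing_balance")
print("Balance:", closing_balance if closing_balance is not None else "not found")
```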
Next Steps
Quickstart Guide: See more examples
Schema Endpoint: Apply schemas to extracted documents