Structured Output Guidelines

Overview

The structured_output parameter extracts structured data from documents in a consistent format that you define with a schema. This guide explains how to design effective schemas.

Migration Notice: The structured_output parameter replaces the deprecated schema and schema_prompt top-level parameters. See Migration from Legacy Parameters for details.

Schema Format

The structured_output.schema field uses the JSON Schema specification (OpenAPI 3.1 compatible). This is the same schema format used by OpenAI’s structured outputs and other LLM providers.

Key JSON Schema Properties

Property	Description	Example
`type`	Data type of the field	`"string"`, `"number"`, `"boolean"`, `"object"`, `"array"`
`properties`	Define fields for an object	`{"name": {"type": "string"}}`
`items`	Define schema for array elements	`{"type": "object", "properties": {...}}`
`required`	List of required field names	`["name", "email"]`
`description`	Human-readable description to guide extraction	`"Customer's full name"`
`format`	Hint for string formatting	`"date"`, `"email"`, `"uri"`
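As a rough illustration, the properties in the table can be combined into a single schema. The sketch below builds one as a plain Python dict and serializes it to JSON. The field names (`name`, `email`, `tags`) are hypothetical and chosen only to exercise each property.

```python
import json

# Illustrative schema that uses each property from the table above.
# The field names are hypothetical, not taken from a Pulse example.
contact_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Customer's full name"},
        "email": {"type": "string", "format": "email"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email"],
}

# Serialize to the JSON text that would be sent as structured_output.schema.
schema_json = json.dumps(contact_schema, indent=2)
print(schema_json)
```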

Schema Editor (Recommended)

Don’t write schemas by hand! Use the Schema Editor in the Pulse Platform to generate and refine schemas interactively.

The Schema Editor provides two powerful ways to create schemas:

1. Generate from Prompt

Describe what you want to extract in natural language, and the editor will generate a properly formatted JSON Schema for you.

“Extract the account holder name, account number, statement period, opening and closing balances, and all transactions with date, description, and amount.”

2. Interactive Editor

Visually add, remove, and reorder fields
Set field types and descriptions
Mark fields as required
Preview the generated schema in real-time
Test against sample documents

Once you’re happy with your schema, copy it directly into your API requests.

Using structured_output

The structured_output parameter is an object containing:

Field	Type	Description
`schema`	object	JSON schema defining the structure of data to extract
`schema_prompt`	string	Natural language instructions to guide extraction
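A minimal sketch of assembling this object in Python follows. The helper name `build_structured_output` is an assumption made for illustration and is not part of any SDK; it only wraps the two fields from the table and leaves out `schema_prompt` when none is given, as the bank statement request in this guide does.

```python
from typing import Optional


def build_structured_output(schema: dict, schema_prompt: Optional[str] = None) -> dict:
    """Wrap a JSON Schema and an optional prompt in a structured_output object."""
    structured_output: dict = {"schema": schema}
    if schema_prompt:
        # Only include the prompt when the caller supplies one.
        structured_output["schema_prompt"] = schema_prompt
    return {"structured_output": structured_output}


payload = build_structured_output(
    {"type": "object", "properties": {"total": {"type": "number"}}},
    schema_prompt="Extract the invoice total",
)
print(payload)
```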

Bank Statement Example

Here’s an example extracting key fields from a bank statement.

Request Schema:

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "account_holder": {
          "type": "string",
          "description": "Name of the account holder"
        },
        "account_number": {
          "type": "string",
          "description": "Bank account number"
        },
        "opening_balance": {
          "type": "number",
          "description": "Balance at the start of the statement period"
        },
        "closing_balance": {
          "type": "number",
          "description": "Balance at the end of the statement period"
        }
      },
      "required": ["account_holder", "account_number", "opening_balance", "closing_balance"]
    }
  }
}

Response (structured_output field):

When you include a structured_output schema in your request, the API response includes a structured_output field containing the extracted data matching your schema:

{
  "markdown": "SAMPLE\nStatement of Account\n12345678\n\nJAMES C. MORRISON\n...",
  "page_count": 2,
  "structured_output": {
    "account_holder": "JAMES C. MORRISON",
    "account_number": "12345678",
    "opening_balance": 69.96,
    "closing_balance": 586.71
  },
  "bounding_boxes": { ... },
  "plan-info": { ... }
}

The structured_output object will always match the structure defined in your structured_output.schema request parameter.
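If you want to confirm this on the client side, a small check can compare the returned object against the schema's `required` lists. This is a sketch only: the function name `missing_required` is hypothetical, it looks at object fields only (not array elements), and it treats a `null` value as missing, which is a design choice rather than documented API behavior.

```python
def missing_required(schema: dict, data: dict, path: str = "") -> list:
    """Return dotted paths of required fields that are absent or None in data."""
    missing = []
    for name in schema.get("required", []):
        if data.get(name) is None:
            missing.append(f"{path}{name}")
    # Recurse into nested objects that are present in the data.
    for name, sub_schema in schema.get("properties", {}).items():
        value = data.get(name)
        if sub_schema.get("type") == "object" and isinstance(value, dict):
            missing.extend(missing_required(sub_schema, value, f"{path}{name}."))
    return missing


bank_schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "account_number": {"type": "string"},
        "opening_balance": {"type": "number"},
        "closing_balance": {"type": "number"},
    },
    "required": ["account_holder", "account_number", "opening_balance", "closing_balance"],
}
extracted = {
    "account_holder": "JAMES C. MORRISON",
    "account_number": "12345678",
    "opening_balance": 69.96,
    "closing_balance": 586.71,
}
print(missing_required(bank_schema, extracted))
```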

SDK Examples

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "account_holder": {
            "type": "string",
            "description": "Name of the account holder"
        },
        "account_number": {
            "type": "string",
            "description": "Bank account number"
        },
        "opening_balance": {"type": "number"},
        "closing_balance": {"type": "number"}
    },
    "required": ["account_holder", "account_number", "opening_balance", "closing_balance"]
}

response = client.extract(
    file_url="https://www.impact-bank.com/user/file/dummy_statement.pdf",
    structured_output={
        "schema": schema,
        "schema_prompt": "Extract bank statement details"
    }
)

print(f"Account Holder: {response.structured_output['account_holder']}")
print(f"Balance: {response.structured_output['closing_balance']}")

Schema Format

Schemas follow the JSON Schema specification. Each field is defined with:

Property	Description
`type`	Data type: `string`, `number`, `boolean`, `object`, `array`
`description`	Human-readable description to guide extraction
`format`	Optional format hint (e.g., `date`, `email`, `uri`)
`required`	Array of required field names (for objects)
`items`	Schema for array elements
`properties`	Nested field definitions (for objects)

Data Types

Type	Description	Example Value
`string`	Text values	`"John Doe"`
`number`	Numeric values (integer or decimal)	`99.99`
`boolean`	True/false values	`true`
`object`	Nested structures with `properties`	`{"name": {"type": "string"}}`
`array`	Lists with `items` defining element schema	`{"type": "array", "items": {...}}`
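For readers consuming results in Python, the sketch below shows how each schema type appears after the response JSON is parsed with the standard `json` module. The helper `matches_type` and the sample payload are illustrative assumptions, not part of the API.

```python
import json

# Python types produced by json.loads for each schema type in the table.
JSON_TO_PYTHON = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def matches_type(value, schema_type: str) -> bool:
    """Check whether a parsed JSON value has the Python type for schema_type."""
    # bool is a subclass of int in Python, so exclude it from "number".
    if schema_type == "number" and isinstance(value, bool):
        return False
    return isinstance(value, JSON_TO_PYTHON[schema_type])


sample = json.loads('{"name": "John Doe", "price": 99.99, "paid": true, "tags": ["a"]}')
for field, schema_type in [("name", "string"), ("price", "number"), ("paid", "boolean"), ("tags", "array")]:
    print(field, matches_type(sample[field], schema_type))
```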

Schema Design Principles

1. Start Simple

Begin with basic fields and gradually add complexity:

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    }
  }
}

Then expand with nested objects and arrays:

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string", "description": "Invoice ID"},
        "date": {"type": "string", "format": "date"},
        "vendor": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"}
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "amount": {"type": "number"}
            }
          }
        },
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract all invoice details including vendor information and itemized charges."
  }
}
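One way to iterate like this in code is to keep the simple schema as a base and derive each richer version from it. The `add_field` helper below is a hypothetical convenience, not an SDK function; it returns a modified copy so that earlier versions stay available for comparison.

```python
import copy


def add_field(schema: dict, name: str, field_schema: dict, required: bool = False) -> dict:
    """Return a copy of schema with one more property, leaving the input untouched."""
    updated = copy.deepcopy(schema)
    updated.setdefault("properties", {})[name] = field_schema
    if required and name not in updated.setdefault("required", []):
        updated["required"].append(name)
    return updated


v1 = {
    "type": "object",
    "properties": {"invoice_number": {"type": "string"}, "total": {"type": "number"}},
    "required": ["invoice_number", "total"],
}
v2 = add_field(v1, "date", {"type": "string", "format": "date"})
v3 = add_field(
    v2,
    "vendor",
    {"type": "object", "properties": {"name": {"type": "string"}, "address": {"type": "string"}}},
)
print(list(v3["properties"]))
```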

2. Use Descriptions

Add description fields to guide extraction:

{
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice identifier, usually at the top of the document"
    },
    "bill_to": {
      "type": "string", 
      "description": "Customer billing address"
    },
    "remit_to": {
      "type": "string",
      "description": "Payment remittance address"
    }
  }
}
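To find fields that still lack a description, a schema can be walked before it is sent. The function below is an illustrative sketch with a hypothetical name; it follows `properties` and array `items` and reports each undescribed field by path.

```python
def fields_without_description(schema: dict, path: str = "") -> list:
    """List paths of properties that have no non-empty description."""
    found = []
    for name, sub in schema.get("properties", {}).items():
        full = f"{path}{name}"
        if not sub.get("description"):
            found.append(full)
        if sub.get("type") == "object":
            found.extend(fields_without_description(sub, full + "."))
        elif sub.get("type") == "array" and isinstance(sub.get("items"), dict):
            # "[]" marks fields that live inside array elements.
            found.extend(fields_without_description(sub["items"], full + "[]."))
    return found


draft = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The unique invoice identifier"},
        "vendor": {"type": "object", "properties": {"name": {"type": "string"}}},
        "line_items": {
            "type": "array",
            "description": "Itemized charges",
            "items": {"type": "object", "properties": {"amount": {"type": "number"}}},
        },
    },
}
print(fields_without_description(draft))
```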

3. Use schema_prompt for Context

The schema_prompt field provides natural language guidance to help the model understand nuances:

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "contract_type": {"type": "string"},
        "effective_date": {"type": "string", "format": "date"},
        "parties": {"type": "array", "items": {"type": "string"}},
        "key_terms": {"type": "array", "items": {"type": "string"}}
      }
    },
    "schema_prompt": "Extract contract details. For key_terms, focus on payment terms, termination clauses, and liability limitations. Format dates as YYYY-MM-DD."
  }
}
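When a prompt asks for dates as YYYY-MM-DD, the returned strings can be parsed with the standard library. This is a sketch under the assumption that date fields arrive as strings; the helper name `parse_iso_date` is hypothetical, and it returns `None` rather than raising when a value is missing or in another format.

```python
from datetime import date
from typing import Optional


def parse_iso_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


print(parse_iso_date("2024-01-31"))
print(parse_iso_date("01/31/2024"))
```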

Common Schema Patterns

Invoice / Financial Documents

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "document_type": {"type": "string"},
        "document_number": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "due_date": {"type": "string", "format": "date"},
        "vendor": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
            "tax_id": {"type": "string"}
          }
        },
        "customer": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "account_number": {"type": "string"}
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "quantity": {"type": "number"},
              "unit_price": {"type": "number"},
              "amount": {"type": "number"}
            },
            "required": ["description", "amount"]
          }
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"}
      },
      "required": ["document_number", "total"]
    },
    "schema_prompt": "Extract all invoice details. Include all line items. Format currency as numbers without symbols."
  }
}
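Because this pattern returns line items alongside subtotal, tax, and total, the numbers can be cross-checked after extraction. The sketch below is one possible check, not something the API performs; `check_invoice_totals`, the one-cent tolerance, and the sample invoice are all assumptions. It uses `Decimal` to avoid floating-point rounding when summing currency.

```python
from decimal import Decimal


def check_invoice_totals(invoice: dict, tolerance: str = "0.01") -> dict:
    """Compare line item amounts with subtotal, and subtotal plus tax with total."""
    tol = Decimal(tolerance)
    items = invoice.get("line_items") or []
    line_sum = sum((Decimal(str(item["amount"])) for item in items), Decimal("0"))
    result = {"line_items_sum": line_sum}
    subtotal = invoice.get("subtotal")
    total = invoice.get("total")
    if subtotal is not None:
        result["subtotal_ok"] = abs(line_sum - Decimal(str(subtotal))) <= tol
    if subtotal is not None and total is not None:
        # A missing tax value is treated as zero.
        expected = Decimal(str(subtotal)) + Decimal(str(invoice.get("tax") or 0))
        result["total_ok"] = abs(expected - Decimal(str(total))) <= tol
    return result


sample_invoice = {
    "line_items": [
        {"description": "Widget", "amount": 10.10},
        {"description": "Gadget", "amount": 20.20},
    ],
    "subtotal": 30.30,
    "tax": 2.42,
    "total": 32.72,
}
print(check_invoice_totals(sample_invoice))
```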

Legal Documents

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "document_title": {"type": "string"},
        "case_number": {"type": "string"},
        "parties": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "role": {"type": "string"},
              "representation": {"type": "string"}
            },
            "required": ["name", "role"]
          }
        },
        "dates": {
          "type": "object",
          "properties": {
            "filed": {"type": "string", "format": "date"},
            "effective": {"type": "string", "format": "date"},
            "expiration": {"type": "string", "format": "date"}
          }
        },
        "signatures": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "title": {"type": "string"},
              "date": {"type": "string", "format": "date"}
            }
          }
        }
      },
      "required": ["document_title"]
    },
    "schema_prompt": "Extract legal document details. Include all parties and their roles."
  }
}

Medical Records

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "patient": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "dob": {"type": "string", "format": "date", "description": "Date of birth"},
            "mrn": {"type": "string", "description": "Medical record number"}
          },
          "required": ["name", "mrn"]
        },
        "encounter": {
          "type": "object",
          "properties": {
            "date": {"type": "string", "format": "date"},
            "provider": {"type": "string"},
            "location": {"type": "string"}
          }
        },
        "chief_complaint": {"type": "string"},
        "medications": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "dosage": {"type": "string"},
              "frequency": {"type": "string"}
            }
          }
        },
        "diagnoses": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "code": {"type": "string", "description": "ICD-10 code"},
              "description": {"type": "string"}
            }
          }
        },
        "plan": {"type": "string"}
      },
      "required": ["patient", "encounter"]
    },
    "schema_prompt": "Extract patient encounter details. Include all medications and diagnoses with their codes."
  }
}

Advanced Techniques

Conditional Extraction

Use schema_prompt to guide conditional extraction:

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "contract_type": {"type": "string", "description": "Type of contract: lease, purchase, service, etc."},
        "terms": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": {"type": "string"},
              "value": {"type": "string"}
            }
          }
        }
      },
      "required": ["contract_type"]
    },
    "schema_prompt": "First identify the contract_type. If it's a 'lease', extract rental amount and duration as terms. If it's a 'purchase', extract price and closing date as terms."
  }
}
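Since the extracted terms differ by contract type, downstream code usually needs to look them up by name. The sketch below groups the `terms` array by its `type` field; the function name and the sample values are hypothetical and only illustrate the shape defined by the schema above.

```python
def terms_by_type(result: dict) -> dict:
    """Group extracted terms into a dict keyed by each term's type."""
    grouped: dict = {}
    for term in result.get("terms") or []:
        key = term.get("type")
        if key is None:
            continue  # skip terms the model could not label
        grouped.setdefault(key, []).append(term.get("value"))
    return grouped


# Hypothetical extraction result for a lease.
lease_result = {
    "contract_type": "lease",
    "terms": [
        {"type": "rental_amount", "value": "1500 per month"},
        {"type": "duration", "value": "12 months"},
    ],
}
print(terms_by_type(lease_result))
```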

Hierarchical Data

For documents with deeply nested structures:

{
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "organization": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "departments": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "head": {"type": "string"},
                  "employee_count": {"type": "number"}
                }
              }
            }
          }
        }
      }
    },
    "schema_prompt": "Extract the organizational hierarchy including all departments."
  }
}
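Reading a nested result like this one is mostly a matter of walking each level defensively. The sketch below summarizes the departments from a result shaped like the schema above; the function name and sample data are illustrative assumptions.

```python
def department_summary(result: dict) -> dict:
    """Summarize department names and total headcount from a nested result."""
    organization = result.get("organization") or {}
    departments = organization.get("departments") or []
    return {
        "organization": organization.get("name"),
        "department_names": [d.get("name") for d in departments],
        # Treat a missing employee_count as zero.
        "total_employees": sum(d.get("employee_count") or 0 for d in departments),
    }


# Hypothetical extraction result.
org_result = {
    "organization": {
        "name": "Example Corp",
        "departments": [
            {"name": "Engineering", "head": "A. Lee", "employee_count": 40},
            {"name": "Finance", "head": "B. Cruz"},
        ],
    }
}
print(department_summary(org_result))
```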

Performance Tips

Keep Schemas Focused

Extract only what you need. Avoid extracting entire documents as single fields.

Use Descriptions

Add description fields to guide the model on ambiguous fields or specific formats.

Leverage schema_prompt

Use schema_prompt to provide context that can’t be expressed in the schema structure alone.

Migration from Legacy Parameters

The schema and schema_prompt top-level parameters are deprecated and will be removed in a future version.

Before (Deprecated)

{
  "file": "@document.pdf",
  "schema": {"invoice_number": "string", "total": "number"},
  "schema_prompt": "Extract invoice details"
}

After (Recommended)

{
  "file": "@document.pdf",
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "The unique invoice identifier"
        },
        "total": {
          "type": "number",
          "description": "Total invoice amount"
        }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract invoice details"
  }
}

The API currently supports both formats for backward compatibility, but new integrations should use structured_output.
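For existing integrations, the conversion can be scripted. The sketch below assumes the legacy schema is a flat mapping of field name to type name, as in the Before example; other legacy shapes are not covered. Both function names are hypothetical. Descriptions and `required` lists cannot be inferred from the legacy format, so `required` is opt-in and descriptions should be added by hand afterwards.

```python
def legacy_to_json_schema(legacy_schema: dict, require_all: bool = False) -> dict:
    """Convert a flat {"field": "type"} mapping into a JSON Schema object."""
    properties = {name: {"type": type_name} for name, type_name in legacy_schema.items()}
    schema = {"type": "object", "properties": properties}
    if require_all:
        schema["required"] = list(properties)
    return schema


def migrate_request(body: dict) -> dict:
    """Move top-level schema and schema_prompt into structured_output."""
    if "schema" not in body:
        return dict(body)  # nothing to migrate
    migrated = {k: v for k, v in body.items() if k not in ("schema", "schema_prompt")}
    structured_output = {"schema": legacy_to_json_schema(body["schema"])}
    if "schema_prompt" in body:
        structured_output["schema_prompt"] = body["schema_prompt"]
    migrated["structured_output"] = structured_output
    return migrated


legacy_body = {
    "file": "@document.pdf",
    "schema": {"invoice_number": "string", "total": "number"},
    "schema_prompt": "Extract invoice details",
}
print(migrate_request(legacy_body))
```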

Error Handling

Common Schema Errors

Error	Cause	Solution
Invalid JSON	Syntax error in schema	Validate JSON syntax
Unknown type	Using unsupported data type	Use `string`, `number`, `boolean`, `object`, or `array`
Too complex	Deeply nested structure	Simplify schema, flatten where possible
No matches	Fields don’t match document	Adjust field names, use schema_prompt for guidance
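Some of these errors can be caught locally before a request is sent. The sketch below is a rough pre-check, not the API's own validation: the function name is hypothetical, the type list mirrors the Data Types table in this guide, and the `max_depth` default of 4 is an arbitrary assumption because no nesting limit is stated here.

```python
import json

SUPPORTED_TYPES = {"string", "number", "boolean", "object", "array"}


def precheck_schema(schema_text: str, max_depth: int = 4) -> list:
    """Return a list of likely problems found in a schema given as JSON text."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(schema, dict):
        return ["Schema must be a JSON object"]

    problems = []

    def walk(node, path: str, depth: int) -> None:
        if not isinstance(node, dict):
            problems.append(f"Expected an object at {path}")
            return
        if node.get("type") not in SUPPORTED_TYPES:
            problems.append(f"Unknown type {node.get('type')!r} at {path}")
        if depth > max_depth:
            problems.append(f"Too complex: {path} is nested {depth} levels deep")
            return
        for name, child in node.get("properties", {}).items():
            walk(child, f"{path}.{name}", depth + 1)
        if isinstance(node.get("items"), dict):
            walk(node["items"], f"{path}[]", depth + 1)

    walk(schema, "$", 1)
    return problems


print(precheck_schema('{"type": "object", "properties": {"n": {"type": "integer"}}}'))
```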

Debugging Tips

Start with a minimal schema and add fields incrementally
Use schema_prompt to provide context and clarify ambiguous fields
Check extracted markdown without schema first to see available content
Verify field names match document terminology

Best Practices Summary

DO

Use structured_output for all new integrations
Provide descriptive schema_prompt instructions
Use descriptive field names matching document terminology
Start simple and iterate
Test with real documents
Use appropriate data types (number for numeric values)

DON'T

Use deprecated schema top-level parameter
Create overly complex nested structures
Use generic field names
Extract entire documents as single fields
Assume all fields will always exist
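The last point can be handled in client code by reading fields through a helper that tolerates gaps. The sketch below is one way to do it; `get_path` and the sample record are hypothetical, and a `null` value is treated the same as a missing key.

```python
def get_path(data: dict, path: str, default=None):
    """Read a dotted path from nested dicts, returning default if any part is missing."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current


# Hypothetical extraction result with gaps.
record = {"patient": {"name": "A. Patient", "dob": None}, "plan": None}
print(get_path(record, "patient.name"))
print(get_path(record, "patient.dob", "unknown"))
print(get_path(record, "encounter.date"))
```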

Next Steps

Quickstart Guide

Extract Endpoint
