# Structured Output Guidelines

> Best practices for defining extraction schemas

<Warning>
  **Deprecation Notice**: The `structured_output` parameter on `/extract` is **deprecated**. Use the [`/schema`](/api-reference/endpoint/schema) endpoint after extraction instead. The schema format and design principles on this page still apply — just pass your schema to `/schema` via `schema_config` instead of to `/extract` via `structured_output`.
</Warning>

## Overview

This guide covers best practices for designing JSON schemas used with the [`/schema` endpoint](/api-reference/endpoint/schema) (recommended) or the legacy `structured_output` parameter on `/extract`.

## Schema Format

The `input_schema` field of `schema_config` (and the legacy `structured_output.schema` field) uses the [JSON Schema](https://json-schema.org/) specification (OpenAPI 3.1 compatible). This is the same schema format used by OpenAI's structured outputs and other LLM providers.

### Key JSON Schema Properties

| Property      | Description                                    | Example                                                    |
| ------------- | ---------------------------------------------- | ---------------------------------------------------------- |
| `type`        | Data type of the field                         | `"string"`, `"number"`, `"boolean"`, `"object"`, `"array"` |
| `properties`  | Define fields for an object                    | `{"name": {"type": "string"}}`                             |
| `items`       | Define schema for array elements               | `{"type": "object", "properties": {...}}`                  |
| `required`    | List of required field names                   | `["name", "email"]`                                        |
| `description` | Human-readable description to guide extraction | `"Customer's full name"`                                   |
| `format`      | Hint for string formatting                     | `"date"`, `"email"`, `"uri"`                               |
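
As a rough illustration of how these properties combine, the following Python sketch builds a small schema as a plain dictionary and serializes it to JSON. The field names (`name`, `email`, `tags`) are made up for the example and are not part of the Pulse API.

```python
import json

# Hypothetical schema: the field names are illustrative only.
customer_schema = {
    "type": "object",
    "properties": {
        # `description` guides extraction for this field.
        "name": {"type": "string", "description": "Customer's full name"},
        # `format` hints at the expected string format.
        "email": {"type": "string", "format": "email"},
        # `items` defines the schema of each array element.
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    # `required` lists the fields that must be present.
    "required": ["name", "email"],
}

# Serialize to JSON so it can be pasted into a request body.
schema_json = json.dumps(customer_schema, indent=2)
print(schema_json)
```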

## Schema Editor (Recommended)

<Tip>
  **Don't write schemas by hand!** Use the Schema Editor in the [Pulse Platform](https://platform.runpulse.com) to generate and refine schemas interactively.
</Tip>

The Schema Editor provides two ways to create schemas:

### 1. Generate from Prompt

Describe what you want to extract in natural language, and the editor will generate a properly formatted JSON Schema for you.

> "Extract the account holder name, account number, statement period, opening and closing balances, and all transactions with date, description, and amount."

### 2. Interactive Editor

* Visually add, remove, and reorder fields
* Set field types and descriptions
* Mark fields as required
* Preview the generated schema in real time
* Test against sample documents

Once you're happy with your schema, copy it directly into your API requests.

## Using the /schema Endpoint (Recommended)

The recommended approach is a two-step flow:

1. **Extract** the document via `/extract` to get an `extraction_id`
2. **Apply a schema** via `/schema` using the `extraction_id`

The `schema_config` object contains:

| Field           | Type    | Description                                           |
| --------------- | ------- | ----------------------------------------------------- |
| `input_schema`  | object  | JSON schema defining the structure of data to extract |
| `schema_prompt` | string  | Natural language instructions to guide extraction     |
| `effort`        | boolean | Enable extended reasoning for complex documents       |
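
The request body for `/schema` can be assembled from these fields as in the sketch below. `build_schema_request` is a hypothetical helper written for this page, not part of the Pulse SDK. Only the key names (`extraction_id`, `schema_config`, `input_schema`, `schema_prompt`, `effort`) come from the table above, and the sketch assumes the optional fields may simply be omitted when unset.

```python
import json
from typing import Any, Optional


def build_schema_request(
    extraction_id: str,
    input_schema: dict[str, Any],
    schema_prompt: Optional[str] = None,
    effort: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble a /schema request body (hypothetical helper, not an SDK call)."""
    schema_config: dict[str, Any] = {"input_schema": input_schema}
    # Assumption: optional fields can be left out when the caller did not set them.
    if schema_prompt is not None:
        schema_config["schema_prompt"] = schema_prompt
    if effort is not None:
        schema_config["effort"] = effort
    return {"extraction_id": extraction_id, "schema_config": schema_config}


body = build_schema_request(
    extraction_id="abc123-...",  # placeholder; use the ID returned by /extract
    input_schema={"type": "object", "properties": {"total": {"type": "number"}}},
    schema_prompt="Extract the invoice total",
)
print(json.dumps(body, indent=2))
```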

### Bank Statement Example

Here's an example extracting key fields from a bank statement:

**Step 1: Extract**

```bash theme={null}
POST /extract
{"file_url": "https://...bank_statement.pdf"}
# → {"extraction_id": "abc123-...", "markdown": "...", ...}
```

**Step 2: Apply Schema**

```json theme={null}
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "account_holder": {
          "type": "string",
          "description": "Name of the account holder"
        },
        "account_number": {
          "type": "string",
          "description": "Bank account number"
        },
        "opening_balance": {
          "type": "number",
          "description": "Balance at the start of the statement period"
        },
        "closing_balance": {
          "type": "number",
          "description": "Balance at the end of the statement period"
        }
      },
      "required": ["account_holder", "account_number", "opening_balance", "closing_balance"]
    }
  }
}
```

**Response (`schema_output`):**

```json theme={null}
{
  "schema_id": "schema-uuid-456",
  "version": 1,
  "schema_output": {
    "values": {
      "account_holder": "JAMES C. MORRISON",
      "account_number": "12345678",
      "opening_balance": 69.96,
      "closing_balance": 586.71
    },
    "citations": {
      "account_holder": {"page": 1, "bbox": [100, 50, 300, 70]}
    }
  }
}
```
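
A minimal sketch of reading this response in Python follows. It assumes, as in the sample above, that `citations` may cover only some of the fields, so each lookup is optional. `describe_fields` is a hypothetical helper, not an SDK function.

```python
# Sample response copied from the example above.
response = {
    "schema_id": "schema-uuid-456",
    "version": 1,
    "schema_output": {
        "values": {
            "account_holder": "JAMES C. MORRISON",
            "account_number": "12345678",
            "opening_balance": 69.96,
            "closing_balance": 586.71,
        },
        "citations": {
            "account_holder": {"page": 1, "bbox": [100, 50, 300, 70]},
        },
    },
}


def describe_fields(resp: dict) -> list[str]:
    """Return one line per extracted value, noting the cited page when present."""
    output = resp.get("schema_output", {})
    values = output.get("values", {})
    citations = output.get("citations", {})
    lines = []
    for field, value in values.items():
        citation = citations.get(field)  # may be missing for some fields
        source = f"page {citation['page']}" if citation else "no citation"
        lines.append(f"{field}: {value!r} ({source})")
    return lines


for line in describe_fields(response):
    print(line)
```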

### SDK Examples

<CodeGroup>
  ```python Python theme={null}
  from pulse import Pulse

  client = Pulse(api_key="YOUR_API_KEY")

  # Step 1: Extract
  response = client.extract(
      file_url="https://www.impact-bank.com/user/file/dummy_statement.pdf"
  )

  # Step 2: Apply schema
  schema = {
      "type": "object",
      "properties": {
          "account_holder": {
              "type": "string",
              "description": "Name of the account holder"
          },
          "account_number": {
              "type": "string",
              "description": "Bank account number"
          },
          "opening_balance": {"type": "number"},
          "closing_balance": {"type": "number"}
      },
      "required": ["account_holder", "account_number"]
  }

  schema_result = client.schema(
      extraction_id=response.extraction_id,
      schema_config={
          "input_schema": schema,
          "schema_prompt": "Extract bank statement details"
      }
  )

  print(f"Account Holder: {schema_result.schema_output['values']['account_holder']}")
  print(f"Balance: {schema_result.schema_output['values']['closing_balance']}")
  ```

  ```typescript TypeScript theme={null}
  import { PulseClient } from 'pulse-ts-sdk';

  const client = new PulseClient({ 
      apiKey: 'YOUR_API_KEY'
  });

  // Step 1: Extract
  const response = await client.extract({
      fileUrl: "https://www.impact-bank.com/user/file/dummy_statement.pdf"
  });

  // Step 2: Apply schema
  const schema = {
      type: "object",
      properties: {
          account_holder: {
              type: "string",
              description: "Name of the account holder"
          },
          account_number: {
              type: "string",
              description: "Bank account number"
          },
          opening_balance: { type: "number" },
          closing_balance: { type: "number" }
      },
      required: ["account_holder", "account_number"]
  };

  const schemaResult = await client.schema({
      extraction_id: response.extraction_id,
      schema_config: {
          input_schema: schema,
          schema_prompt: "Extract bank statement details"
      }
  });

  console.log(`Account Holder: ${schemaResult.schema_output?.values?.account_holder}`);
  console.log(`Balance: ${schemaResult.schema_output?.values?.closing_balance}`);
  ```

  ```bash curl theme={null}
  # Step 1: Extract
  curl -X POST https://api.runpulse.com/extract \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@bank_statement.pdf"

  # Response: {"extraction_id": "abc123-...", "markdown": "...", ...}

  # Step 2: Apply schema
  curl -X POST https://api.runpulse.com/schema \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "extraction_id": "abc123-...",
      "schema_config": {
        "input_schema": {"type": "object", "properties": {"account_holder": {"type": "string"}, "closing_balance": {"type": "number"}}, "required": ["account_holder"]},
        "schema_prompt": "Extract bank statement details"
      }
    }'
  ```
</CodeGroup>

## Schema Format

Schemas follow the [JSON Schema](https://json-schema.org/) specification. Each field is defined with:

| Property      | Description                                                 |
| ------------- | ----------------------------------------------------------- |
| `type`        | Data type: `string`, `number`, `boolean`, `object`, `array` |
| `description` | Human-readable description to guide extraction              |
| `format`      | Optional format hint (e.g., `date`, `email`, `uri`)         |
| `required`    | Array of required field names (for objects)                 |
| `items`       | Schema for array elements                                   |
| `properties`  | Nested field definitions (for objects)                      |

### Data Types

| Type      | Description                                | Example Value                       |
| --------- | ------------------------------------------ | ----------------------------------- |
| `string`  | Text values                                | `"John Doe"`                        |
| `number`  | Numeric values (integer or decimal)        | `99.99`                             |
| `boolean` | True/false values                          | `true`                              |
| `object`  | Nested structures with `properties`        | `{"name": {"type": "string"}}`      |
| `array`   | Lists with `items` defining element schema | `{"type": "array", "items": {...}}` |
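
To check on the client side that extracted values match the declared types, something like the sketch below could be used. `type_mismatches` is a hypothetical helper that covers only the five types in the table above. It is not a full JSON Schema validator and is not part of the Pulse SDK.

```python
from typing import Any

# Map the type names from the table to Python checks.
# bool is excluded from "number" because bool is a subclass of int in Python.
_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def type_mismatches(schema: dict[str, Any], value: Any, path: str = "$") -> list[str]:
    """Recursively collect paths whose value does not match the schema's `type`."""
    expected = schema.get("type")
    check = _CHECKS.get(expected)
    if check is not None and not check(value):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    problems: list[str] = []
    if expected == "object":
        # Only fields present in the value are checked; missing fields are skipped.
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                problems += type_mismatches(sub, value[name], f"{path}.{name}")
    elif expected == "array":
        for i, item in enumerate(value):
            problems += type_mismatches(schema.get("items", {}), item, f"{path}[{i}]")
    return problems


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "total": {"type": "number"}},
}
print(type_mismatches(schema, {"name": "John Doe", "total": "99.99"}))
```

Here the string `"99.99"` is reported because the schema declares `total` as a `number`.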

## Schema Design Principles

### 1. Start Simple

Begin with basic fields and gradually add complexity:

```json theme={null}
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    }
  }
}
```

Then expand with nested objects and arrays:

```json theme={null}
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string", "description": "Invoice ID"},
        "date": {"type": "string", "format": "date"},
        "vendor": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"}
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "amount": {"type": "number"}
            }
          }
        },
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract all invoice details including vendor information and itemized charges."
  }
}
```

### 2. Use Descriptions

Add `description` fields to guide extraction:

```json theme={null}
{
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice identifier, usually at the top of the document"
    },
    "bill_to": {
      "type": "string", 
      "description": "Customer billing address"
    },
    "remit_to": {
      "type": "string",
      "description": "Payment remittance address"
    }
  }
}
```
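
One way to spot fields that still lack a `description` is a small lint pass over the schema. The sketch below is illustrative only: `fields_without_description` is a hypothetical helper and the sample schema is made up for the example.

```python
from typing import Any, Iterator


def fields_without_description(
    schema: dict[str, Any], prefix: str = ""
) -> Iterator[str]:
    """Yield dotted paths of properties that have no `description`."""
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not sub.get("description"):
            yield path
        if sub.get("type") == "object":
            yield from fields_without_description(sub, f"{path}.")
        elif sub.get("type") == "array":
            # Array elements are described by `items`.
            yield from fields_without_description(sub.get("items", {}), f"{path}[].")


sample_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The unique invoice identifier",
        },
        "vendor": {"type": "object", "properties": {"name": {"type": "string"}}},
        "line_items": {
            "type": "array",
            "description": "Itemized charges",
            "items": {"type": "object", "properties": {"amount": {"type": "number"}}},
        },
    },
}

missing = list(fields_without_description(sample_schema))
print(missing)
```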

### 3. Use schema\_prompt for Context

The `schema_prompt` field provides natural language guidance to help the model understand nuances:

```json theme={null}
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "contract_type": {"type": "string"},
        "effective_date": {"type": "string", "format": "date"},
        "parties": {"type": "array", "items": {"type": "string"}},
        "key_terms": {"type": "array", "items": {"type": "string"}}
      }
    },
    "schema_prompt": "Extract contract details. For key_terms, focus on payment terms, termination clauses, and liability limitations. Format dates as YYYY-MM-DD."
  }
}
```

## Common Schema Patterns

### Invoice / Financial Documents

```json theme={null}
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "document_type": {"type": "string"},
        "document_number": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "due_date": {"type": "string", "format": "date"},
        "vendor": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
            "tax_id": {"type": "string"}
          }
        },
        "customer": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "account_number": {"type": "string"}
          }
        },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "quantity": {"type": "number"},
              "unit_price": {"type": "number"},
              "amount": {"type": "number"}
            },
            "required": ["description", "amount"]
          }
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"}
      },
      "required": ["document_number", "total"]
    },
    "schema_prompt": "Extract all invoice details. Include all line items. Format currency as numbers without symbols."
  }
}
```

### Legal Documents

```json theme={null}
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "document_title": {"type": "string"},
        "case_number": {"type": "string"},
        "parties": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "role": {"type": "string"},
              "representation": {"type": "string"}
            },
            "required": ["name", "role"]
          }
        },
        "dates": {
          "type": "object",
          "properties": {
            "filed": {"type": "string", "format": "date"},
            "effective": {"type": "string", "format": "date"},
            "expiration": {"type": "string", "format": "date"}
          }
        },
        "signatures": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "title": {"type": "string"},
              "date": {"type": "string", "format": "date"}
            }
          }
        }
      },
      "required": ["document_title"]
    },
    "schema_prompt": "Extract legal document details. Include all parties and their roles."
  }
}
```

### Medical Records

```json theme={null}
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "patient": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "dob": {"type": "string", "format": "date", "description": "Date of birth"},
            "mrn": {"type": "string", "description": "Medical record number"}
          },
          "required": ["name", "mrn"]
        },
        "encounter": {
          "type": "object",
          "properties": {
            "date": {"type": "string", "format": "date"},
            "provider": {"type": "string"},
            "location": {"type": "string"}
          }
        },
        "chief_complaint": {"type": "string"},
        "medications": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "dosage": {"type": "string"},
              "frequency": {"type": "string"}
            }
          }
        },
        "diagnoses": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "code": {"type": "string", "description": "ICD-10 code"},
              "description": {"type": "string"}
            }
          }
        },
        "plan": {"type": "string"}
      },
      "required": ["patient", "encounter"]
    },
    "schema_prompt": "Extract patient encounter details. Include all medications and diagnoses with their codes."
  }
}
```

## Advanced Techniques

### Conditional Extraction

Use `schema_prompt` to guide conditional extraction:

```json theme={null}
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "contract_type": {"type": "string", "description": "Type of contract: lease, purchase, service, etc."},
        "terms": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": {"type": "string"},
              "value": {"type": "string"}
            }
          }
        }
      },
      "required": ["contract_type"]
    },
    "schema_prompt": "First identify the contract_type. If it's a 'lease', extract rental amount and duration as terms. If it's a 'purchase', extract price and closing date as terms."
  }
}
```
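
When the set of conditional rules grows, it may be easier to keep them in a mapping and generate the prompt text from it. A minimal sketch follows, in which `TERMS_BY_CONTRACT_TYPE` and `build_conditional_prompt` are hypothetical names introduced for illustration and are not part of the Pulse SDK:

```python
# Contract type -> what to extract as terms (taken from the prompt above).
TERMS_BY_CONTRACT_TYPE = {
    "lease": "rental amount and duration",
    "purchase": "price and closing date",
}


def build_conditional_prompt(rules: dict[str, str]) -> str:
    """Build a schema_prompt string with one conditional sentence per rule."""
    parts = ["First identify the contract_type."]
    for contract_type, terms in rules.items():
        parts.append(f"If it's a '{contract_type}', extract {terms} as terms.")
    return " ".join(parts)


schema_prompt = build_conditional_prompt(TERMS_BY_CONTRACT_TYPE)
print(schema_prompt)
```

The generated string can then be passed as `schema_prompt` inside `schema_config`.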

### Hierarchical Data

For documents with deeply nested structures:

```json theme={null}
{
  "extraction_id": "your-extraction-id",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "organization": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "departments": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "head": {"type": "string"},
                  "employee_count": {"type": "number"}
                }
              }
            }
          }
        }
      }
    },
    "schema_prompt": "Extract the organizational hierarchy including all departments."
  }
}
```
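
Nested results can be awkward to load into flat stores such as CSV files or database rows. The sketch below flattens a nested value into dotted paths. The `flatten` helper is hypothetical and the sample data is invented to match the schema above, not real extraction output.

```python
from typing import Any


def flatten(value: Any, path: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into {dotted.path: leaf_value}."""
    if isinstance(value, dict):
        items = [(f"{path}.{k}" if path else k, v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{path}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {path: value}
    flat: dict[str, Any] = {}
    for child_path, child in items:
        flat.update(flatten(child, child_path))
    return flat


# Invented sample values shaped like the schema above.
sample_values = {
    "organization": {
        "name": "Example Corp",
        "departments": [
            {"name": "Engineering", "head": "A. Example", "employee_count": 12},
        ],
    }
}

flat_values = flatten(sample_values)
for key, val in flat_values.items():
    print(f"{key} = {val!r}")
```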

## Performance Tips

### Keep Schemas Focused

Extract only what you need. Avoid extracting entire documents as single fields.

### Use Descriptions

Add `description` fields to guide the model on ambiguous fields or specific formats.

### Leverage schema\_prompt

Use `schema_prompt` to provide context that can't be expressed in the schema structure alone.

## Migration from Legacy Parameters

<Warning>
  Both the `schema` / `schema_prompt` top-level parameters **and** the `structured_output` parameter on `/extract` are deprecated. Use the [`/schema`](/api-reference/endpoint/schema) endpoint after extraction instead.
</Warning>

### Before (Deprecated — top-level schema on /extract)

```json theme={null}
{
  "file": "@document.pdf",
  "schema": {"invoice_number": "string", "total": "number"},
  "schema_prompt": "Extract invoice details"
}
```

### Before (Deprecated — structured\_output on /extract)

```json theme={null}
{
  "file": "@document.pdf",
  "structured_output": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "total": { "type": "number" }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract invoice details"
  }
}
```

### After (Recommended — /extract → /schema)

```bash theme={null}
# Step 1: Extract
POST /extract  {"file_url": "https://..."}
# → {"extraction_id": "abc123-...", "markdown": "...", ...}

# Step 2: Apply schema
POST /schema
```

```json theme={null}
{
  "extraction_id": "abc123-...",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "The unique invoice identifier"
        },
        "total": {
          "type": "number",
          "description": "Total invoice amount"
        }
      },
      "required": ["invoice_number", "total"]
    },
    "schema_prompt": "Extract invoice details"
  }
}
```

The API still accepts `structured_output` on `/extract` for backward compatibility, but **all new integrations should use the `/schema` endpoint**.
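
The mapping between the two request shapes can be sketched in Python as below. `to_schema_request` is a hypothetical helper, not an SDK function. It assumes the legacy body uses the `structured_output` form shown above and that you already have an `extraction_id` from a prior `/extract` call.

```python
from typing import Any


def to_schema_request(legacy_body: dict[str, Any], extraction_id: str) -> dict[str, Any]:
    """Convert a legacy /extract body using `structured_output` into a /schema body."""
    legacy = legacy_body["structured_output"]
    # `structured_output.schema` becomes `schema_config.input_schema`.
    schema_config: dict[str, Any] = {"input_schema": legacy["schema"]}
    if "schema_prompt" in legacy:
        schema_config["schema_prompt"] = legacy["schema_prompt"]
    return {"extraction_id": extraction_id, "schema_config": schema_config}


legacy_body = {
    "structured_output": {
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total": {"type": "number"},
            },
            "required": ["invoice_number", "total"],
        },
        "schema_prompt": "Extract invoice details",
    }
}

new_body = to_schema_request(legacy_body, extraction_id="abc123-...")
print(new_body["schema_config"]["schema_prompt"])
```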

## Error Handling

### Common Schema Errors

| Error        | Cause                       | Solution                                                    |
| ------------ | --------------------------- | ----------------------------------------------------------- |
| Invalid JSON | Syntax error in schema      | Validate JSON syntax                                        |
| Unknown type | Using unsupported data type | Use `string`, `number`, `boolean`, `object`, or `array`     |
| Too complex  | Deeply nested structure     | Simplify schema, flatten where possible                     |
| No matches   | Fields don't match document | Adjust field names, use `schema_prompt` for guidance        |
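
Some of these problems can be caught before sending a request. The sketch below is a hypothetical pre-flight check, not part of the Pulse SDK. It treats only the five types listed on this page as supported, and its `max_depth` default is an arbitrary placeholder, since this page does not state a nesting limit.

```python
import json
from typing import Any

# Assumption: only the types listed on this page are accepted.
SUPPORTED_TYPES = {"string", "number", "boolean", "object", "array"}


def unknown_types(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Collect `type` values that are not in SUPPORTED_TYPES."""
    problems: list[str] = []
    declared = schema.get("type")
    if declared is not None and (
        not isinstance(declared, str) or declared not in SUPPORTED_TYPES
    ):
        problems.append(f"Unknown type {declared!r} at {path}")
    for name, sub in schema.get("properties", {}).items():
        problems += unknown_types(sub, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):
        problems += unknown_types(schema["items"], f"{path}[]")
    return problems


def schema_depth(schema: dict[str, Any]) -> int:
    """Return the nesting depth of a schema (a flat scalar schema has depth 1)."""
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    return 1 + max((schema_depth(c) for c in children), default=0)


def preflight(schema_text: str, max_depth: int = 6) -> list[str]:
    """Return a list of problems found in a schema given as JSON text."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(schema, dict):
        return ["Schema must be a JSON object"]
    problems = unknown_types(schema)
    depth = schema_depth(schema)
    if depth > max_depth:  # placeholder threshold, not a documented limit
        problems.append(f"Schema nests {depth} levels deep; consider flattening")
    return problems


print(preflight('{"type": "object", "properties": {"total": {"type": "money"}}}'))
```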

### Debugging Tips

1. **Start with a minimal schema** and add fields incrementally
2. **Use `schema_prompt`** to provide context and clarify ambiguous fields
3. **Check the extracted markdown** before applying a schema to see what content is available
4. **Verify field names** match document terminology
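
The first tip can be sketched as a generator that yields progressively larger schemas, so each one can be tried in turn against the same `extraction_id`. `incremental_schemas` is a hypothetical helper, and the field names are reused from the invoice examples on this page.

```python
from typing import Any, Iterator


def incremental_schemas(properties: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield schemas containing the first 1, 2, ... n fields, in insertion order."""
    names = list(properties)
    for count in range(1, len(names) + 1):
        subset = {name: properties[name] for name in names[:count]}
        yield {"type": "object", "properties": subset}


all_fields = {
    "invoice_number": {"type": "string"},
    "total": {"type": "number"},
    "date": {"type": "string", "format": "date"},
}

steps = list(incremental_schemas(all_fields))
for step in steps:
    print(list(step["properties"]))
```

If the output degrades after a particular step, the field added in that step is the first place to look.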

## Best Practices Summary

<AccordionGroup>
  <Accordion title="DO">
    * Use the [`/schema`](/api-reference/endpoint/schema) endpoint for all new integrations
    * Provide descriptive `schema_prompt` instructions
    * Use descriptive field names matching document terminology
    * Start simple and iterate
    * Test with real documents
    * Use appropriate data types (`number` for numeric values)
  </Accordion>

  <Accordion title="DON'T">
    * Use `structured_output` on `/extract` (deprecated — use `/schema` instead)
    * Use the deprecated top-level `schema` parameter
    * Create overly complex nested structures
    * Use generic field names
    * Extract entire documents as single fields
    * Assume all fields will always exist
  </Accordion>
</AccordionGroup>

## Next Steps

<CardGroup cols={2}>
  <Card title="Quickstart Guide" icon="rocket" href="/quickstart">
    See more examples
  </Card>

  <Card title="Schema Endpoint" icon="book" href="/api-reference/endpoint/schema">
    Apply schemas to extracted documents
  </Card>
</CardGroup>
