# Schema, Tables, Or Split

> Know when to use each downstream step after extraction.

After Extract, most decisions come down to three features:

* **Schema** turns document content into named JSON fields.
* **Tables** reconstructs row and column structure and returns it as HTML table objects.
* **Split** assigns pages to semantic topics so later steps can run only where they belong.

## Comparison

| Feature    | Best for                                                         | Input                            | Output                                   |
| ---------- | ---------------------------------------------------------------- | -------------------------------- | ---------------------------------------- |
| **Schema** | Field extraction, normalized JSON, citations                     | `extraction_id` or `split_id`    | `schema_output.values` and citations     |
| **Tables** | Financial tables, schedules, cross-page tables, charts as tables | `extraction_id` or `split_id`    | HTML table objects with citations        |
| **Split**  | Topic routing and per-section workflows                          | `extraction_id`                  | Topic-to-page assignments and `split_id` |

## Use Schema When

You can name the fields you want before processing the document:

* Invoice number, vendor, total, and line items
* Parties, effective date, renewal terms, and termination clauses
* Policyholder, coverage limit, deductible, and claim number
* Bank account holder, balances, and transaction summary

Good schemas include field descriptions. The description tells Pulse what the field means, not just what it is called.

```json
{
  "type": "object",
  "properties": {
    "effective_date": {
      "type": "string",
      "description": "The date the agreement becomes effective, not the signature date"
    }
  }
}
```
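One way to catch fields without a description before sending a schema is a small local check. The sketch below is plain Python over a JSON Schema dictionary; the helper name `fields_missing_descriptions` and the `renewal_terms` field are illustrative and not part of Pulse.

```python
from typing import Any


def fields_missing_descriptions(schema: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of properties that have no non-empty description."""
    missing: list[str] = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not str(spec.get("description", "")).strip():
            missing.append(path)
        # Recurse into nested objects and into array item objects.
        if spec.get("type") == "object":
            missing.extend(fields_missing_descriptions(spec, prefix=f"{path}."))
        items = spec.get("items")
        if isinstance(items, dict) and items.get("type") == "object":
            missing.extend(fields_missing_descriptions(items, prefix=f"{path}[]."))
    return missing


schema = {
    "type": "object",
    "properties": {
        "effective_date": {
            "type": "string",
            "description": "The date the agreement becomes effective, not the signature date",
        },
        "renewal_terms": {"type": "string"},  # no description, so it is flagged
    },
}

print(fields_missing_descriptions(schema))
```

Running a check like this in CI or before each request keeps undescribed fields from slipping into a schema as it grows.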

## Use Tables When

The row and column structure matters:

* 10-K and 10-Q financial statements
* Capitalization tables
* Loss runs and bordereaux
* Pricing schedules
* Tables that continue across pages

Enable table merge when a table continues over page breaks. Enable chart-to-table conversion when charts or figures contain numeric data you need downstream.
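As a rough sketch, those two choices can be modeled as flags on a request body. The option names `merge_tables` and `chart_to_table`, the body layout, and the `ext_123` ID are placeholders rather than confirmed Pulse parameters; the Tables API reference has the real names.

```python
def build_tables_request(
    extraction_id: str,
    spans_pages: bool = False,
    charts_hold_data: bool = False,
) -> dict:
    """Build a hypothetical /tables request body from two document traits."""
    return {
        "extraction_id": extraction_id,
        # Placeholder option names; the real API may name or nest these differently.
        "merge_tables": spans_pages,
        "chart_to_table": charts_hold_data,
    }


# A filing whose balance sheet continues across a page break, with no charts needed.
request = build_tables_request("ext_123", spans_pages=True)
print(request)
```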

## Use Split When

The document has distinct sections that should not all receive the same instructions:

* Annual reports with business overview, risk factors, financials, and leadership sections
* Loan files with application, underwriting, appraisal, and closing documents
* Insurance packets with policy terms, schedules, claims, and endorsements
* Research or diligence packs with mixed document types

Split returns page groups. You can stop there for routing, or pass the `split_id` to Schema or Tables for topic-specific output.
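The routing decision can be sketched as a small planning step over the page groups. The shape of `split_result` below (a `split_id` plus a `topics` mapping of topic name to page numbers) is an assumption for illustration, as are the topic names, page numbers, and the `plan_followups` helper.

```python
# Hypothetical Split result: field names and page numbers are illustrative only.
split_result = {
    "split_id": "split_abc",
    "topics": {
        "business_overview": [1, 2, 3],
        "risk_factors": [4, 5, 6, 7],
        "financials": [8, 9, 10, 11],
    },
}

# Downstream step chosen per topic; topics not listed here are routed only.
step_for_topic = {"risk_factors": "schema", "financials": "tables"}


def plan_followups(result: dict, steps: dict[str, str]) -> list[dict]:
    """Pair each topic's page group with the downstream step chosen for it."""
    plan = []
    for topic, pages in result["topics"].items():
        plan.append(
            {
                "topic": topic,
                "pages": pages,
                "step": steps.get(topic, "route_only"),
                "split_id": result["split_id"],
            }
        )
    return plan


for item in plan_followups(split_result, step_for_topic):
    print(item["topic"], item["pages"], item["step"])
```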

## Patterns

### Extract -> Schema

Best for single-structure documents.

```mermaid
flowchart LR
    A["/extract"] --> B["extraction_id"]
    B --> C["/schema"]
    C --> D["JSON fields + citations"]
```

### Extract -> Tables

Best for table-first output.

```mermaid
flowchart LR
    A["/extract"] --> B["extraction_id"]
    B --> C["/tables"]
    C --> D["HTML tables + citations"]
```

### Extract -> Split -> Schema

Best for long mixed documents.

```mermaid
flowchart LR
    A["/extract"] --> B["extraction_id"]
    B --> C["/split"]
    C --> D["split_id"]
    D --> E["/schema"]
    E --> F["Per-topic JSON"]
```
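In code, this pattern amounts to passing each step's ID to the next call. The sketch below keeps the HTTP client abstract so it runs offline: `post` is any function that sends a JSON body to a path and returns the parsed response. The endpoint paths come from the diagram; the body fields `file_url`, `topics`, and `schema`, the example URL, and the fake responses are assumptions, not the documented request format.

```python
from typing import Callable

# A transport takes (path, body) and returns the parsed JSON response.
Post = Callable[[str, dict], dict]


def extract_split_schema(post: Post, document_url: str, topics: list[str], schema: dict) -> dict:
    """Chain /extract -> /split -> /schema, passing each step's ID to the next."""
    extraction = post("/extract", {"file_url": document_url})
    split = post("/split", {"extraction_id": extraction["extraction_id"], "topics": topics})
    return post("/schema", {"split_id": split["split_id"], "schema": schema})


def fake_post(path: str, body: dict) -> dict:
    """Offline stand-in for an HTTP client so the sketch runs without network access."""
    if path == "/extract":
        return {"extraction_id": "ext_123"}
    if path == "/split":
        return {"split_id": f"split_for_{body['extraction_id']}"}
    if path == "/schema":
        return {"received_split_id": body["split_id"]}
    raise ValueError(f"unexpected path: {path}")


result = extract_split_schema(
    fake_post,
    "https://example.com/annual-report.pdf",
    topics=["risk_factors", "financials"],
    schema={"type": "object", "properties": {}},
)
print(result)
```

Swapping `fake_post` for a function that wraps a real HTTP client leaves the chaining logic unchanged.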

## Practical Rule

If you want **values**, use Schema. If you want **tables**, use Tables. If you need **different logic for different pages**, use Split first.
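The same rule can be written as a small helper that returns the steps to run after Extract. The function name and flags are illustrative only.

```python
def choose_steps(want_values: bool, want_tables: bool, per_page_logic: bool) -> list[str]:
    """Return the ordered steps to run after Extract for a given set of needs."""
    # Split goes first whenever different pages need different handling.
    steps = ["split"] if per_page_logic else []
    if want_values:
        steps.append("schema")
    if want_tables:
        steps.append("tables")
    return steps


print(choose_steps(want_values=True, want_tables=False, per_page_logic=False))
print(choose_steps(want_values=True, want_tables=True, per_page_logic=True))
```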

<CardGroup cols={3}>
  <Card title="Schema Guidelines" icon="brackets-curly" href="/api-reference/structured-output-guidelines">
    Write schemas that produce reliable JSON.
  </Card>

  <Card title="Tables API" icon="table" href="/api-reference/endpoint/tables">
    Extract and merge structured tables.
  </Card>

  <Card title="Split API" icon="scissors" href="/api-reference/endpoint/split">
    Assign pages to topics.
  </Card>
</CardGroup>
