# Annual Report Split -> Schema

> Split a long annual report into topics and apply different schemas per topic.

## Goal

Extract different types of structured data from different sections of a long report without forcing one huge schema across the whole document.

## Sample Document

Use the built-in [10-K Annual Report Platform example](https://platform.runpulse.com/dashboard/examples/0514fc05-8b0a-4a3b-9b9b-18ac89fc04e5). The example includes a saved Split output with topics for exhibits and financial statement schedules, signatures, and certifications.

## Use This Workflow

```mermaid theme={null}
flowchart LR
    A[Annual report] --> B["/extract"]
    B --> C["/split"]
    C --> D["Financial Statement Schedules"]
    C --> E["Signatures"]
    C --> F["Certifications"]
    D --> G["/schema"]
    E --> G
    F --> G
```

Use **Extract -> Split -> Schema** when sections have different vocabulary, layout, and output requirements.

## Split Topics

| Topic                         | Description                                                     |
| ----------------------------- | --------------------------------------------------------------- |
| Financial Statement Schedules | Financial statement schedules, exhibits, and supporting tables. |
| Signatures                    | Signature blocks, officers, titles, and signing dates.          |
| Certifications                | Officer certifications and compliance attestations.             |

## Platform Steps

<Steps>
  <Step title="Extract the report">
    Upload the report and run Extract. Use async for long PDFs.
  </Step>

  <Step title="Add Split">
    Add topics with clear names and descriptions. Run Split and inspect page assignments.
  </Step>

  <Step title="Add per-topic schemas">
    Define a different schema for each topic. Keep each schema narrow and specific.
  </Step>

  <Step title="Save presets">
    Save the split config and per-topic schema configs once the workflow works across multiple reports.
  </Step>
</Steps>
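Step 1 recommends async for long PDFs. As the Python section on this page notes, with `async_=True` you need to poll `extract_result.job_id` until the job finishes before an `extraction_id` is available. This page does not show the job-status call, so the sketch below takes it as a parameter. `wait_for_extraction` and `fetch_job` are hypothetical, and the `"status"` values (`"completed"`, `"failed"`) and the `"extraction_id"` / `"error"` keys are assumptions. Replace them with whatever the Pulse job endpoint actually returns.

```python
import time
from typing import Any, Callable, Dict


def wait_for_extraction(
    job_id: str,
    fetch_job: Callable[[str], Dict[str, Any]],  # hypothetical: returns the job's current state
    poll_seconds: float = 5.0,
    timeout_seconds: float = 900.0,
) -> str:
    """Poll an async extract job until it finishes, then return its extraction_id.

    Assumption: the job dict has a "status" of "completed" or "failed" when done,
    and a completed job carries an "extraction_id". Not the documented Pulse shape.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job["extraction_id"]
        if status == "failed":
            raise RuntimeError(f"Extract job {job_id} failed: {job.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Extract job {job_id} did not finish within {timeout_seconds}s")
        # Still running: wait before asking again to avoid hammering the API.
        time.sleep(poll_seconds)
```

Passing the fetch function in keeps the polling loop independent of the SDK, so it can be tested with a fake and wired to the real job call once you confirm its name and response shape.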

## Python

```python theme={null}
from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

extract_result = client.extract(
    file_url="https://platform.runpulse.com/api/examples/0514fc05-8b0a-4a3b-9b9b-18ac89fc04e5/pdf",
    async_=True,
)

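# With async_=True the call starts a job rather than returning a finished
# extraction. In production, poll extract_result.job_id until the job completes,
# then read extraction_id from the job result. The line below assumes the
# extraction is already available.
extraction_id = extract_result.extraction_id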

split_result = client.split(
    extraction_id=extraction_id,
    split_config={
        "split_input": [
            {"name": "Financial Statement Schedules", "description": "Financial statement schedules, exhibits, and supporting tables"},
            {"name": "Signatures", "description": "Signature blocks, officers, titles, and signing dates"},
            {"name": "Certifications", "description": "Officer certifications and compliance attestations"},
        ]
    },
)

schema_result = client.schema(
    split_id=split_result.split_id,
    # One schema per topic; each key matches a "name" value in split_input above.
    split_schema_config={
        "Financial Statement Schedules": {
            "schema": {
                "type": "object",
                "properties": {
                    "schedule_names": {"type": "array", "items": {"type": "string"}},
                    "exhibit_numbers": {"type": "array", "items": {"type": "string"}},
                },
            },
            "schema_prompt": "Extract schedule names and exhibit identifiers.",
        },
        "Signatures": {
            "schema": {
                "type": "object",
                "properties": {
                    "signatories": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "title": {"type": "string"},
                                "date": {"type": "string"},
                            },
                        },
                    },
                },
            }
        },
        "Certifications": {
            "schema": {
                "type": "object",
                "properties": {
                    "certifying_officers": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Officers who signed certifications",
                    }
                },
            }
        },
    },
)

print(schema_result.results)
```
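In the example, `split_schema_config` is keyed by the same topic names used in `split_input`, so a mismatch such as `"Signature"` vs `"Signatures"` is easy to introduce and easy to miss. The sketch below is a pre-flight check you could run before calling `client.schema`. It assumes keys must match topic names exactly, including case and whitespace. `check_schema_keys` is a hypothetical helper, not part of the Pulse SDK.

```python
from typing import Any, Dict, List


def check_schema_keys(
    split_config: Dict[str, Any],
    split_schema_config: Dict[str, Any],
) -> Dict[str, List[str]]:
    """Compare split topic names with per-topic schema keys (exact string match)."""
    topics = {topic["name"] for topic in split_config["split_input"]}
    schema_keys = set(split_schema_config)
    return {
        # Topics the split can produce but that have no schema attached.
        "topics_without_schema": sorted(topics - schema_keys),
        # Schema keys that match no split topic (often a typo).
        "schemas_without_topic": sorted(schema_keys - topics),
    }


split_config = {
    "split_input": [
        {"name": "Signatures", "description": "Signature blocks"},
        {"name": "Certifications", "description": "Officer certifications"},
    ]
}
print(check_schema_keys(split_config, {"Signature": {"schema": {}}, "Certifications": {"schema": {}}}))
# {'topics_without_schema': ['Signatures'], 'schemas_without_topic': ['Signature']}
```

The check only surfaces mismatches. Whether a topic without a schema is a mistake or intentional is up to you.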

## Checks

* Split topics should be mutually exclusive and easy to describe.
* Inspect page assignments before trusting schema output.
* Keep per-topic schemas smaller than a single all-purpose schema.
* If the same topic appears across many documents, save it as a split preset.
* For regulated review, store the split output and schema version IDs with the downstream record so reviewers can reproduce the exact extraction.
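To make the first two checks concrete, the sketch below flags pages assigned to more than one topic and pages assigned to none. It assumes you have already converted the split output into a plain `{topic_name: [page_numbers]}` dict and know the document's page count. The actual shape of `split_result` is not shown on this page, so that conversion is left to you. `audit_page_assignments` is hypothetical, and the page numbers in the usage example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def audit_page_assignments(
    assignments: Dict[str, Iterable[int]],  # assumed shape: topic name -> 1-based page numbers
    page_count: int,
) -> Dict[str, object]:
    """Report overlapping and unassigned pages in a topic -> pages mapping."""
    topics_by_page: Dict[int, List[str]] = defaultdict(list)
    for topic, pages in assignments.items():
        for page in pages:
            topics_by_page[page].append(topic)

    # Pages claimed by two or more topics suggest the topics are not mutually exclusive.
    overlaps = {
        page: sorted(topics)
        for page, topics in sorted(topics_by_page.items())
        if len(topics) > 1
    }
    # Pages no topic claimed; a membership test on a defaultdict does not add keys.
    unassigned = [p for p in range(1, page_count + 1) if p not in topics_by_page]
    return {"overlaps": overlaps, "unassigned": unassigned}


report = audit_page_assignments(
    {
        "Financial Statement Schedules": [1, 2, 3],
        "Signatures": [3, 4],
        "Certifications": [6],
    },
    page_count=6,
)
print(report)
# {'overlaps': {3: ['Financial Statement Schedules', 'Signatures']}, 'unassigned': [5]}
```

Overlaps usually mean two topic descriptions are too similar. Whether unassigned pages are a problem depends on whether your topics are meant to cover the whole report.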

## Related

<CardGroup cols={3}>
  <Card title="Extract -> Split -> Schema" icon="diagram-project" href="/platform-reference/extract-split-schema">
    Full Platform walkthrough.
  </Card>

  <Card title="Chaining Steps" icon="link" href="/concepts/chaining">
    Understand the `extraction_id` to `split_id` handoff.
  </Card>

  <Card title="Sample Documents" icon="file-pdf" href="/cookbooks/sample-documents">
    Use a long sample PDF to test topic splits.
  </Card>
</CardGroup>
