Split Document

Identify which pages of a document contain each topic or section. This endpoint takes an existing extraction and a list of topics, then uses AI to identify which PDF pages contain content related to each topic.

The result is persisted with a split_id that can be used with the /schema endpoint (split mode) for targeted schema extraction on specific page groups.

Set async: true to return immediately with a job_id for polling.

To split many extractions at once, see Batch Split or the Batch Processing guide.

POST /split

Split a document into topics

curl --request POST \
  --url https://api.runpulse.com/split \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "extraction_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "split_config_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "async": false
}
'

{
  "split_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "split_output": {
    "splits": {}
  },
  "credits_used": 123,
  "plan_info": {
    "tier": "<string>",
    "total_credits_used": 123,
    "pages_used": 1,
    "note": "<string>"
  }
}
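
For readers who are not using curl or the SDK, the same request can be sketched with the Python standard library. This is a minimal sketch, not an official client: the function names `build_split_request` and `send` are hypothetical, the 60-second timeout is an arbitrary choice, and the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.runpulse.com/split"  # taken from the curl example


def build_split_request(api_key, extraction_id, split_config_id, run_async=False):
    """Build (but do not send) the POST /split request from the curl example."""
    body = {
        "extraction_id": extraction_id,
        "split_config_id": split_config_id,
        "async": run_async,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body (timeout is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_split_request("<api-key>", extraction_id, split_config_id))` should return a dictionary shaped like the JSON response above.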

Overview

Pipeline Step 2 (optional) — Split requires a prior extraction. After splitting, use schema extraction with split_id to apply per-topic schemas.

Identify which pages of a document contain each topic or section. The /split endpoint analyzes a saved extraction and uses AI to map pages to your defined topics. This is useful for:

Processing multi-section documents (e.g., annual reports, contracts)
Applying different schemas to different parts of a document
Organizing large documents by content type

This endpoint operates on saved extractions (created via /extract with storage enabled, which is the default).

To split many extractions at once, use Batch Split.

Async Mode

Set async: true to return immediately with a job ID for polling. See Polling for Results for details.

{
  "extraction_id": "abc123-def456",
  "split_config": { "split_input": [...] },
  "async": true
}
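
A polling loop for an async split might look like the sketch below. It rests on several assumptions: `fetch_job` is a hypothetical caller-supplied function that performs the job lookup and returns the decoded JSON as a dict, and `"pending"` is the only in-progress status this page documents, so any other status is treated as final. If the Polling for Results guide lists further in-progress statuses, add them to `pending_statuses`.

```python
import time


def wait_for_job(fetch_job, job_id, interval=2.0, timeout=300.0,
                 pending_statuses=("pending",), sleep=time.sleep):
    """Poll until the job leaves a pending state or the timeout elapses.

    fetch_job(job_id) is hypothetical: it should look up the job and
    return the decoded JSON body as a dict with a "status" key.
    """
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        if job.get("status") not in pending_statuses:
            return job  # assumed to be a final state
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still pending after {timeout}s")
        sleep(interval)
        waited += interval
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.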

Request

Request Body

Field	Type	Required	Description
`extraction_id`	uuid	Yes	ID of the saved extraction to split
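`split_config`	object	One of	Inline split configuration with topics. Provide exactly one of `split_config` or `split_config_id`.
`split_config_id`	uuid	One of	Reference to a saved split configuration. Provide exactly one of `split_config` or `split_config_id`.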
`async`	boolean	No	If `true`, returns immediately with a `job_id` for polling. Default: `false`.
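
Because `split_config` and `split_config_id` are mutually exclusive, it can help to check the request body on the client before sending it. The helper below is a sketch; `make_split_body` is a hypothetical name, not part of the SDK.

```python
def make_split_body(extraction_id, split_config=None, split_config_id=None,
                    run_async=False):
    """Build a /split request body, enforcing the either/or rule for the config."""
    if (split_config is None) == (split_config_id is None):
        raise ValueError("provide exactly one of split_config or split_config_id")
    body = {"extraction_id": extraction_id, "async": run_async}
    if split_config is not None:
        body["split_config"] = split_config
    else:
        body["split_config_id"] = split_config_id
    return body
```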

Inline Config (`split_config`)

Field	Type	Required	Description
`split_config.split_input`	array	Yes	List of topics to identify. Also accepts legacy name `topics` for backward compatibility.

Each topic in the split_input array:

Field	Type	Required	Description
`name`	string	Yes	Unique identifier for the topic
`description`	string	No	Description of what content belongs to this topic
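
The topic rules above (a required, unique `name` and an optional `description`) can be checked locally while assembling an inline config. This is a sketch with a hypothetical helper name, `make_split_config`. Whether the server also rejects duplicate names is not stated here, so the check is purely client-side.

```python
def make_split_config(topics):
    """Build an inline split_config from (name, description) pairs.

    description may be None or empty, in which case it is omitted.
    """
    seen = set()
    split_input = []
    for name, description in topics:
        if not isinstance(name, str) or not name:
            raise ValueError("each topic needs a non-empty string name")
        if name in seen:
            raise ValueError(f"duplicate topic name: {name!r}")
        seen.add(name)
        entry = {"name": name}
        if description:
            entry["description"] = description
        split_input.append(entry)
    return {"split_input": split_input}
```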

Response

Synchronous Response (200)

Field	Type	Description
`split_id`	uuid	Unique identifier for this split result
`split_output`	object	Contains `splits` — a mapping of topic names to arrays of 1-indexed page numbers
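
Since `splits` maps topic names to 1-indexed page numbers, it is sometimes convenient to invert it into a page-to-topics lookup. The sketch below uses a hypothetical helper, `topics_by_page`. This page does not say whether one page can be assigned to several topics, so the helper keeps a list per page to handle either case.

```python
from collections import defaultdict


def topics_by_page(splits):
    """Invert {topic: [pages]} into {page: [topics]}, sorted by page number."""
    by_page = defaultdict(list)
    for topic, pages in splits.items():
        for page in pages:
            by_page[page].append(topic)
    return dict(sorted(by_page.items()))
```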

Async Response (202)

Field	Type	Description
`job_id`	string	Job ID for polling
`status`	string	`"pending"`
`message`	string	Human-readable description
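
A caller that toggles `async` needs to handle both response shapes. The following sketch branches on the status codes documented above; `interpret_split_response` is a hypothetical helper, and it assumes the body has already been decoded from JSON.

```python
def interpret_split_response(status_code, body):
    """Return ("job", job_id) for a 202 or ("result", splits) for a 200."""
    if status_code == 202:
        return ("job", body["job_id"])
    if status_code == 200:
        return ("result", body["split_output"]["splits"])
    raise RuntimeError(f"unexpected status {status_code}: {body}")
```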

Example Usage

Split with Inline Config

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

split_result = client.split(
    extraction_id="abc123-def456-ghi789",
    split_config={
        "split_input": [
            {
                "name": "financial_statements",
                "description": "Balance sheets, income statements, cash flow statements"
            },
            {
                "name": "executive_summary",
                "description": "Letter to shareholders, company overview, highlights"
            },
            {
                "name": "risk_factors",
                "description": "Risk disclosures, forward-looking statements"
            }
        ]
    }
)

print(f"Split ID: {split_result.split_id}")
for topic, pages in split_result.split_output.splits.items():
    print(f"  {topic}: pages {pages}")

Split with Saved Config Reference

split_result = client.split(
    extraction_id="abc123-def456-ghi789",
    split_config_id="config-uuid-456"
)

Example Response

{
  "split_id": "split-uuid-123",
  "split_output": {
    "splits": {
      "financial_statements": [15, 16, 17, 18, 19, 20],
      "executive_summary": [1, 2, 3, 4],
      "risk_factors": [25, 26, 27, 28, 29, 30]
    }
  }
}
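
For display or logging, the page lists in a response like the one above can be collapsed into contiguous ranges. This is an illustrative sketch; `page_ranges` is a hypothetical helper and not part of the SDK.

```python
def page_ranges(pages):
    """Collapse page numbers into sorted, inclusive (start, end) ranges."""
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Extend the current run by one page.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # Start a new run.
            ranges.append((page, page))
    return ranges
```

Applied to the example response, `financial_statements` collapses to the single range (15, 20).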

Using Split Results

After splitting, use the split_id with the /schema endpoint (split mode) to apply per-topic schemas:

split_id = split_result.split_id

schema_result = client.schema(
    split_id=split_id,
    split_schema_config={
        "financial_statements": {
            "schema": {"type": "object", "properties": {"revenue": {"type": "number"}}},
            "schema_prompt": "Extract financial data"
        },
        "risk_factors": {
            "schema": {"type": "object", "properties": {"risk_description": {"type": "string"}}}
        }
    }
)
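
Because the keys of `split_schema_config` must correspond to topic names from the split, a quick local comparison can catch typos before the /schema call. In the sketch below, `check_schema_topics` is a hypothetical helper. The example above gives no schema for `executive_summary`, which suggests (but does not confirm) that topics without a schema are allowed, so those are only reported, not rejected.

```python
def check_schema_topics(splits, split_schema_config):
    """Compare schema config keys against the topic names in a split result.

    Returns (unknown, unused): config keys with no matching topic, and
    topics that have no schema configured.
    """
    unknown = sorted(set(split_schema_config) - set(splits))
    unused = sorted(set(splits) - set(split_schema_config))
    return unknown, unused
```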

Error Responses

Status	Error	Description
400	Invalid request	Missing required fields or invalid topic format
401	Unauthorized	Invalid or missing API key
404	Extraction not found	The `extraction_id` doesn’t exist or you don’t have access
429	Rate limit exceeded	Too many requests
500	Processing error	Split processing failed
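
One way to handle the transient errors in this table is to retry with exponential backoff. The sketch below is an assumption-laden illustration, not documented behavior: it treats 429 and 500 as retryable, `send` is a hypothetical caller-supplied function returning `(status_code, body)`, and the attempt count and delays are arbitrary.

```python
import time

RETRYABLE = {429, 500}  # assumption: these may succeed on a later attempt


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Errors such as 400, 401, and 404 are returned immediately, since repeating the same request is unlikely to change the outcome.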

Best Practices

Use descriptive topic names

Topic names become keys in the response and are used with /schema (split mode). Use clear, descriptive names like financial_statements rather than section_1.

Provide detailed descriptions

The description field helps the AI accurately identify relevant pages. Be specific about what content belongs to each topic.

Use async for large documents

For documents with many pages, set async: true to avoid request timeouts. See Polling for Results.

Authorizations

Name	Type	In	Required
`x-api-key`	string	header	Yes

Body

Content type: application/json

Request body for splitting a document into topics. Provide either split_config (inline) or split_config_id (reference), not both.

Field	Type	Required	Description
`extraction_id`	string<uuid>	Yes	ID of the saved extraction to split.
`split_config`	object	Conditional	Inline split configuration with topics. Required if `split_config_id` is not provided.
`split_config_id`	string<uuid>	Conditional	Reference to a saved split configuration. Use this instead of providing `split_config` inline.
`async`	boolean	No	If `true`, returns immediately with a `job_id` for polling via GET /job/{jobId}. Otherwise processes synchronously. Default: `false`.

Response

Split result with page assignments (when async=false or omitted)

Result of document splitting with page assignments.

Field	Type	Required	Description
`split_id`	string<uuid>	Yes	Unique identifier for this split result. Use this ID with the /schema endpoint (split mode) to apply schemas to specific page groups.
`split_output`	object	Yes	Page assignments per topic.
`credits_used`	number<float> | null	No	Number of credits consumed by this request. Only present when the organization has the credit billing system enabled.
`plan_info`	object	No	Billing tier and cumulative usage information for the calling org, including this split run.
