POST /split
Python SDK
from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# Async split (recommended)
response = client.split(
    extraction_id="your-extraction-id",
    split_config={
        "split_input": [
            {"name": "Introduction", "description": "Overview section"},
            {"name": "Financials", "description": "Financial data"}
        ]
    },
    async_=True
)
print(response.job_id)  # poll via client.jobs.get_job(job_id=...)

# Sync split
response = client.split(
    extraction_id="your-extraction-id",
    split_config={
        "split_input": [
            {"name": "Introduction", "description": "Overview section"},
            {"name": "Financials", "description": "Financial data"}
        ]
    }
)
print(response.split_id)
{
  "split_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "split_output": {
    "splits": {}
  }
}

Overview

Pipeline Step 2 (optional): Split requires a prior extraction. After splitting, use schema extraction with split_id to apply per-topic schemas.
Identify which pages of a document contain each topic or section. The /split endpoint analyzes a saved extraction and uses AI to map pages to your defined topics. This is useful for:
  • Processing multi-section documents (e.g., annual reports, contracts)
  • Applying different schemas to different parts of a document
  • Organizing large documents by content type
This endpoint operates on saved extractions (created via /extract with storage enabled, which is the default).
To split many extractions at once, use Batch Split.

Async Mode

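Set async: true to return immediately with a job ID for polling. In the Python SDK the argument is spelled async_ (as in the example above), because async is a reserved word in Python. See Polling for Results for details.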
{
  "extraction_id": "abc123-def456",
  "split_config": { "split_input": [...] },
  "async": true
}
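The polling loop that async mode implies could look roughly like the sketch below. The helper name wait_for_job, the is_done predicate, and the interval and timeout defaults are illustrative assumptions, not part of the SDK; only client.jobs.get_job(job_id=...) and the "pending" status come from this page.

```python
import time
from typing import Any, Callable


def wait_for_job(
    get_job: Callable[[str], Any],
    job_id: str,
    is_done: Callable[[Any], bool],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_job(job_id) until is_done(job) is true or the timeout elapses."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        if is_done(job):
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Possible use with the SDK (not executed here). Treating every status other
# than "pending" as finished is an assumption; check Polling for Results for
# the actual status values, and note that job.status is an assumed attribute.
# job = wait_for_job(
#     lambda jid: client.jobs.get_job(job_id=jid),
#     response.job_id,
#     is_done=lambda j: j.status != "pending",
# )
```

Passing the fetch function and the completion check as arguments keeps the loop independent of the job object's exact shape, which this page does not specify.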

Request

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| extraction_id | uuid | Yes | ID of the saved extraction to split |
| split_config | object | XOR | Inline split configuration with topics |
| split_config_id | uuid | XOR | Reference to a saved split configuration |
| async | boolean | No | If true, returns immediately with a job_id for polling. Default: false. |

Inline Config (split_config)

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| split_config.split_input | array | Yes | List of topics to identify. Also accepts legacy name topics for backward compatibility. |

Each topic in the split_input array:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Unique identifier for the topic |
| description | string | No | Description of what content belongs to this topic |
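The field rules above can be checked on the client before sending a request. The sketch below is one way to do that; validate_split_request is a hypothetical helper, not an SDK function, and rejecting an empty split_input array is an assumption, since this page only says the array is required.

```python
from typing import Any, Dict, List


def validate_split_request(body: Dict[str, Any]) -> List[str]:
    """Return the problems found in a /split request body (empty if none)."""
    problems: List[str] = []
    if not body.get("extraction_id"):
        problems.append("extraction_id is required")

    has_inline = body.get("split_config") is not None
    has_ref = body.get("split_config_id") is not None
    if has_inline == has_ref:  # both given, or neither
        problems.append("provide exactly one of split_config or split_config_id")

    if has_inline:
        config = body["split_config"]
        topics = None
        if isinstance(config, dict):
            # split_input is the current name; topics is the legacy alias.
            topics = config.get("split_input", config.get("topics"))
        if not isinstance(topics, list) or not topics:
            # Assumption: an empty topic list is treated as invalid.
            problems.append("split_config.split_input must be a non-empty array")
            return problems
        names = [t.get("name") if isinstance(t, dict) else None for t in topics]
        if not all(isinstance(n, str) and n for n in names):
            problems.append("every topic needs a non-empty string name")
        elif len(set(names)) != len(names):
            problems.append("topic names must be unique")
    return problems
```

An empty list means the body satisfies the documented constraints; the server may still reject it for reasons not covered here.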

Response

Synchronous Response (200)

| Field | Type | Description |
| --- | --- | --- |
| split_id | uuid | Unique identifier for this split result |
| split_output | object | Contains splits: a mapping of topic names to arrays of 1-indexed page numbers |

Async Response (202)

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Job ID for polling |
| status | string | "pending" |
| message | string | Human-readable description |

Example Usage

Split with Inline Config

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

split_result = client.split(
    extraction_id="abc123-def456-ghi789",
    split_config={
        "split_input": [
            {
                "name": "financial_statements",
                "description": "Balance sheets, income statements, cash flow statements"
            },
            {
                "name": "executive_summary",
                "description": "Letter to shareholders, company overview, highlights"
            },
            {
                "name": "risk_factors",
                "description": "Risk disclosures, forward-looking statements"
            }
        ]
    }
)

print(f"Split ID: {split_result.split_id}")
for topic, pages in split_result.split_output.splits.items():
    print(f"  {topic}: pages {pages}")

Split with Saved Config Reference

split_result = client.split(
    extraction_id="abc123-def456-ghi789",
    split_config_id="config-uuid-456"
)

Example Response

{
  "split_id": "split-uuid-123",
  "split_output": {
    "splits": {
      "financial_statements": [15, 16, 17, 18, 19, 20],
      "executive_summary": [1, 2, 3, 4],
      "risk_factors": [25, 26, 27, 28, 29, 30]
    }
  }
}
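Because splits maps each topic to a list of 1-indexed page numbers, it is often convenient to collapse those lists into ranges and to see which pages no topic claimed. The helpers below are a minimal sketch; to_ranges and unassigned_pages are illustrative names, and page_count is a value the caller must supply, since the response shown here does not include it.

```python
from typing import Dict, List, Tuple


def to_ranges(pages: List[int]) -> List[Tuple[int, int]]:
    """Collapse 1-indexed page numbers into inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page continues the current run, so extend its end.
            ranges[-1] = (ranges[-1][0], page)
        else:
            ranges.append((page, page))
    return ranges


def unassigned_pages(splits: Dict[str, List[int]], page_count: int) -> List[int]:
    """List pages in 1..page_count that no topic was mapped to."""
    assigned = {p for pages in splits.values() for p in pages}
    return [p for p in range(1, page_count + 1) if p not in assigned]


splits = {
    "financial_statements": [15, 16, 17, 18, 19, 20],
    "executive_summary": [1, 2, 3, 4],
    "risk_factors": [25, 26, 27, 28, 29, 30],
}
for topic, pages in splits.items():
    print(topic, to_ranges(pages))
```

With the SDK, the same dictionary would come from split_result.split_output.splits, as in the inline-config example.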

Using Split Results

After splitting, use the split_id with the /schema endpoint (split mode) to apply per-topic schemas:
split_id = split_result.split_id

schema_result = client.schema(
    split_id=split_id,
    split_schema_config={
        "financial_statements": {
            "schema": {"type": "object", "properties": {"revenue": {"type": "number"}}},
            "schema_prompt": "Extract financial data"
        },
        "risk_factors": {
            "schema": {"type": "object", "properties": {"risk_description": {"type": "string"}}}
        }
    }
)
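When schemas are kept separately from the split call, the split_schema_config argument can be assembled from the split result. The sketch below assumes you only want entries for topics that actually received pages; whether the API accepts entries for topics with no pages is not stated on this page, and build_split_schema_config is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def build_split_schema_config(
    splits: Dict[str, List[int]],
    schemas_by_topic: Dict[str, Dict[str, Any]],
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Return (config, skipped): config holds topics that have both pages and
    a schema entry; skipped lists the remaining topics from the split."""
    config: Dict[str, Dict[str, Any]] = {}
    skipped: List[str] = []
    for topic, pages in splits.items():
        if pages and topic in schemas_by_topic:
            config[topic] = schemas_by_topic[topic]
        else:
            skipped.append(topic)
    return config, skipped
```

The returned config can then be passed as split_schema_config to client.schema, and skipped shows which topics were left out and may need a schema or a better description.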

Error Responses

| Status | Error | Description |
| --- | --- | --- |
| 400 | Invalid request | Missing required fields or invalid topic format |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Extraction not found | The extraction_id doesn’t exist or you don’t have access |
| 429 | Rate limit exceeded | Too many requests |
| 500 | Processing error | Split processing failed |
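One common way to act on these status codes is to retry only the ones that may succeed on a later attempt. The sketch below treats 429 and 500 as retryable with exponential backoff; that choice, the attempt limit, and the delay values are assumptions rather than documented behavior, and the function names are illustrative.

```python
# Assumption: rate limits and processing failures may be transient,
# while 400, 401, and 404 need a corrected request instead of a retry.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """attempt is the number of attempts already made (0 before the first retry)."""
    return status in RETRYABLE_STATUSES and attempt < max_attempts


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Exponential backoff: base_s * 2**attempt, capped at cap_s seconds."""
    return min(cap_s, base_s * 2 ** attempt)
```

A caller would sleep for backoff_delay(attempt) between attempts and surface the error once should_retry returns False.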

Best Practices

Topic names become keys in the response and are used with /schema (split mode). Use clear, descriptive names like financial_statements rather than section_1.
The description field helps the AI accurately identify relevant pages. Be specific about what content belongs to each topic.
For documents with many pages, set async: true to avoid request timeouts. See Polling for Results.

Authorizations

x-api-key (string, header, required)

API key for authentication

Body

application/json

Request body for splitting a document into topics.

extraction_id (string<uuid>, required)

ID of the saved extraction to split.

split_config (object)

Inline split configuration with topics. Required if split_config_id is not provided.

split_config_id (string<uuid>)

Reference to a saved split configuration.

async (boolean, default: false)

If true, returns 202 with a job_id for polling via GET /job/{jobId}. Otherwise processes synchronously.
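For callers not using the SDK, the header and body fields above map onto a plain HTTP request. The sketch below only builds the request object with the standard library and does not send it. BASE_URL is a hypothetical placeholder, since this page does not state the API host, and build_split_request is an illustrative name.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.example.com"  # hypothetical placeholder, not from this page


def build_split_request(
    api_key: str,
    extraction_id: str,
    split_config: Optional[Dict[str, Any]] = None,
    split_config_id: Optional[str] = None,
    run_async: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST /split request."""
    if (split_config is None) == (split_config_id is None):
        raise ValueError("pass exactly one of split_config or split_config_id")
    body: Dict[str, Any] = {"extraction_id": extraction_id, "async": run_async}
    if split_config is not None:
        body["split_config"] = split_config
    else:
        body["split_config_id"] = split_config_id
    return urllib.request.Request(
        f"{BASE_URL}/split",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(request) would return 200 with the split result, or 202 with a job_id when async is true, per the response descriptions on this page.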

Response

Split result with page assignments (when async=false or omitted)

Result of document splitting with page assignments.

split_id (string<uuid>, required)

Unique identifier for this split result. Use with /schema endpoint (split mode) to apply per-topic schemas.

split_output (object, required)

Page assignments per topic.