Identify which pages of a document contain each topic/section. Takes an existing extraction and a list of topics, then uses AI to identify which PDF pages contain content related to each topic.
The result is persisted with a split_id that can be used with the /schema endpoint (split mode) for targeted schema extraction on specific page groups.
Set async: true to return 202 with a job_id for polling.
The /split endpoint analyzes a saved extraction and uses AI to map pages to your defined topics. Use the resulting split_id to apply per-topic schemas.
The extraction must be a saved one (created via /extract with storage enabled, which is the default).
Set async: true to return immediately with a job ID for polling. See Polling for Results for details.
| Field | Type | Required | Description |
|---|---|---|---|
| extraction_id | uuid | Yes | ID of the saved extraction to split |
| split_config | object | XOR | Inline split configuration with topics |
| split_config_id | uuid | XOR | Reference to a saved split configuration |
| async | boolean | No | If true, returns immediately with a job_id for polling. Default: false. |
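The XOR rule in the table above (exactly one of split_config or split_config_id) is easy to get wrong, so here is a minimal sketch of a request-body builder that enforces it. The helper name build_split_request, the run_async parameter, and the placeholder UUID are illustrative assumptions, not part of the API.

```python
import json
import uuid


def build_split_request(extraction_id, split_config=None, split_config_id=None, run_async=False):
    """Build the JSON body for a /split request (hypothetical helper).

    Exactly one of split_config / split_config_id must be given (the XOR rule).
    """
    if (split_config is None) == (split_config_id is None):
        raise ValueError("provide exactly one of split_config or split_config_id")

    body = {"extraction_id": str(extraction_id)}
    if split_config is not None:
        body["split_config"] = split_config
    else:
        body["split_config_id"] = str(split_config_id)
    if run_async:
        body["async"] = True  # omitted otherwise, so the documented default (false) applies
    return body


# Placeholder ID, for illustration only.
example = build_split_request(
    extraction_id=uuid.UUID("00000000-0000-0000-0000-000000000001"),
    split_config={"split_input": [{"name": "financial_statements"}]},
    run_async=True,
)
print(json.dumps(example, indent=2))
```

Validating on the client side like this avoids a round trip that would presumably end in a 400 response.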
Split configuration (split_config):
| Field | Type | Required | Description |
|---|---|---|---|
| split_config.split_input | array | Yes | List of topics to identify. Also accepts the legacy name topics for backward compatibility. |
Each item in the split_input array:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for the topic |
| description | string | No | Description of what content belongs to this topic |
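As a rough illustration of the topic fields above, the following sketch assembles an inline split_config from (name, description) pairs and checks that names are present and unique. The function make_split_config, the appendix topic, and the description text are hypothetical examples.

```python
def make_split_config(topics):
    """Turn (name, description) pairs into a split_config dict (hypothetical helper).

    description may be None or empty, since the field is optional.
    """
    seen = set()
    split_input = []
    for name, description in topics:
        if not name:
            raise ValueError("every topic needs a name")
        if name in seen:
            raise ValueError(f"duplicate topic name: {name}")
        seen.add(name)
        item = {"name": name}
        if description:
            item["description"] = description
        split_input.append(item)
    return {"split_input": split_input}


config = make_split_config([
    ("financial_statements", "Balance sheet, income statement, and cash flow pages"),
    ("appendix", None),  # no description: the key is simply left out
])
```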
Synchronous response (async false or omitted):
| Field | Type | Description |
|---|---|---|
| split_id | uuid | Unique identifier for this split result |
| split_output | object | Contains splits, a mapping of topic names to arrays of 1-indexed page numbers |
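Because the page numbers in split_output are 1-indexed, client code that indexes into a page list usually needs a conversion. The sketch below shows one way to read the response, assuming it has already been decoded into a dict with the shape described above; the sample payload and both helper names are made up for illustration.

```python
def pages_by_topic(response):
    """Return {topic: [zero-based page indices]} from a /split response dict."""
    splits = response["split_output"]["splits"]
    return {topic: [page - 1 for page in pages] for topic, pages in splits.items()}


def topics_by_page(response):
    """Invert the mapping to {1-indexed page: [topics]}, sorted by page number.

    The code does not assume topics are disjoint, so a page may list several.
    """
    result = {}
    for topic, pages in response["split_output"]["splits"].items():
        for page in pages:
            result.setdefault(page, []).append(topic)
    return dict(sorted(result.items()))


# Hypothetical sample response, not real API output.
sample = {
    "split_id": "00000000-0000-0000-0000-000000000002",
    "split_output": {"splits": {"financial_statements": [3, 4], "appendix": [4, 9]}},
}
zero_based = pages_by_topic(sample)
inverted = topics_by_page(sample)
```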
Asynchronous response (202, when async is true):
| Field | Type | Description |
|---|---|---|
| job_id | string | Job ID for polling |
| status | string | "pending" |
| message | string | Human-readable description |
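A polling loop for the job_id returned above might look like the following sketch. It takes the HTTP call as a parameter (fetch_job, a hypothetical callable that performs GET /job/{jobId} and returns the decoded JSON), so nothing here depends on a particular client library. Only the "pending" status is documented on this page; the "done" value in the usage example is a made-up terminal status.

```python
import time


def poll_job(job_id, fetch_job, interval=2.0, timeout=300.0,
             pending_statuses=("pending",), sleep=time.sleep):
    """Call fetch_job(job_id) until its status leaves pending_statuses.

    Assumption: any status outside pending_statuses is terminal. If the API
    reports other in-progress states, add them to pending_statuses.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") not in pending_statuses:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still pending after {timeout}s")
        sleep(interval)


# Stand-in for a real HTTP call: pending twice, then a hypothetical final state.
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "done"},  # hypothetical terminal status name
])
final = poll_job("job-123", lambda _job_id: next(_responses), sleep=lambda _seconds: None)
```

Injecting fetch_job and sleep keeps the loop testable without network access or real delays.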
Use the split_id with the /schema endpoint (split mode) to apply per-topic schemas.
| Status | Error | Description |
|---|---|---|
| 400 | Invalid request | Missing required fields or invalid topic format |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Extraction not found | The extraction_id doesn’t exist or you don’t have access |
| 429 | Rate limit exceeded | Too many requests |
| 500 | Processing error | Split processing failed |
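One way a client could act on the error table above is sketched below. The labels come from the table; treating 429 and 500 as retryable, and the exponential backoff schedule, are assumptions about sensible client behavior rather than documented guidance.

```python
ERRORS = {
    400: "Invalid request",
    401: "Unauthorized",
    404: "Extraction not found",
    429: "Rate limit exceeded",
    500: "Processing error",
}
RETRYABLE = {429, 500}  # assumption: these may succeed on a later attempt


def classify_error(status):
    """Return (label, retryable) for a /split error status code."""
    label = ERRORS.get(status, "Unexpected status")
    return label, status in RETRYABLE


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Errors such as 400, 401, and 404 indicate a problem with the request itself, so the sketch marks them as not worth retrying unchanged.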
Use descriptive topic names
Topic names carry over to /schema (split mode). Use clear, descriptive names like financial_statements rather than section_1.
Provide detailed descriptions
Each topic's description states what content belongs to that topic.
Use async for large documents
Set async: true to avoid request timeouts. See Polling for Results.
API key for authentication
Request body for splitting a document into topics.
extraction_id: ID of the saved extraction to split.
split_config: Inline split configuration with topics. Required if split_config_id is not provided.
split_config_id: Reference to a saved split configuration.
async: If true, returns 202 with a job_id for polling via GET /job/{jobId}. Otherwise processes synchronously.
Split result with page assignments (when async=false or omitted)