Apply schema extraction to a previously saved extraction. The mode is inferred from the input:
- **Single mode**: Provide `extraction_id` + `schema_config` (or `schema_config_id`) to apply one schema to the entire document.
- **Multi-extraction mode**: Provide a batch extract ID as `extraction_id` (auto-detected) or an explicit `extraction_ids` list. The content from all extractions is combined and the schema is applied to the composite. Citations use `extraction_id-bb_id` format to disambiguate across source documents.
- **Split mode**: Provide `split_id` + `split_schema_config` to apply different schemas to different page groups from a prior `/split` call.
- **Excel template mode**: Provide `excel_template` (base64 `.xlsx`) in `schema_config` instead of `input_schema`. The schema is auto-generated from the template’s column headers, and a filled copy is returned as `excel_output_url`.
Set `async: true` to return `202` with a `job_id` for polling. See Polling for Results for details.

The source extraction must already be saved. Extractions are saved when you call `/extract` with storage enabled, which is the default.
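The mode rules above can be expressed as a small request-body builder. This is a minimal sketch, not an official client: the function name `build_schema_request` is hypothetical, `run_async` stands in for the `async` field (a reserved word in Python), and the client-side checks only mirror the documented 400 errors. The shape of each `split_schema_config` value is assumed to be a per-topic schema config keyed by topic name.

```python
from typing import Any, Optional


def build_schema_request(
    *,
    extraction_id: Optional[str] = None,
    extraction_ids: Optional[list[str]] = None,
    split_id: Optional[str] = None,
    schema_config: Optional[dict[str, Any]] = None,
    schema_config_id: Optional[str] = None,
    split_schema_config: Optional[dict[str, Any]] = None,
    run_async: bool = False,  # serialized as the "async" field
) -> dict[str, Any]:
    """Hypothetical helper: assemble a body for one of the documented modes.

    The server remains the source of truth for validation; these checks
    only catch the documented 400 cases early.
    """
    if not (extraction_id or extraction_ids or split_id):
        raise ValueError("Must provide extraction_id, extraction_ids, or split_id")

    body: dict[str, Any] = {}
    if split_id:
        # Split mode: per-topic schemas keyed by topic names from /split.
        if not split_schema_config:
            raise ValueError("split_id requires split_schema_config")
        body["split_id"] = split_id
        body["split_schema_config"] = split_schema_config
    else:
        # Single or multi-extraction mode: exactly one of schema_config / schema_config_id.
        if (schema_config is None) == (schema_config_id is None):
            raise ValueError("Provide exactly one of schema_config or schema_config_id")
        if extraction_id:
            # May be a plain extraction ID or a batch extract ID (auto-detected server-side).
            body["extraction_id"] = extraction_id
        if extraction_ids:
            body["extraction_ids"] = list(extraction_ids)
        if schema_config is not None:
            body["schema_config"] = schema_config
        else:
            body["schema_config_id"] = schema_config_id
    if run_async:
        body["async"] = True
    return body
```

For example, `build_schema_request(extraction_ids=["e1", "e2"], schema_config={"input_schema": {...}})` produces a multi-extraction body; the IDs here are placeholders.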
Async option:

| Field | Type | Required | Description |
|---|---|---|---|
| `async` | boolean | No | If `true`, returns immediately with a `job_id` for polling. Default: `false`. |
Async response (`202 Accepted`):

| Field | Type | Description |
|---|---|---|
| `job_id` | string | Job ID for polling |
| `status` | string | `"pending"` |
| `message` | string | Human-readable description |
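A minimal polling loop for the `202` flow, assuming a caller-supplied `fetch_job` function that performs `GET /job/{jobId}` and returns the decoded JSON body. Only the `"pending"` status is documented on this page, so the sketch treats any status outside `in_progress` as final; widen that set if Polling for Results documents other non-terminal statuses.

```python
import time
from typing import Any, Callable, Iterable


def poll_job(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    *,
    in_progress: Iterable[str] = ("pending",),
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job's status leaves `in_progress` or the timeout elapses."""
    waiting = set(in_progress)
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") not in waiting:
            return job  # finished (successfully or not); caller inspects the body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to test.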
Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| `extraction_id` | uuid | Conditional | ID of a saved extraction, or a batch extract job ID (auto-detected; see Multi-Extraction Mode). One of `extraction_id`, `extraction_ids`, or `split_id` must be provided. |
| `extraction_ids` | uuid[] | No | Explicit list of extraction IDs to combine (see Multi-Extraction Mode) |
| `schema_config` | object | XOR | Inline schema (see Schema Config) |
| `schema_config_id` | uuid | XOR | Reference to a saved schema configuration |
| `async` | boolean | No | Default: `false` |
Schema Config (`schema_config`): provide `input_schema` or `excel_template`, not both.

| Field | Type | Required | Description |
|---|---|---|---|
| `input_schema` | object | XOR | JSON Schema defining the structured data to extract |
| `excel_template` | string (base64) | XOR | Base64-encoded `.xlsx` template. Column headers are used to auto-generate the JSON Schema, and a filled copy is returned (see Excel Template Mode). |
| `schema_prompt` | string | No | Natural language instructions to guide extraction |
| `effort` | boolean | No | Enable extended reasoning for complex documents |
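A sketch of building `schema_config` that enforces the `input_schema`/`excel_template` exclusivity and base64-encodes a template. The helper name is hypothetical, and it assumes the server expects standard (not URL-safe) base64 of the raw `.xlsx` bytes.

```python
import base64
from typing import Any, Optional


def build_schema_config(
    *,
    input_schema: Optional[dict[str, Any]] = None,
    excel_template_bytes: Optional[bytes] = None,
    schema_prompt: Optional[str] = None,
    effort: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: exactly one of input_schema / excel_template_bytes."""
    if (input_schema is None) == (excel_template_bytes is None):
        raise ValueError("Provide exactly one of input_schema or excel_template")
    config: dict[str, Any] = {}
    if input_schema is not None:
        config["input_schema"] = input_schema
    else:
        # Assumed encoding: standard base64 of the file bytes, sent as an ASCII string.
        config["excel_template"] = base64.b64encode(excel_template_bytes).decode("ascii")
    if schema_prompt:
        config["schema_prompt"] = schema_prompt
    if effort:
        config["effort"] = True
    return config
```

In practice the template bytes would come from something like `pathlib.Path("template.xlsx").read_bytes()` (file name illustrative).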
Response (`200`, single mode):

| Field | Type | Description |
|---|---|---|
| `schema_id` | uuid | Unique identifier for this schema version |
| `version` | integer | Schema version number |
| `schema_output` | object | `{ values: {...}, citations: {...} }` |
| `extraction_ids` | uuid[] | Present when multiple extractions were combined; lists all source extraction IDs |
| `excel_output_url` | string | API path to download the filled Excel template (only present when `excel_template` was provided) |
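A sketch of reading the single-mode response into a typed object, treating `extraction_ids` and `excel_output_url` as optional as the table describes. The `SchemaResult` class is hypothetical, and the inner shape of `values` and `citations` is left untyped because it depends on your schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SchemaResult:
    schema_id: str
    version: int
    values: dict[str, Any]
    citations: dict[str, Any]
    # Absent unless multiple extractions were combined.
    extraction_ids: list[str] = field(default_factory=list)
    # Absent unless an excel_template was supplied.
    excel_output_url: Optional[str] = None

    @classmethod
    def from_response(cls, body: dict[str, Any]) -> "SchemaResult":
        if body["version"] < 1:
            raise ValueError("version is documented as >= 1")
        output = body["schema_output"]
        return cls(
            schema_id=body["schema_id"],
            version=body["version"],
            values=output.get("values", {}),
            citations=output.get("citations", {}),
            extraction_ids=list(body.get("extraction_ids", [])),
            excel_output_url=body.get("excel_output_url"),
        )
```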
**Multi-Extraction Mode**

Combine content from several extractions in one of two ways:

1. **Batch auto-detection**: Pass a `batch_job_id` as `extraction_id`. The system detects it as a batch parent and automatically combines all completed child extractions.
2. **Explicit list**: Pass an `extraction_ids` array with the specific extraction IDs to combine.

In the combined output, citations use `extraction_id-bb_id` format (e.g., `abc123-txt-1`) to disambiguate bounding boxes across source documents.

**Excel Template Mode**

Provide a base64-encoded template (`.xlsx`) as `excel_template` in `schema_config`, with the column headers you want filled. The system auto-generates the JSON Schema from the template’s structure, applies it to the extraction, and returns a filled copy of the original template.

The response includes `excel_output_url` (e.g., `/schema/{schema_id}/excel`), an authenticated API path for downloading the filled template. Use `client.download_schema_excel(schema_id)` in the SDK or make an authenticated GET request.

You can pass `extraction_ids` along with `excel_template` to fill a template from multiple source documents.

**`GET /schema/{schemaId}/excel`**

After a schema extraction with `excel_template`, the filled output can be downloaded from this authenticated endpoint. It requires the same API key used for other endpoints, and the caller must belong to the org that owns the underlying extraction.
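Because both halves of an `extraction_id-bb_id` citation can contain hyphens (UUIDs do, and bounding-box IDs like `txt-1` do), a plain `split("-")` is ambiguous. A safer approach, sketched below, matches the citation against the known source IDs from the response's `extraction_ids` field. The helper name is hypothetical, and where exactly citation IDs appear inside `citations` is not specified here.

```python
from typing import Iterable, Tuple


def split_citation_id(citation_id: str, extraction_ids: Iterable[str]) -> Tuple[str, str]:
    """Split 'extraction_id-bb_id' into (extraction_id, bb_id) using known IDs."""
    # Try longer IDs first in case one ID happens to be a prefix of another.
    for eid in sorted(extraction_ids, key=len, reverse=True):
        prefix = eid + "-"
        if citation_id.startswith(prefix):
            return eid, citation_id[len(prefix):]
    raise ValueError(f"citation {citation_id!r} does not match any known extraction_id")
```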
| Status | Description |
|---|---|
| 200 | Returns the filled .xlsx file as a binary download |
| 401 | Authentication failed or missing API key |
| 404 | Schema not found, or no Excel output (was excel_template provided in the original request?) |
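A standard-library sketch of the authenticated download. The base URL and the `Authorization: Bearer` header are hypothetical placeholders; use the host and API-key header from the Authentication documentation (or the SDK's `client.download_schema_excel(schema_id)`).

```python
import urllib.parse
import urllib.request


def excel_download_request(base_url: str, schema_id: str, api_key: str) -> urllib.request.Request:
    # Hypothetical: base_url and the header scheme are assumptions, not documented values.
    url = urllib.parse.urljoin(base_url.rstrip("/") + "/", f"schema/{urllib.parse.quote(schema_id)}/excel")
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"Bearer {api_key}"})


def download_filled_excel(req: urllib.request.Request, dest_path: str) -> None:
    # urlopen raises urllib.error.HTTPError for 401/404 (e.g. no Excel output exists).
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as fh:
        fh.write(resp.read())
```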
Errors for schema extraction requests:

| Status | Error | Description |
|---|---|---|
| 400 | Invalid request | Must provide extraction_id, extraction_ids, or split_id |
| 400 | Invalid schema | Schema must follow JSON Schema / OpenAPI 3.0 format |
| 400 | Mutually exclusive | Cannot provide both input_schema and excel_template |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Not found | Extraction, batch job, or split not found |
| 500 | Processing error | Schema extraction failed |
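A sketch of wrapping the request so that the statuses in the error table surface as a single exception type. The endpoint URL and auth headers are caller-supplied because this page does not name them; the error body format is also undocumented, so it is passed through as text.

```python
import json
import urllib.error
import urllib.request
from typing import Any

# Hints paraphrase the error table; the mapping itself is an illustrative choice.
ERROR_HINTS = {
    400: "invalid request: check mode inputs, schema format, and input_schema/excel_template exclusivity",
    401: "unauthorized: invalid or missing API key",
    404: "not found: extraction, batch job, or split does not exist",
    500: "processing error: schema extraction failed",
}


class SchemaApiError(Exception):
    def __init__(self, status: int, body: str):
        hint = ERROR_HINTS.get(status, "unexpected status")
        super().__init__(f"HTTP {status} ({hint}): {body}")
        self.status = status
        self.body = body


def post_schema_request(url: str, payload: dict[str, Any], headers: dict[str, str]) -> tuple[int, dict[str, Any]]:
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json", **headers},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            # 200 carries the result; 202 carries job_id/status/message for polling.
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        raise SchemaApiError(err.code, err.read().decode("utf-8", "replace")) from err
```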
Best practices:

- **Use effort mode for complex documents.** Set `effort: true` for documents with complex layouts, tables, or when initial extraction quality is low.
- **Provide `schema_prompt` for context.** Natural language instructions help guide extraction.
- **Use async for large schemas.** Set `async: true` to avoid timeouts. See Polling for Results.
- **For multi-section documents, use split mode.** Call `/split` to get page groups, then use this endpoint with `split_id` + `split_schema_config`.
- **Combine data from multiple files with multi-extraction.** Pass the `batch_job_id` from a batch extract as `extraction_id` to this endpoint. The system auto-detects the batch parent and combines content from all child extractions.
- **Use Excel templates for spreadsheet-native workflows.** Pass a base64-encoded `.xlsx` template directly via `excel_template`. The column headers define the schema, and you get a filled copy back via `excel_output_url`.

**Authorization**: API key for authentication.
**Request body** for schema extraction. Mode is inferred from the input:

- `extraction_id` for single mode or multi-extraction (auto-detected). If the ID belongs to a batch extract, its child extractions are combined automatically.
- `extraction_ids` for an explicit list of extractions to combine.
- `split_id` + `split_schema_config` for split-mode extraction.

Fields:

- `extraction_id`: ID of a saved extraction OR a batch extract job. When a batch extract ID is provided, the system auto-detects it and combines all completed child extractions into a single schema application.
- `extraction_ids`: Explicit list of extraction IDs to combine. The markdown and bounding boxes from all extractions are merged and the schema is applied to the composite content. Citations use `extraction_id-bb_id` format to disambiguate across source documents.
- `split_id`: ID of a saved split (for split mode).
- `schema_config`: Inline schema configuration for single mode. Provide `input_schema` (JSON Schema) OR `excel_template` (base64 `.xlsx`), not both.
- `schema_config_id`: Reference to a saved schema configuration (for single mode).
- `split_schema_config`: Per-topic schema configurations for split mode. Keys must match topic names from the split.
- `async`: If `true`, returns `202` with a `job_id` for polling via `GET /job/{jobId}`.
**Response**: schema extraction result (when `async` is `false` or omitted). Shape depends on the mode used (single vs split).

Response for single schema extraction mode:

- `schema_id`: Unique identifier for this schema version.
- `version`: Version number of this schema for the extraction. Minimum: `1`.
- `schema_output`: Extracted values and citations.
- `extraction_ids`: Present when multiple extractions were combined (via batch extract auto-detection or explicit `extraction_ids` input). Lists all source extraction IDs that contributed to the result.
- `excel_output_url`: API path to download the filled Excel template (e.g. `/schema/{schema_id}/excel`). Requires the same API key authentication. Only present when `excel_template` was provided in the request.