Extract structured data from a saved extraction or split

curl --request POST \
  --url https://api.runpulse.com/schema \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "extraction_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "schema_config": {
    "input_schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"}
      }
    },
    "schema_prompt": "<string>",
    "effort": false
  },
  "async": false
}
'

Example Response

{
  "schema_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "version": 2,
  "schema_output": {
    "values": {},
    "citations": {}
  },
  "extraction_ids": [
    "3c90c3cc-0d44-4b50-8888-8dd25736052a"
  ],
  "excel_output_url": "<string>",
  "credits_used": 123,
  "plan_info": {
    "tier": "<string>",
    "total_credits_used": 123,
    "pages_used": 1,
    "note": "<string>"
  }
}
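The same request can be sent from Python without the SDK. The sketch below uses only the standard library and mirrors the curl example: the same URL, the `Content-Type` and `x-api-key` headers, and a JSON body. `build_schema_post` and `post_schema` are hypothetical helper names, and the 300-second timeout is an arbitrary choice, not a documented limit.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

API_BASE = "https://api.runpulse.com"


def build_schema_post(body: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build POST /schema with the JSON body and headers used in the curl example."""
    return urllib.request.Request(
        f"{API_BASE}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def post_schema(body: dict[str, Any], api_key: str, timeout: float = 300.0) -> dict[str, Any]:
    """Send the request and decode the JSON response (HTTPError is raised on 4xx/5xx)."""
    # The timeout default is an arbitrary assumption, not an API limit.
    with urllib.request.urlopen(build_schema_post(body, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

For large documents or schemas, consider setting `"async": true` in the body rather than raising the timeout.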

Schema Extraction

Apply schema extraction to a previously saved extraction. The mode is inferred from the input:

Single mode — Provide extraction_id + schema_config (or schema_config_id) to apply one schema to the entire document.

Multi-extraction mode — Provide a batch extract ID as extraction_id (auto-detected) or an explicit extraction_ids list. The content from all extractions is combined and the schema is applied to the composite. Citations use extraction_id-bb_id format to disambiguate across source documents.

Split mode — Provide split_id + split_schema_config to apply different schemas to different page groups from a prior /split call. Each topic can have its own schema, prompt, and effort setting.

Excel template mode — Provide excel_template (base64 .xlsx) in schema_config instead of input_schema. The schema is auto-generated from the template’s column headers, and a filled copy is returned as excel_output_url.

Creates a versioned schema record that can be retrieved later. Set async: true to return immediately with a job_id for polling.

To apply schemas across many extractions or splits at once, see Batch Schema or the Batch Processing guide.

POST /schema

Overview

Pipeline Step 2 or 3 — Schema requires a prior extraction. For split mode, it also requires a prior split. The mode is inferred from the input fields you provide.

Apply a schema to previously extracted documents to get structured data output. This endpoint supports multiple modes, inferred from the input:

Single mode — provide extraction_id to apply one schema to a single document
Multi-extraction mode — provide a batch extract ID as extraction_id (auto-detected) or an explicit extraction_ids list to combine content from multiple documents and apply the schema to the composite
Split mode — provide split_id to apply per-topic schemas to page groups from a prior /split
Excel template mode — provide excel_template (base64 .xlsx) in schema_config instead of input_schema to auto-generate the schema from column headers and receive a filled Excel file

This endpoint operates on saved extractions (created via /extract with storage enabled, which is the default).

To apply schemas across many extractions or splits at once, use Batch Schema. It supports both single and split modes.
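The mode rules above can be checked on the client before a request is sent. This is a minimal sketch, not part of the Pulse SDK: `build_schema_body` is a hypothetical helper, and it assumes that exactly one of `extraction_id`, `extraction_ids`, or `split_id` is present per request, that `schema_config` and `schema_config_id` are mutually exclusive, that `input_schema` and `excel_template` are mutually exclusive, and that each split topic uses either `schema` or `schema_config_id`. The server remains the authority on what it accepts.

```python
from __future__ import annotations

from typing import Any, Optional


def build_schema_body(
    *,
    extraction_id: Optional[str] = None,
    extraction_ids: Optional[list[str]] = None,
    split_id: Optional[str] = None,
    schema_config: Optional[dict[str, Any]] = None,
    schema_config_id: Optional[str] = None,
    split_schema_config: Optional[dict[str, Any]] = None,
    run_async: bool = False,
) -> dict[str, Any]:
    """Build a POST /schema body, enforcing the (assumed) mode rules client-side."""
    sources = [s for s in (extraction_id, extraction_ids, split_id) if s]
    if len(sources) != 1:
        raise ValueError("provide exactly one of extraction_id, extraction_ids, split_id")

    body: dict[str, Any] = {"async": run_async}
    if split_id:
        # Split mode: per-topic configs keyed by the topic names from /split.
        if not split_schema_config:
            raise ValueError("split mode requires split_schema_config")
        for topic, cfg in split_schema_config.items():
            if ("schema" in cfg) == ("schema_config_id" in cfg):
                raise ValueError(f"topic {topic!r}: provide schema XOR schema_config_id")
        body.update(split_id=split_id, split_schema_config=split_schema_config)
        return body

    # Single or multi-extraction mode: inline config XOR saved config reference.
    if (schema_config is None) == (schema_config_id is None):
        raise ValueError("provide schema_config XOR schema_config_id")
    if schema_config is not None:
        if ("input_schema" in schema_config) == ("excel_template" in schema_config):
            raise ValueError("schema_config needs input_schema XOR excel_template")
        body["schema_config"] = schema_config
    else:
        body["schema_config_id"] = schema_config_id

    if extraction_id:
        body["extraction_id"] = extraction_id
    else:
        body["extraction_ids"] = extraction_ids
    return body
```

`async` is a reserved word in Python, so the sketch takes `run_async` and writes it to the `async` key of the body.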

Async Mode

Set async: true to return immediately with a job ID for polling. See Polling for Results for details.

Field	Type	Required	Description
`async`	boolean	No	If `true`, returns immediately with a `job_id` for polling. Default: `false`.

Async Response (200):

Field	Type	Description
`job_id`	string	Job ID for polling
`status`	string	`"pending"`
`message`	string	Human-readable description
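A minimal polling loop, sketched with the standard library. It assumes the job endpoint is `GET https://api.runpulse.com/job/{job_id}` (the path given for the `async` body field), that the job payload carries a `status` field, and that `"pending"` and a hypothetical `"processing"` are the only non-terminal values; check Polling for Results for the actual status values and result shape. The HTTP call is passed in as a function so the loop can be exercised without network access.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Any, Callable

API_BASE = "https://api.runpulse.com"
# Assumption: any status outside this set means the job has finished (successfully or not).
NON_TERMINAL = {"pending", "processing"}


def fetch_job(job_id: str, api_key: str) -> dict[str, Any]:
    """GET /job/{job_id} with the x-api-key header and decode the JSON body."""
    req = urllib.request.Request(f"{API_BASE}/job/{job_id}", headers={"x-api-key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job leaves a non-terminal status, then return the last payload."""
    waited = 0.0
    while True:
        job = fetch(job_id)
        if job.get("status") not in NON_TERMINAL:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

In practice, pass `lambda jid: fetch_job(jid, "YOUR_API_KEY")` as `fetch`, using the `job_id` from the async response.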

Mode Reference

Single Mode

Request

Apply one schema to an entire extraction.

Field	Type	Required	Description
`extraction_id`	uuid	Yes, unless `extraction_ids` is provided	ID of a saved extraction, or a batch extract job ID (auto-detected; see Multi-Extraction Mode)
`extraction_ids`	uuid[]	No	Explicit list of extraction IDs to combine (see Multi-Extraction Mode)
`schema_config`	object	XOR	Inline schema (see Schema Config)
`schema_config_id`	uuid	XOR	Reference to a saved schema configuration
`async`	boolean	No	Default: `false`

Schema Config

Provide either input_schema or excel_template — not both.

Field	Type	Required	Description
`input_schema`	object	XOR	JSON Schema defining the structured data to extract
`excel_template`	string (base64)	XOR	Base64-encoded `.xlsx` template — column headers are used to auto-generate the JSON Schema and a filled copy is returned (see Excel Template Mode)
`schema_prompt`	string	No	Natural language instructions to guide extraction
`effort`	boolean	No	Enable extended reasoning for complex documents

Response (200)

Field	Type	Description
`schema_id`	uuid	Unique identifier for this schema version
`version`	integer	Schema version number
`schema_output`	object	`{ values: {...}, citations: {...} }`
`extraction_ids`	uuid[]	Present when multiple extractions were combined — lists all source extraction IDs
`excel_output_url`	string	API path to download the filled Excel template (only present when `excel_template` was provided)

Example — Inline Schema

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

schema_result = client.schema(
    extraction_id="abc123-def456-ghi789",
    schema_config={
        "input_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"},
                "vendor_name": {"type": "string"},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"}
                        }
                    }
                }
            },
            "required": ["invoice_number", "total_amount"]
        },
        "schema_prompt": "Extract all invoice details including line items"
    }
)

print(schema_result.schema_output)

Example — Saved Config Reference

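from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# Reuse a saved schema configuration instead of an inline schema
schema_result = client.schema(
    extraction_id="abc123-def456-ghi789",
    schema_config_id="config-uuid-123"
)

print(schema_result.schema_output)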

Example Response

{
  "schema_id": "schema-uuid-456",
  "version": 2,
  "schema_output": {
    "values": {
      "invoice_number": "INV-2024-001",
      "total_amount": 1250.00,
      "vendor_name": "Acme Corp",
      "line_items": [
        {"description": "Consulting Services", "amount": 1000.00},
        {"description": "Travel Expenses", "amount": 250.00}
      ]
    },
    "citations": {
      "invoice_number": {"page": 1, "bbox": [100, 50, 200, 70]},
      "total_amount": {"page": 1, "bbox": [400, 500, 500, 520]}
    }
  }
}
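To review a result, it can help to walk `values` alongside `citations`. A minimal sketch over the raw JSON `schema_output`, assuming citations are keyed by the same top-level field names as `values` and carry a `page` as in the example response above (citation shapes for nested fields and for multi-extraction results may differ). `cited_fields` is a hypothetical helper; if you use the SDK, `schema_output` may be a typed object rather than a dict.

```python
from __future__ import annotations

from typing import Any


def cited_fields(schema_output: dict[str, Any]) -> list[tuple[str, Any, int | None]]:
    """Return (field, value, page) for each top-level value; page is None if uncited."""
    values = schema_output.get("values", {})
    citations = schema_output.get("citations", {})
    rows = []
    for field, value in values.items():
        cite = citations.get(field)
        page = cite.get("page") if isinstance(cite, dict) else None
        rows.append((field, value, page))
    return rows


example = {
    "values": {"invoice_number": "INV-2024-001", "total_amount": 1250.00, "vendor_name": "Acme Corp"},
    "citations": {
        "invoice_number": {"page": 1, "bbox": [100, 50, 200, 70]},
        "total_amount": {"page": 1, "bbox": [400, 500, 500, 520]},
    },
}
for field, value, page in cited_fields(example):
    print(field, value, page)
```

Fields that come back with no page are good candidates for a closer look or a more specific `schema_prompt`.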

Multi-Extraction Mode

Combine content from multiple documents and apply a single schema to the composite, producing one merged result. This is useful when the data you need spans several files (e.g., a loss summary in one file and exposure data in another).

This is different from Batch Schema, which applies the same schema to each document independently (one result per document). Use multi-extraction when you need to cross-reference or merge data from multiple source files into a single output.

There are two ways to trigger multi-extraction:

Batch extract auto-detection — Pass a batch extract batch_job_id as extraction_id. The system detects it as a batch parent and automatically combines all completed child extractions.
Explicit list — Pass an extraction_ids array with the specific extraction IDs to combine.

Citations in multi-extraction results use the extraction_id-bb_id format (e.g., abc123-txt-1) to disambiguate bounding boxes across source documents.

from pulse import Pulse
from pulse.types.schema_config import SchemaConfig

client = Pulse(api_key="YOUR_API_KEY")

# Option 1: Pass a batch extract job ID (auto-detected)
schema_result = client.schema(
    extraction_id="<batch_job_id>",
    schema_config=SchemaConfig(
        input_schema={
            "type": "object",
            "properties": {
                "policy_period": {"type": "string"},
                "exposure": {"type": "number"},
            },
        },
        schema_prompt="Combine data from both documents",
    ),
)

# Option 2: Pass an explicit list of extraction IDs
schema_result = client.schema(
    extraction_ids=["extraction-1-uuid", "extraction-2-uuid"],
    schema_config=SchemaConfig(
        input_schema={
            "type": "object",
            "properties": {
                "policy_period": {"type": "string"},
                "exposure": {"type": "number"},
            },
        },
    ),
)

# Response includes the list of source extraction IDs
print(schema_result.extraction_ids)
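Citation IDs in multi-extraction results follow the `extraction_id-bb_id` format. Because extraction IDs are UUIDs that themselves contain hyphens, splitting on the first `-` will not recover the source document. A more robust approach matches against the `extraction_ids` list returned in the response. A minimal sketch, assuming citation IDs are exactly `{extraction_id}-{bb_id}`; `split_citation_id` is a hypothetical helper.

```python
from __future__ import annotations


def split_citation_id(citation_id: str, extraction_ids: list[str]) -> tuple[str, str]:
    """Split '{extraction_id}-{bb_id}' using the known source extraction IDs."""
    # Try longer IDs first so an ID that is a prefix of another cannot shadow it.
    for eid in sorted(extraction_ids, key=len, reverse=True):
        prefix = f"{eid}-"
        if citation_id.startswith(prefix):
            return eid, citation_id[len(prefix):]
    raise ValueError(f"{citation_id!r} does not start with any known extraction ID")
```

For example, `split_citation_id("abc123-txt-1", ["abc123"])` returns `("abc123", "txt-1")`.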

Excel Template Mode

Instead of writing a JSON Schema by hand, provide an Excel template (.xlsx) with the column headers you want filled. The system auto-generates the JSON Schema from the template’s structure, applies it to the extraction, and returns a filled copy of the original template.

import base64
from pulse import Pulse
from pulse.types.schema_config import SchemaConfig

client = Pulse(api_key="YOUR_API_KEY")

with open("template.xlsx", "rb") as f:
    template_b64 = base64.b64encode(f.read()).decode()

schema_result = client.schema(
    extraction_id="abc123-def456-ghi789",
    schema_config=SchemaConfig(
        excel_template=template_b64,
        schema_prompt="Extract policy period data into the template columns",
    ),
)

# Download the filled Excel file (the SDK yields it in chunks)
excel_chunks = client.download_schema_excel(schema_result.schema_id)
with open("filled_output.xlsx", "wb") as f:
    for chunk in excel_chunks:
        f.write(chunk)

The response includes excel_output_url (e.g., /schema/{schema_id}/excel) — an authenticated API path for downloading the filled template. Use client.download_schema_excel(schema_id) in the SDK or make an authenticated GET request.

Excel template mode and multi-extraction mode can be combined — pass a batch extract ID or extraction_ids along with excel_template to fill a template from multiple source documents.

Split Mode

Request

Apply different schemas to different page groups from a prior /split call.

Field	Type	Required	Description
`split_id`	uuid	Yes	ID from a prior `/split` call
`split_schema_config`	object	Yes	Per-topic schema configs (keys = topic names from split)
`async`	boolean	No	Default: `false`

Each topic in split_schema_config:

Field	Type	Required	Description
`schema`	object	XOR	JSON Schema for this topic
`schema_prompt`	string	No	Additional extraction instructions
`effort`	boolean	No	Enable extended reasoning
`schema_config_id`	uuid	XOR	Reference to a saved schema config

Response (200)

Field	Type	Description
`schema_id`	uuid	Unique identifier for this schema version
`split_id`	uuid	ID of the split that defined the page groups
`results`	object	Per-topic results: `{ "topic": { values: {...}, citations: {...} } }`
`input_schemas`	object	Echo of the schemas applied, keyed by topic
`errors`	object	Optional: per-topic errors if any topics failed

Example — Per-Topic Schemas

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

schema_result = client.schema(
    split_id="split-uuid-123",
    split_schema_config={
        "financial_statements": {
            "schema": {
                "type": "object",
                "properties": {
                    "revenue": {"type": "number"},
                    "expenses": {"type": "number"},
                    "net_income": {"type": "number"}
                }
            },
            "schema_prompt": "Extract financial data from statements"
        },
        "signatures": {
            "schema": {
                "type": "object",
                "properties": {
                    "signee_name": {"type": "string"},
                    "date_signed": {"type": "string"}
                }
            }
        }
    }
)

for topic, result in schema_result.results.items():
    print(f"{topic}: {result}")

Example Response

{
  "schema_id": "schema-uuid-789",
  "split_id": "split-uuid-123",
  "results": {
    "financial_statements": {
      "values": { "revenue": 5000000, "expenses": 3200000 },
      "citations": { "revenue": {"page": 15, "bbox": [100, 200, 300, 220]} }
    },
    "signatures": {
      "values": { "signee_name": "Jane Doe", "date_signed": "2024-01-15" },
      "citations": {}
    }
  },
  "input_schemas": {
    "financial_statements": { "type": "object", "properties": { "revenue": { "type": "number" } } },
    "signatures": { "type": "object", "properties": { "signee_name": { "type": "string" } } }
  }
}
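A split-mode call can succeed for some topics and fail for others, so it is worth handling `results` and the optional `errors` object separately. A minimal sketch over the raw JSON response; the shape of each error entry is not documented here, so it is passed through unchanged, and topics reported in `errors` are defensively excluded from the values. `partition_split_results` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def partition_split_results(response: dict[str, Any]) -> tuple[dict[str, dict], dict[str, Any]]:
    """Return ({topic: values}, {topic: error}) from a split-mode /schema response."""
    errors = dict(response.get("errors") or {})
    values = {
        topic: result.get("values", {})
        for topic, result in response.get("results", {}).items()
        if topic not in errors  # skip any topic also reported as failed
    }
    return values, errors
```

Checking `errors` after every split-mode call makes it easy to retry only the failed topics.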

Download Filled Excel — `GET /schema/{schemaId}/excel`

When a schema extraction was created with excel_template, the filled output can be downloaded from this authenticated endpoint. Requires the same API key used for other endpoints. The caller must belong to the org that owns the underlying extraction.

Status	Description
200	Returns the filled `.xlsx` file as a binary download
401	Authentication failed or missing API key
404	Schema not found, or no Excel output (was `excel_template` provided in the original request?)
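Outside the SDK, the download is an authenticated GET. A minimal standard-library sketch, assuming the base URL `https://api.runpulse.com` used by the POST example and the `x-api-key` header; the `excel_output_url` path from the schema response is joined onto the base. The 401 and 404 responses listed above surface as `urllib.error.HTTPError`. `build_excel_request` and `download_excel` are hypothetical helper names.

```python
import shutil
import urllib.request
from urllib.parse import urljoin

API_BASE = "https://api.runpulse.com"


def build_excel_request(excel_output_url: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated GET for a path like /schema/{schema_id}/excel."""
    return urllib.request.Request(urljoin(API_BASE, excel_output_url), headers={"x-api-key": api_key})


def download_excel(excel_output_url: str, api_key: str, dest: str) -> None:
    """Stream the filled .xlsx to dest; raises urllib.error.HTTPError on 401/404."""
    req = build_excel_request(excel_output_url, api_key)
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest, "wb") as f:
        shutil.copyfileobj(resp, f)
```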

Error Responses

Status	Error	Description
400	Invalid request	Must provide `extraction_id`, `extraction_ids`, or `split_id`
400	Invalid schema	Schema must follow JSON Schema / OpenAPI 3.0 format
400	Mutually exclusive	Cannot provide both `input_schema` and `excel_template`
401	Unauthorized	Invalid or missing API key
404	Not found	Extraction, batch job, or split not found
500	Processing error	Schema extraction failed

Best Practices

Use effort mode for complex documents

Set effort: true for documents with complex layouts, tables, or when initial extraction quality is low.

Provide schema_prompt for context

Add natural language instructions to guide the extraction, especially for ambiguous fields.

Use async for large schemas

If your schema has many fields or the document is large, set async: true to avoid timeouts. See Polling for Results.

For multi-section documents, use split mode

First call /split to get page groups, then use this endpoint with split_id + split_schema_config.

Combine data from multiple files with multi-extraction

When the data you need spans multiple documents, use Batch Extract to extract all files, then pass the batch_job_id as extraction_id to this endpoint. The system auto-detects the batch parent and combines content from all child extractions.

Use Excel templates for spreadsheet-native workflows

If your output is an Excel spreadsheet, skip the JSON Schema definition and provide the empty .xlsx template directly via excel_template. The column headers define the schema, and you get a filled copy back via excel_output_url.

Related Endpoints

Extract

Extract content from a document

Split Document

Split a document into topic-based page groups

Batch Processing

Apply schema across many documents in parallel

Authorizations

Header	Type	Required
`x-api-key`	string	Yes

Body (application/json)

Request body for schema extraction. The mode is inferred from the input:

Provide extraction_id for single mode or multi-extraction (auto-detected). If the ID belongs to a batch extract, its child extractions are combined automatically.
Provide extraction_ids for an explicit list of extractions to combine.
Provide split_id + split_schema_config for split-mode extraction.

Field	Type	Description
`extraction_id`	string<uuid>	ID of a saved extraction OR a batch extract job. When a batch extract ID is provided, the system auto-detects it and combines all completed child extractions into a single schema application.
`extraction_ids`	string<uuid>[]	Explicit list of extraction IDs to combine. The markdown and bounding boxes from all extractions are merged and the schema is applied to the composite content. Citations use `extraction_id-bb_id` format to disambiguate across source documents.
`split_id`	string<uuid>	ID of a saved split (from a prior `/split` call). Use for split-mode schema extraction.
`schema_config`	object	Inline schema configuration for single mode (see Schema Config). Required (with `extraction_id`) if `schema_config_id` is not provided. Also used in multi-extraction mode.
`schema_config_id`	string<uuid>	Reference to a saved schema configuration for single mode. Use this instead of providing `schema_config` inline.
`split_schema_config`	object	Per-topic schema configurations for split mode. Keys must match the topic names from the split. Each topic provides either an inline `schema` or a `schema_config_id`.
`async`	boolean	If `true`, returns immediately with a `job_id` for polling via `GET /job/{jobId}`; otherwise processes synchronously. Default: `false`.

Response

Schema extraction result (when `async` is `false` or omitted). The shape depends on the mode used. The fields below describe the single-mode response, which also carries the multi-extraction and Excel template fields; split mode returns the per-topic shape described under Split Mode.

Field	Type	Required	Description
`schema_id`	string<uuid>	Yes	Unique identifier for this schema version.
`version`	integer	Yes	Version number of this schema for the extraction. Must be >= 1.
`schema_output`	object	Yes	Extracted values and citations: `{ values: {...}, citations: {...} }`.
`extraction_ids`	string<uuid>[]	No	Present when multiple extractions were combined (via batch extract auto-detection or explicit `extraction_ids` input). Lists all source extraction IDs that contributed to the result.
`excel_output_url`	string	No	API path to download the filled Excel template (e.g. `/schema/{schema_id}/excel`). Requires the same API key authentication. Only present when `excel_template` was provided in the request.
`credits_used`	number<float> | null	No	Number of credits consumed by this request. Only present when the organization has the credit billing system enabled.
`plan_info`	object	No	Billing tier and cumulative usage information for the calling org, including this schema run.
