POST /schema

Python SDK
from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# Single mode — apply schema to entire document
response = client.schema.extract_schema(
    extraction_id="your-extraction-id",
    schema_config={
        "schema": {
            "type": "object",
            "properties": {
                "total": {"type": "number"},
                "vendor": {"type": "string"}
            }
        },
        "schema_prompt": "Extract invoice total and vendor"
    }
)
print(response.schema_output)

# Split mode — different schemas per topic
response = client.schema.extract_schema(
    split_id="your-split-id",
    split_schema_config={
        "Introduction": {
            "schema": {"type": "object", "properties": {"summary": {"type": "string"}}},
            "schema_prompt": "Summarize the introduction"
        },
        "Financials": {
            "schema": {"type": "object", "properties": {"revenue": {"type": "number"}}},
            "schema_prompt": "Extract financial figures"
        }
    }
)
print(response.results)

Example response (single mode):
{
  "schema_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "version": 2,
  "schema_output": {
    "values": {},
    "citations": {}
  }
}

Overview

Pipeline Step 2 or 3 — Schema requires a prior extraction. For split mode, it also requires a prior split. The mode is inferred from whether you provide extraction_id (single) or split_id (split).
Apply a schema to a previously extracted document to get structured data output. This endpoint supports two modes, inferred from the input:
  • Single mode — provide extraction_id to apply one schema to the entire document
  • Split mode — provide split_id to apply per-topic schemas to page groups from a prior /split
This endpoint operates on saved extractions (created via /extract with storage enabled, which is the default).
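
The mode rule above can be checked on the client before sending a request. The following is a minimal sketch, not part of the SDK: `infer_mode` is a hypothetical helper name, and it assumes the request body is a plain dict using the field names documented on this page.

```python
def infer_mode(body: dict) -> str:
    """Return "single" or "split" based on which ID the request body carries.

    Hypothetical helper (not part of the Pulse SDK). It mirrors the documented
    rule: extraction_id selects single mode, split_id selects split mode.
    """
    has_extraction = body.get("extraction_id") is not None
    has_split = body.get("split_id") is not None
    if has_extraction == has_split:
        # Both present or both missing: the documented 400 "Invalid request" case.
        raise ValueError("Provide either extraction_id or split_id (not both)")
    return "single" if has_extraction else "split"
```

Running a check like this locally surfaces the 400 "Invalid request" condition before a network round trip.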

Async Mode

Set async: true to return immediately with a job ID for polling. See Polling for Results for details.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| async | boolean | No | If true, returns immediately with a job_id for polling. Default: false. |

Async Response (200):

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Job ID for polling |
| status | string | "pending" |
| message | string | Human-readable description |
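
A polling loop for async jobs might look like the sketch below. It is an illustration under assumptions: `wait_for_job` and the injected `fetch_job` callable are hypothetical names (in practice `fetch_job` would wrap whatever job-status call you use), and because only the "pending" status is documented here, any other status is treated as terminal.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Poll until the job leaves the "pending" state or attempts run out.

    fetch_job is a caller-supplied function (hypothetical) that returns the
    job's current JSON payload as a dict.
    """
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        # Assumption: any status other than "pending" means the job is finished.
        if job.get("status") != "pending":
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"Job {job_id} still pending after {max_attempts} attempts")
```

Injecting the fetch function keeps the loop independent of any particular HTTP client and makes it easy to test with canned responses.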

Mode Reference

Request

Apply one schema to an entire extraction.
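| Field | Type | Required | Description |
| --- | --- | --- | --- |
| extraction_id | uuid | Yes | ID of the saved extraction |
| schema_config | object | XOR | Inline schema: { schema: {...}, schema_prompt: "...", effort: false } |
| schema_config_id | uuid | XOR | Reference to a saved schema configuration |
| async | boolean | No | Default: false |

XOR: provide exactly one of schema_config or schema_config_id.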
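
The either/or rule for schema_config and schema_config_id can be enforced when assembling the request body. This is a sketch only: `build_single_mode_body` is a hypothetical helper, not an SDK function, and it simply produces a dict with the fields from the table above.

```python
from typing import Optional


def build_single_mode_body(
    extraction_id: str,
    schema_config: Optional[dict] = None,
    schema_config_id: Optional[str] = None,
    run_async: bool = False,
) -> dict:
    """Assemble a single-mode request body (hypothetical helper).

    Exactly one of schema_config (inline) or schema_config_id (saved) must be set.
    """
    if (schema_config is None) == (schema_config_id is None):
        raise ValueError("Provide exactly one of schema_config or schema_config_id")
    body = {"extraction_id": extraction_id, "async": run_async}
    if schema_config is not None:
        body["schema_config"] = schema_config
    else:
        body["schema_config_id"] = schema_config_id
    return body
```

For example, `build_single_mode_body("your-extraction-id", schema_config_id="your-config-id")` yields a body that references a saved configuration and leaves out the inline schema.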

Response (200)

| Field | Type | Description |
| --- | --- | --- |
| schema_id | uuid | Unique identifier for this schema version |
| version | integer | Schema version number |
| schema_output | object | { values: {...}, citations: {...} } |

Example — Inline Schema

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

schema_result = client.schema.extract_schema(
    extraction_id="abc123-def456-ghi789",
    schema_config={
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"},
                "vendor_name": {"type": "string"},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"}
                        }
                    }
                }
            },
            "required": ["invoice_number", "total_amount"]
        },
        "schema_prompt": "Extract all invoice details including line items"
    }
)

print(schema_result.schema_output)

Example — Saved Config Reference

schema_result = client.schema.extract_schema(
    extraction_id="abc123-def456-ghi789",
    schema_config_id="config-uuid-123"
)

Example Response

{
  "schema_id": "schema-uuid-456",
  "version": 2,
  "schema_output": {
    "values": {
      "invoice_number": "INV-2024-001",
      "total_amount": 1250.00,
      "vendor_name": "Acme Corp",
      "line_items": [
        {"description": "Consulting Services", "amount": 1000.00},
        {"description": "Travel Expenses", "amount": 250.00}
      ]
    },
    "citations": {
      "invoice_number": {"page": 1, "bbox": [100, 50, 200, 70]},
      "total_amount": {"page": 1, "bbox": [400, 500, 500, 520]}
    }
  }
}
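
In the example response, not every value has a citation (vendor_name and line_items have none). The sketch below pairs each value with its citation when one exists. It assumes, based only on this example, that citations are keyed by the same top-level field names as values; `values_with_citations` is a hypothetical helper name.

```python
def values_with_citations(schema_output: dict) -> dict:
    """Pair each extracted value with its citation, or None if it has none.

    Assumption: citations use the same top-level keys as values, as in the
    example response. The citation payload (page, bbox) is passed through
    unchanged.
    """
    values = schema_output.get("values", {})
    citations = schema_output.get("citations", {})
    return {
        name: {"value": value, "citation": citations.get(name)}
        for name, value in values.items()
    }
```

Applied to the response above, `invoice_number` carries its page-1 citation, while `vendor_name` maps to a `None` citation.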

Error Responses

| Status | Error | Description |
| --- | --- | --- |
| 400 | Invalid request | Must provide either extraction_id or split_id (not both) |
| 400 | Invalid schema | Schema must follow JSON Schema / OpenAPI 3.0 format |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Not found | Extraction or split not found |
| 500 | Processing error | Schema extraction failed |
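
Client code can turn these status codes into actionable messages. The lookup below is a small sketch: the wording of each hint is paraphrased from the table, and `describe_status` is a hypothetical helper, not something the SDK provides.

```python
# Hints paraphrased from the error table; 400 covers two documented causes.
ERROR_HINTS = {
    400: "Invalid request or schema: send exactly one of extraction_id/split_id "
         "and a schema in JSON Schema / OpenAPI 3.0 format",
    401: "Unauthorized: invalid or missing API key",
    404: "Not found: extraction or split not found",
    500: "Processing error: schema extraction failed",
}


def describe_status(status: int) -> str:
    """Map an HTTP status code to a short hint (hypothetical helper)."""
    if 200 <= status < 300:
        return "ok"
    return ERROR_HINTS.get(status, f"Unexpected status {status}")
```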

Best Practices

  • Set effort: true in schema_config for documents with complex layouts or tables, or when initial extraction quality is low.
  • Use schema_prompt to add natural-language instructions that guide the extraction, especially for ambiguous fields.
  • If your schema has many fields or the document is large, set async: true to avoid timeouts. See Polling for Results.
  • For split mode, first call /split to get page groups, then call this endpoint with split_id and split_schema_config.
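
Because split-mode configs are keyed by topic name, it can help to compare the keys of split_schema_config against the topics returned by the split before calling this endpoint. The sketch below is illustrative: `check_split_schema_config` is a hypothetical helper, and how you obtain the list of topic names from the /split result is left to the caller. Whether every topic must have a config is not stated on this page, so uncovered topics are reported separately and not treated as errors.

```python
from typing import Iterable, List, Tuple


def check_split_schema_config(
    topic_names: Iterable[str], split_schema_config: dict
) -> Tuple[List[str], List[str]]:
    """Compare config keys with split topics (hypothetical helper).

    Returns (unknown_keys, uncovered_topics):
      unknown_keys     - config keys that match no topic from the split
      uncovered_topics - topics with no config (informational only)
    """
    known = set(topic_names)
    unknown_keys = sorted(key for key in split_schema_config if key not in known)
    uncovered_topics = sorted(known - set(split_schema_config))
    return unknown_keys, uncovered_topics
```

A non-empty `unknown_keys` list usually points to a typo or a casing mismatch in a topic name.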

Authorizations

| Name | Type | In | Required | Description |
| --- | --- | --- | --- | --- |
| x-api-key | string | header | Yes | API key for authentication |

Body

application/json

Request body for schema extraction. Mode is inferred from the input:

  • Provide extraction_id + schema_config for single-mode extraction.
  • Provide split_id + split_schema_config for split-mode extraction.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| extraction_id | string<uuid> | | ID of saved extraction (for single mode). |
| split_id | string<uuid> | | ID of saved split (for split mode). |
| schema_config | object | | Inline schema configuration for single mode. |
| schema_config_id | string<uuid> | | Reference to a saved schema configuration (for single mode). |
| split_schema_config | object | | Per-topic schema configurations for split mode. Keys must match topic names from the split. |
| async | boolean | false | If true, returns 202 with a job_id for polling via GET /job/{jobId}. |

Response

Schema extraction result (when async=false or omitted). Shape depends on the mode used (single vs split).

Response for single schema extraction mode.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| schema_id | string<uuid> | Yes | Unique identifier for this schema version. |
| version | integer | Yes | Version number of this schema for the extraction. Required range: x >= 1. |
| schema_output | object | Yes | Extracted values and citations. |