The structured_output parameter on /extract is deprecated and will be removed in a future version. Please migrate to the two-step extract → schema flow described below.

Overview

Previously, you could pass a structured_output object directly in your /extract request to get structured data in a single call. This approach has been replaced by a dedicated /schema endpoint that runs after extraction.

Why the change?

| Benefit | Description |
| --- | --- |
| Re-runnability | Re-apply different schemas to the same extraction without re-processing the document |
| Split-mode schemas | Apply different schemas to different sections of a document using /split/schema |
| Separation of concerns | Extraction and schema application are independent steps, which makes them easier to debug and optimize |
| Async support | The /schema endpoint supports its own async: true flag for long-running schema jobs |
| Cost savings | Only pay for extraction once, then iterate on schemas without additional extraction charges |

Before vs. After

| | Before (Deprecated) | After (Recommended) |
| --- | --- | --- |
| Steps | 1 call to /extract | 2 calls: /extract, then /schema |
| Schema param | structured_output on /extract | schema_config on /schema |
| Response field | response.structured_output | response.schema_output |
| Re-run schema | Must re-extract entire document | Call /schema again with same extraction_id |
| Split-mode | Not supported | Supported via /split/schema |
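
The "Re-run schema" row can be illustrated with a minimal sketch. It assumes a `client` object exposing `schema(extraction_id=..., schema_config=...)` and a `schema_output` attribute on the response, as used in the async example on this page; `apply_schemas` is a hypothetical helper name, not part of the SDK.

```python
def apply_schemas(client, extraction_id, schemas):
    """Apply several JSON Schemas to one stored extraction.

    `schemas` maps a label of your choosing to a JSON Schema dict.
    Returns a dict of label -> schema_output. The document is not
    re-extracted; only the schema step is repeated.
    """
    results = {}
    for label, schema in schemas.items():
        # Assumption: same call shape as the async example on this page.
        response = client.schema(
            extraction_id=extraction_id,
            schema_config={"input_schema": schema},
        )
        results[label] = response.schema_output
    return results
```

Each schema in the mapping reuses the same extraction_id, so iterating on a schema should not require another /extract call.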

Migration Steps

Step 1: Remove structured_output from your /extract call
Strip the structured_output, schema, and schema_prompt parameters from your extraction request. Keep all other parameters (pages, figure_processing, extensions, etc.) as-is.

Step 2: Save the extraction_id from the response
The /extract response includes an extraction_id (when storage is enabled, which is the default). Store this ID; you’ll need it for the schema step.

Step 3: Call /schema with your schema config
Send a POST /schema request with the extraction_id and your schema in the schema_config object. The schema format is the same JSON Schema you were using before.

Step 4: Update response handling
The schema result is now in response.schema_output (instead of response.structured_output). The shape includes values and citations.
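
Steps 1 and 3 amount to moving the schema out of the extraction payload and into a schema_config. The sketch below shows one way to do that on plain dictionaries. `split_legacy_request` and `LEGACY_KEYS` are hypothetical names, and placing `schema_prompt` next to `input_schema` inside schema_config is an assumption based on the FAQ wording on this page; check the /schema reference before relying on it.

```python
LEGACY_KEYS = ("structured_output", "schema", "schema_prompt")


def split_legacy_request(extract_params):
    """Split a legacy /extract payload into (extract_params, schema_config).

    Returns a cleaned copy of the extraction parameters plus the
    schema_config to send to /schema once an extraction_id is known.
    schema_config is None when the legacy payload carried no schema.
    """
    # Step 1: drop the deprecated parameters, keep everything else as-is.
    cleaned = {k: v for k, v in extract_params.items() if k not in LEGACY_KEYS}

    legacy = extract_params.get("structured_output") or {}
    schema = legacy.get("schema", extract_params.get("schema"))
    prompt = legacy.get("schema_prompt", extract_params.get("schema_prompt"))
    if schema is None:
        return cleaned, None

    # Step 3: the same JSON Schema goes under schema_config.input_schema.
    schema_config = {"input_schema": schema}
    if prompt is not None:
        # Assumption: schema_prompt sits alongside input_schema.
        schema_config["schema_prompt"] = prompt
    return cleaned, schema_config


old_request = {
    "file_url": "https://example.com/bank_statement.pdf",
    "structured_output": {
        "schema": {"type": "object", "properties": {"account_holder": {"type": "string"}}},
        "schema_prompt": "Extract bank statement details",
    },
}
extract_params, schema_config = split_legacy_request(old_request)
```

The input dictionary is left unmodified, so the helper can be dropped into an existing request builder while both code paths still exist.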

Code Examples

Python SDK

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "account_number": {"type": "string"},
        "closing_balance": {"type": "number"}
    },
    "required": ["account_holder"]
}

# Old way — schema bundled with extraction
response = client.extract(
    file_url="https://example.com/bank_statement.pdf",
    structured_output={
        "schema": schema,
        "schema_prompt": "Extract bank statement details"
    }
)

# Structured data was in the extract response
print(response.structured_output["account_holder"])
print(response.structured_output["closing_balance"])
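
For comparison with the deprecated call above, the recommended two-step flow might look like the following sketch. It relies only on calls that appear elsewhere on this page (`client.extract(file_url=...)`, `extraction_id` on the extract response, `client.schema(extraction_id=..., schema_config=...)`, and `schema_output` on the schema response). `extract_then_schema` is a hypothetical wrapper name, and the position of `schema_prompt` inside schema_config is an assumption.

```python
def extract_then_schema(client, file_url, schema, schema_prompt=None):
    """Run extraction, then apply a schema to the stored extraction.

    Returns (extraction_id, schema_output) so the extraction_id can be
    reused for later schema runs.
    """
    # Step 1: extract without any schema parameters.
    extract_response = client.extract(file_url=file_url)

    # Step 2: keep the extraction_id for the schema step.
    extraction_id = extract_response.extraction_id

    # Step 3: send the same JSON Schema under schema_config.input_schema.
    schema_config = {"input_schema": schema}
    if schema_prompt is not None:
        # Assumption: schema_prompt sits alongside input_schema.
        schema_config["schema_prompt"] = schema_prompt
    schema_response = client.schema(
        extraction_id=extraction_id,
        schema_config=schema_config,
    )

    # Step 4: structured data is read from schema_output, not structured_output.
    return extraction_id, schema_response.schema_output
```

With the `client` and `schema` objects defined above, a call such as `extract_then_schema(client, "https://example.com/bank_statement.pdf", schema, "Extract bank statement details")` would replace the single deprecated request.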

TypeScript SDK

import { PulseClient } from "pulse-ts-sdk";

const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

const schema = {
    type: "object",
    properties: {
        account_holder: { type: "string" },
        account_number: { type: "string" },
        closing_balance: { type: "number" }
    },
    required: ["account_holder"]
};

// Old way — schema bundled with extraction
const response = await client.extract({
    fileUrl: "https://example.com/bank_statement.pdf",
    structuredOutput: {
        schema,
        schemaPrompt: "Extract bank statement details"
    }
});

// Structured data was in the extract response
console.log(response.structured_output?.values?.account_holder);
console.log(response.structured_output?.values?.closing_balance);

curl

curl -X POST https://api.runpulse.com/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@bank_statement.pdf" \
  -F 'structured_output={"schema": {"type": "object", "properties": {"account_holder": {"type": "string"}, "closing_balance": {"type": "number"}}}, "schema_prompt": "Extract bank statement details"}'

With Async Processing

If you were using structured_output with async extraction, here’s the updated flow:
import time

# Step 1: Async extraction
extract_response = client.extract(
    file_url="https://example.com/large_document.pdf",
    async_=True
)

# Step 2: Poll for extraction completion
job_id = extract_response.job_id
while True:
    status = client.jobs.get_job(job_id=job_id)
    if status.status == "completed":
        extraction_id = status.result.extraction_id
        break
    elif status.status in ["failed", "canceled"]:
        raise Exception(f"Extraction failed: {status.status}")
    time.sleep(2)

# Step 3: Apply schema (can also be async)
schema_response = client.schema(
    extraction_id=extraction_id,
    schema_config={
        "input_schema": {
            "type": "object",
            "properties": {
                "account_holder": {"type": "string"},
                "closing_balance": {"type": "number"}
            }
        }
    }
)

print(schema_response.schema_output)
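
The polling loop above runs until the job reaches a terminal state. A bounded variant is sketched below, assuming the same `client.jobs.get_job(job_id=...)` call and the same status strings ("completed", "failed", "canceled"). `wait_for_job`, its timeout, and the injectable `sleep` and `clock` parameters are illustrative additions, not SDK features. Any other status is treated as still running.

```python
import time


def wait_for_job(client, job_id, poll_interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a job until it completes, fails, or the timeout elapses.

    Returns the job's `result` on completion. Raises RuntimeError for
    failed or canceled jobs and TimeoutError when the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        job = client.jobs.get_job(job_id=job_id)
        if job.status == "completed":
            return job.result
        if job.status in ("failed", "canceled"):
            raise RuntimeError(f"Job {job_id} ended with status {job.status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} still {job.status!r} after {timeout} seconds")
        sleep(poll_interval)
```

For the extraction job above, `wait_for_job(client, extract_response.job_id).extraction_id` would stand in for the `while True` loop. Whether an async /schema job exposes its output on `result` in the same way is not stated here, so treat that reuse as an assumption to verify.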

Advanced: Split-Mode Schema (New Capability)

With the new flow, you can now split a document into topics and apply different schemas to each section. This was not possible with the old structured_output approach.
Python
# Step 1: Extract
extract_response = client.extract(
    file_url="https://example.com/annual_report.pdf"
)

# Step 2: Split into topics
split_response = client.split(
    extraction_id=extract_response.extraction_id,
    split_config={
        "split_input": [
            {"name": "Financials", "description": "Revenue, expenses, profit data"},
            {"name": "Leadership", "description": "Executive team and board info"},
            {"name": "Outlook", "description": "Future plans and projections"}
        ]
    }
)

# Step 3: Apply different schemas to each topic
schema_response = client.schema(
    split_id=split_response.split_id,
    split_schema_config={
        "Financials": {
            "schema": {
                "type": "object",
                "properties": {
                    "revenue": {"type": "number"},
                    "net_income": {"type": "number"},
                    "yoy_growth": {"type": "string"}
                }
            },
            "schema_prompt": "Extract key financial metrics"
        },
        "Leadership": {
            "schema": {
                "type": "object",
                "properties": {
                    "ceo": {"type": "string"},
                    "board_members": {"type": "array", "items": {"type": "string"}}
                }
            },
            "schema_prompt": "Extract leadership information"
        },
        "Outlook": {
            "schema": {
                "type": "object",
                "properties": {
                    "guidance": {"type": "string"},
                    "key_initiatives": {"type": "array", "items": {"type": "string"}}
                }
            },
            "schema_prompt": "Extract forward-looking statements"
        }
    }
)

# Results are organized by topic
for topic, result in schema_response.results.items():
    print(f"\n--- {topic} ---")
    print(result)
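
In the example above, each key of split_schema_config matches the name of a topic in split_input. A small pre-flight check can catch a misspelled topic before the request is sent. This is a sketch under the assumption that keys are matched to topic names exactly; whether the API requires a schema for every topic is not stated here. `check_split_schema_config` is a hypothetical helper.

```python
def check_split_schema_config(split_input, split_schema_config):
    """Compare topic names in split_input with keys in split_schema_config.

    Returns a dict with two sorted lists: topics that have no schema, and
    schema keys that do not correspond to any topic.
    """
    topic_names = {topic["name"] for topic in split_input}
    config_names = set(split_schema_config)
    return {
        "missing_schema": sorted(topic_names - config_names),
        "unknown_topic": sorted(config_names - topic_names),
    }


problems = check_split_schema_config(
    [{"name": "Financials", "description": "Revenue, expenses, profit data"},
     {"name": "Leadership", "description": "Executive team and board info"}],
    {"Financials": {"schema": {"type": "object"}},
     "Leadershp": {"schema": {"type": "object"}}},  # deliberate typo
)
```

Here `problems` lists "Leadership" as a topic without a schema and "Leadershp" as a key without a topic.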

Response Field Mapping

If you’re parsing the response, here’s how the fields map:
| Old field (on /extract response) | New field (on /schema response) | Notes |
| --- | --- | --- |
| response.structured_output | response.schema_output.values | Values are now nested under schema_output.values |
| response.structured_output.citations | response.schema_output.citations | Same structure, new path |
| response.input_schema | N/A | Echo removed; you know what schema you sent |
| response.schema_error | Standard HTTP error response | Errors are returned as 4xx/5xx responses |
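
When working with the raw JSON body rather than an SDK object, the new paths can be read as in the sketch below. It assumes the /schema response has already been decoded into a Python dict; `read_schema_output` is a hypothetical helper, and the types of `values` and `citations` are not specified here.

```python
def read_schema_output(body):
    """Return (values, citations) from a decoded /schema response body.

    Raises KeyError if schema_output is absent, which usually means the
    body came from /extract rather than /schema.
    """
    if "schema_output" not in body:
        raise KeyError("schema_output missing: expected a /schema response body")
    schema_output = body["schema_output"]
    # New paths: schema_output.values and schema_output.citations.
    return schema_output.get("values"), schema_output.get("citations")
```

Failing loudly on a missing schema_output helps surface code paths that still expect structured_output on the /extract response.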

FAQ

Will structured_output stop working immediately?
No. The parameter will continue to work for backward compatibility but will be removed in a future version. We recommend migrating as soon as possible.

Can I reuse my existing schema?
Yes. The JSON Schema format is identical. The only change is where you send it: instead of structured_output.schema on /extract, you send schema_config.input_schema on /schema. The schema_prompt field is also in the same location within the config object.

Will I be charged for extraction again when I call /schema?
No. Extraction is billed once based on page count. The /schema endpoint runs on the already-extracted content and does not incur additional extraction charges.

Can the schema step run asynchronously?
Yes. Set async: true on the /schema request to get a job_id and poll for results, just like extraction.

What if I only need the markdown and never used structured_output?
Then you’re already using the recommended flow. Just call /extract without any schema parameters and use the markdown from the response.

Schema Endpoint

Full reference for the /schema endpoint (single + split mode)

Schema Design Guide

Best practices for writing effective JSON schemas

Pipeline Overview

How extract → split → schema work together

Split Endpoint

Split documents into topics for targeted schema extraction