Migrate from structured_output to /schema

The structured_output parameter on /extract is deprecated and will be removed in a future version. Please migrate to the two-step extract → schema flow described below.

Overview

Previously, you could pass a structured_output object directly in your /extract request to get structured data in a single call. This approach has been replaced by a dedicated /schema endpoint that runs after extraction.

Why the change?

Benefit	Description
Re-runnability	Re-apply different schemas to the same extraction without re-processing the document
Split-mode schemas	Apply different schemas to different sections of a document using `/split` → `/schema`
Separation of concerns	Extraction and schema application are independent steps — easier to debug and optimize
Async support	The `/schema` endpoint supports its own `async: true` flag for long-running schema jobs
Cost savings	Pay for extraction once, then iterate on schemas without additional extraction charges (each `/schema` call is still metered)

Before vs. After

Aspect	Before (Deprecated)	After (Recommended)
Steps	1 call to `/extract`	2 calls: `/extract` → `/schema`
Schema param	`structured_output` on `/extract`	`schema_config` on `/schema`
Response field	`response.structured_output`	`response.schema_output`
Re-run schema	Must re-extract entire document	Call `/schema` again with same `extraction_id`
Split-mode	Not supported	Supported via `/split` → `/schema`
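
The "Re-run schema" row can be sketched as a loop that calls /schema once per schema while reusing a single extraction_id. This is a minimal sketch, not the SDK's own helper: `apply_schemas` and `FakeClient` are hypothetical names, and the fake client exists only so the example runs offline. The `client.schema(extraction_id=..., schema_config={"input_schema": ...})` call shape follows the Python examples on this page.

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in for the SDK client; records each schema call."""

    def __init__(self):
        self.calls = []

    def schema(self, extraction_id, schema_config):
        self.calls.append((extraction_id, schema_config))
        # Echo the requested property names so the result is inspectable.
        fields = list(schema_config["input_schema"]["properties"])
        return SimpleNamespace(schema_output={"values": dict.fromkeys(fields)})


def apply_schemas(client, extraction_id, schemas):
    """Run several schemas against one stored extraction (no re-extraction)."""
    return {
        name: client.schema(
            extraction_id=extraction_id,
            schema_config={"input_schema": schema},
        ).schema_output
        for name, schema in schemas.items()
    }


schemas = {
    "holder": {"type": "object", "properties": {"account_holder": {"type": "string"}}},
    "balance": {"type": "object", "properties": {"closing_balance": {"type": "number"}}},
}
client = FakeClient()
outputs = apply_schemas(client, "ext_123", schemas)
print(sorted(outputs))  # one result per schema, all from the same extraction
```

With a real client, only the `FakeClient()` line would change; the extraction itself is never repeated.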

Migration Steps

Step 1: Remove structured_output from your /extract call

Strip the structured_output, schema, and schema_prompt parameters from your extraction request. Keep all other parameters (pages, figure_processing, extensions, etc.) as-is.
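
If your extraction parameters live in a dict, Step 1 can be sketched as a small filter. `strip_schema_params` is a hypothetical helper, and the `"1-3"` value for pages is an assumed format used only for illustration. The removed parameters are returned as well, so they can be reused when building the /schema request.

```python
# Parameter names taken from Step 1; everything else passes through untouched.
DEPRECATED_EXTRACT_PARAMS = ("structured_output", "schema", "schema_prompt")


def strip_schema_params(extract_params):
    """Return (clean_params, removed) without mutating the input dict."""
    clean = {k: v for k, v in extract_params.items() if k not in DEPRECATED_EXTRACT_PARAMS}
    removed = {k: v for k, v in extract_params.items() if k in DEPRECATED_EXTRACT_PARAMS}
    return clean, removed


old_params = {
    "file_url": "https://example.com/bank_statement.pdf",
    "pages": "1-3",  # assumed value format, for illustration only
    "structured_output": {
        "schema": {"type": "object", "properties": {"account_holder": {"type": "string"}}},
        "schema_prompt": "Extract bank statement details",
    },
}
clean_params, removed = strip_schema_params(old_params)
print(sorted(clean_params))  # parameters that stay on the /extract call
```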

Step 2: Save the extraction_id from the response

The /extract response includes an extraction_id (when storage is enabled, which is the default). Store this ID — you’ll need it for the schema step.

Step 3: Call /schema with your schema config

Send a POST /schema request with the extraction_id and your schema in the schema_config object. The schema format is the same JSON Schema you were using before.
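
As a rough sketch of Step 3, the old structured_output object can be converted into a /schema request body by moving `schema` to `schema_config.input_schema` and carrying `schema_prompt` along. `build_schema_request` is a hypothetical helper, and sending the body as JSON is an assumption; check the /schema reference for the exact request format.

```python
import json


def build_schema_request(extraction_id, structured_output):
    """Map an old structured_output object onto a /schema request body."""
    schema_config = {"input_schema": structured_output["schema"]}
    # The prompt is optional; only forward it when one was provided.
    if structured_output.get("schema_prompt"):
        schema_config["schema_prompt"] = structured_output["schema_prompt"]
    return {"extraction_id": extraction_id, "schema_config": schema_config}


old_structured_output = {
    "schema": {
        "type": "object",
        "properties": {
            "account_holder": {"type": "string"},
            "closing_balance": {"type": "number"},
        },
    },
    "schema_prompt": "Extract bank statement details",
}
request_body = build_schema_request("ext_123", old_structured_output)
payload = json.dumps(request_body, indent=2)  # assumed JSON body for POST /schema
print(payload)
```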

Step 4: Update response handling

The schema result is now in response.schema_output (instead of response.structured_output). The shape includes values and citations.

Code Examples

Python SDK

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "account_number": {"type": "string"},
        "closing_balance": {"type": "number"}
    },
    "required": ["account_holder"]
}

# Old way — schema bundled with extraction
response = client.extract(
    file_url="https://example.com/bank_statement.pdf",
    structured_output={
        "schema": schema,
        "schema_prompt": "Extract bank statement details"
    }
)

# Structured data was in the extract response
print(response.structured_output["account_holder"])
print(response.structured_output["closing_balance"])
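
For comparison, the new two-step flow might look like the sketch below. `extract_then_schema` is a hypothetical wrapper, and `StubClient` is an offline stand-in so the example runs without an API key; with the real SDK you would pass `Pulse(api_key="YOUR_API_KEY")` instead. Treating `schema_output` as a dict with `values` and `citations` keys, and the empty citations value, are assumptions.

```python
from types import SimpleNamespace


def extract_then_schema(client, file_url, schema, schema_prompt=None):
    """New flow: extract once, then apply the schema in a separate call."""
    extract_response = client.extract(file_url=file_url)  # no structured_output here
    schema_config = {"input_schema": schema}
    if schema_prompt:
        schema_config["schema_prompt"] = schema_prompt
    schema_response = client.schema(
        extraction_id=extract_response.extraction_id,
        schema_config=schema_config,
    )
    return schema_response.schema_output


class StubClient:
    """Hypothetical offline stand-in for the SDK client."""

    def extract(self, file_url):
        return SimpleNamespace(extraction_id="ext_demo")

    def schema(self, extraction_id, schema_config):
        # Canned result; the citations shape is a placeholder, not the real format.
        return SimpleNamespace(
            schema_output={"values": {"account_holder": "Jane Doe"}, "citations": {}}
        )


schema = {
    "type": "object",
    "properties": {"account_holder": {"type": "string"}},
    "required": ["account_holder"],
}
output = extract_then_schema(
    StubClient(),
    "https://example.com/bank_statement.pdf",
    schema,
    schema_prompt="Extract bank statement details",
)
print(output["values"]["account_holder"])
```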

TypeScript SDK

import { PulseClient } from "pulse-ts-sdk";

const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

const schema = {
    type: "object",
    properties: {
        account_holder: { type: "string" },
        account_number: { type: "string" },
        closing_balance: { type: "number" }
    },
    required: ["account_holder"]
};

// Old way — schema bundled with extraction
const response = await client.extract({
    fileUrl: "https://example.com/bank_statement.pdf",
    structuredOutput: {
        schema,
        schemaPrompt: "Extract bank statement details"
    }
});

// Structured data was in the extract response
console.log(response.structured_output?.values?.account_holder);
console.log(response.structured_output?.values?.closing_balance);

curl

curl -X POST https://api.runpulse.com/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@bank_statement.pdf" \
  -F 'structured_output={"schema": {"type": "object", "properties": {"account_holder": {"type": "string"}, "closing_balance": {"type": "number"}}}, "schema_prompt": "Extract bank statement details"}'

With Async Processing

If you were using structured_output with async extraction, here’s the updated flow:

import time

# Step 1: Async extraction
extract_response = client.extract(
    file_url="https://example.com/large_document.pdf",
    async_=True
)

# Step 2: Poll for extraction completion
job_id = extract_response.job_id
while True:
    status = client.jobs.get_job(job_id=job_id)
    if status.status == "completed":
        extraction_id = status.result.extraction_id
        break
    elif status.status in ["failed", "canceled"]:
        raise Exception(f"Extraction failed: {status.status}")
    time.sleep(2)

# Step 3: Apply schema (can also be async)
schema_response = client.schema(
    extraction_id=extraction_id,
    schema_config={
        "input_schema": {
            "type": "object",
            "properties": {
                "account_holder": {"type": "string"},
                "closing_balance": {"type": "number"}
            }
        }
    }
)

print(schema_response.schema_output)
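
The polling loop above runs until the job finishes, so in practice you may want an upper bound on the wait. The sketch below is one way to add a timeout; `wait_for_job` is a hypothetical helper, the `"processing"` status name is an assumption (only `completed`, `failed`, and `canceled` appear in the example above), and `fake_get_job` stands in for `client.jobs.get_job` so the code runs offline.

```python
import time
from types import SimpleNamespace


def wait_for_job(get_job, job_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll get_job until the job completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id=job_id)
        if job.status == "completed":
            return job.result
        if job.status in ("failed", "canceled"):
            raise RuntimeError(f"Job {job_id} ended with status {job.status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} not finished after {timeout} seconds")
        sleep(interval)


# Offline demo: the fake job reports two in-progress polls, then completes.
_statuses = iter(["processing", "processing", "completed"])


def fake_get_job(job_id):
    status = next(_statuses)
    result = SimpleNamespace(extraction_id="ext_demo") if status == "completed" else None
    return SimpleNamespace(status=status, result=result)


result = wait_for_job(fake_get_job, "job_1", sleep=lambda _seconds: None)
print(result.extraction_id)
```

With the real client, the equivalent call would be `wait_for_job(client.jobs.get_job, extract_response.job_id).extraction_id`.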

Advanced: Split-Mode Schema (New Capability)

With the new flow, you can now split a document into topics and apply different schemas to each section. This was not possible with the old structured_output approach.

Python

# Step 1: Extract
extract_response = client.extract(
    file_url="https://example.com/annual_report.pdf"
)

# Step 2: Split into topics
split_response = client.split(
    extraction_id=extract_response.extraction_id,
    split_config={
        "split_input": [
            {"name": "Financials", "description": "Revenue, expenses, profit data"},
            {"name": "Leadership", "description": "Executive team and board info"},
            {"name": "Outlook", "description": "Future plans and projections"}
        ]
    }
)

# Step 3: Apply different schemas to each topic
schema_response = client.schema(
    split_id=split_response.split_id,
    split_schema_config={
        "Financials": {
            "schema": {
                "type": "object",
                "properties": {
                    "revenue": {"type": "number"},
                    "net_income": {"type": "number"},
                    "yoy_growth": {"type": "string"}
                }
            },
            "schema_prompt": "Extract key financial metrics"
        },
        "Leadership": {
            "schema": {
                "type": "object",
                "properties": {
                    "ceo": {"type": "string"},
                    "board_members": {"type": "array", "items": {"type": "string"}}
                }
            },
            "schema_prompt": "Extract leadership information"
        },
        "Outlook": {
            "schema": {
                "type": "object",
                "properties": {
                    "guidance": {"type": "string"},
                    "key_initiatives": {"type": "array", "items": {"type": "string"}}
                }
            },
            "schema_prompt": "Extract forward-looking statements"
        }
    }
)

# Results are organized by topic
for topic, result in schema_response.results.items():
    print(f"\n--- {topic} ---")
    print(result)
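
In the example above, each key of split_schema_config matches a topic name from split_input. Assuming the API expects that correspondence (the example implies it, but it is not stated outright), a quick local check can catch typos before the request is sent. `check_split_schema_config` is a hypothetical helper.

```python
def check_split_schema_config(split_input, split_schema_config):
    """Compare topic names with schema config keys and report mismatches."""
    topic_names = {topic["name"] for topic in split_input}
    config_names = set(split_schema_config)
    return {
        "missing_schema": sorted(topic_names - config_names),
        "unknown_topic": sorted(config_names - topic_names),
    }


split_input = [
    {"name": "Financials", "description": "Revenue, expenses, profit data"},
    {"name": "Leadership", "description": "Executive team and board info"},
    {"name": "Outlook", "description": "Future plans and projections"},
]
split_schema_config = {
    "Financials": {"schema": {"type": "object", "properties": {"revenue": {"type": "number"}}}},
    "Leadership": {"schema": {"type": "object", "properties": {"ceo": {"type": "string"}}}},
    "Outlok": {"schema": {"type": "object", "properties": {"guidance": {"type": "string"}}}},  # deliberate typo
}
problems = check_split_schema_config(split_input, split_schema_config)
print(problems)
```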

Response Field Mapping

If you’re parsing the response, here’s how the fields map:

Old field (on /extract response)	New field (on /schema response)	Notes
`response.structured_output`	`response.schema_output.values`	Values are now nested under `schema_output.values`
`response.structured_output.citations`	`response.schema_output.citations`	Same structure, new path
`response.input_schema`	N/A	Echo removed; you know what schema you sent
`response.schema_error`	Standard HTTP error response	Errors are returned as 4xx/5xx responses
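
If you parse the raw JSON response, the new paths in the table can be read with a small accessor. This is a sketch under assumptions: `read_schema_result` is a hypothetical helper, and the sample response (including the shape of the citations entry) is made up for illustration.

```python
def read_schema_result(schema_response):
    """Return (values, citations) from a /schema response parsed into a dict."""
    schema_output = schema_response.get("schema_output")
    if schema_output is None:
        raise KeyError("schema_output missing; is this an /extract response?")
    return schema_output.get("values", {}), schema_output.get("citations", {})


# Made-up sample; only the schema_output.values / .citations paths come from the table.
sample_response = {
    "schema_output": {
        "values": {"account_holder": "Jane Doe", "closing_balance": 1024.5},
        "citations": {"account_holder": "placeholder citation"},
    }
}
values, citations = read_schema_result(sample_response)
print(values["account_holder"], values["closing_balance"])
```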

FAQ

Will structured_output on /extract stop working immediately?

No. The parameter will continue to work for backward compatibility but will be removed in a future version. We recommend migrating as soon as possible.

Is the schema format the same?

Yes. The JSON Schema format is identical. The only change is where you send it: instead of structured_output.schema on /extract, you send schema_config.input_schema on /schema. The schema_prompt field stays next to the schema, inside the same config object.

Does the two-step flow cost more?

/schema is billed at 1 credit/page (or 4 credits/page with effort: true) on top of the /extract charge. The document is not re-extracted — /schema runs on the already-extracted content — but the schema step itself is metered. See Credit Usage for the full rate table.
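
As a worked example of the rates above, the schema-step cost can be estimated as pages × rate × number of /schema calls. `schema_credits` is a hypothetical helper; it assumes every /schema call on the same document is billed at the per-page rate, and it leaves out the /extract charge, whose rate is not given here.

```python
SCHEMA_CREDITS_PER_PAGE = 1         # standard rate
SCHEMA_CREDITS_PER_PAGE_EFFORT = 4  # rate with effort: true


def schema_credits(pages, effort=False, runs=1):
    """Estimate credits for the schema step only (extraction is billed separately)."""
    if pages < 0 or runs < 0:
        raise ValueError("pages and runs must be non-negative")
    rate = SCHEMA_CREDITS_PER_PAGE_EFFORT if effort else SCHEMA_CREDITS_PER_PAGE
    return pages * rate * runs


print(schema_credits(20))               # 20
print(schema_credits(20, effort=True))  # 80
print(schema_credits(20, runs=3))       # 60: three schema iterations on one extraction
```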

Can I use async for the schema step too?

Yes! Set async: true on the /schema request to get a job_id and poll for results, just like extraction.

What if I don't need structured data — just markdown?

Then you’re already using the recommended flow. Just call /extract without any schema parameters and use the markdown from the response.

Related

Schema Endpoint

Full reference for the /schema endpoint (single + split mode)

Schema Design Guide

Best practices for writing effective JSON schemas

Pipeline Overview

How extract → split → schema work together

Split Endpoint

Split documents into topics for targeted schema extraction
