Pulse workflows are chains of small, reusable steps. Each step returns an ID that lets the next step reuse work instead of re-uploading or re-processing the same document.

The Golden Path

ID Handoffs

You have      | You can call next         | Why
--------------|---------------------------|----
job_id        | GET /job/{jobId}          | Check async status and retrieve the result.
extraction_id | /schema                   | Extract structured data from the whole document.
extraction_id | /tables                   | Extract table structure from the document.
extraction_id | /split                    | Assign pages to topics.
split_id      | /schema                   | Apply different schemas to topic page groups.
split_id      | /tables                   | Extract tables scoped to split topics.
schema_id     | /schema/{schemaId}/excel  | Download a filled Excel template when schema template mode was used.
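
The handoff table can also be written as a small lookup, which may be useful for checking a planned pipeline before making any calls. This is an illustrative sketch only: `NEXT_STEPS` and `can_follow` are hypothetical names and are not part of the Pulse SDK.

```python
# Hypothetical lookup mirroring the ID handoff table; not part of the Pulse SDK.
NEXT_STEPS = {
    "job_id": ["GET /job/{jobId}"],
    "extraction_id": ["/schema", "/tables", "/split"],
    "split_id": ["/schema", "/tables"],
    "schema_id": ["/schema/{schemaId}/excel"],
}


def can_follow(id_kind: str, endpoint: str) -> bool:
    """Return True if the table lists `endpoint` as a next step for `id_kind`."""
    return endpoint in NEXT_STEPS.get(id_kind, [])


print(can_follow("extraction_id", "/split"))  # True
print(can_follow("job_id", "/schema"))        # False
```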

Extract -> Schema

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

extract_result = client.extract(file=open("invoice.pdf", "rb"))

schema_result = client.schema(
    extraction_id=extract_result.extraction_id,
    schema_config={
        "input_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"}
            }
        }
    }
)

print(schema_result.schema_output["values"])

Extract -> Tables

extract_result = client.extract(file=open("10k.pdf", "rb"))

tables_result = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={
        "merge": True,
        "charts_to_tables": True
    }
)

for table in tables_result.tables_output["tables"]:
    print(table["table_content"])

Extract -> Split -> Schema

extract_result = client.extract(file=open("annual-report.pdf", "rb"))

split_result = client.split(
    extraction_id=extract_result.extraction_id,
    split_config={
        "split_input": [
            {"name": "Financials", "description": "Financial statements and metrics"},
            {"name": "Leadership", "description": "Executives and board members"}
        ]
    }
)

schema_result = client.schema(
    split_id=split_result.split_id,
    split_schema_config={
        "Financials": {
            "schema": {
                "type": "object",
                "properties": {
                    "revenue": {"type": "number"},
                    "net_income": {"type": "number"}
                }
            }
        },
        "Leadership": {
            "schema": {
                "type": "object",
                "properties": {
                    "ceo": {"type": "string"}
                }
            }
        }
    }
)
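
In the example above, the keys of split_schema_config match the topic names given in split_input. Assuming that correspondence is required, a misspelled key is an easy mistake to make, so a local check before calling the API may help. The helper below is a hypothetical sketch (`check_topic_schemas` is not an SDK function) and works on plain dictionaries only.

```python
def check_topic_schemas(split_input, split_schema_config):
    """Compare topic names with schema keys and return (missing, unknown)."""
    topic_names = {topic["name"] for topic in split_input}
    schema_names = set(split_schema_config)
    missing = sorted(topic_names - schema_names)  # topics that have no schema
    unknown = sorted(schema_names - topic_names)  # schema keys with no topic
    return missing, unknown


split_input = [
    {"name": "Financials", "description": "Financial statements and metrics"},
    {"name": "Leadership", "description": "Executives and board members"},
]

# "Leadershp" is a deliberate typo to show what the check reports.
split_schema_config = {
    "Financials": {"schema": {"type": "object", "properties": {}}},
    "Leadershp": {"schema": {"type": "object", "properties": {}}},
}

missing, unknown = check_topic_schemas(split_input, split_schema_config)
print(missing)  # ['Leadership']
print(unknown)  # ['Leadershp']
```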

Async Chaining

When you set async: true (async_=True in the Python SDK), wait for the job to complete before passing its result to the next step.
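import time

job = client.extract(
    file=open("large-file.pdf", "rb"),
    async_=True
)

# Poll until the job reaches a terminal state, pausing between checks.
while True:
    status = client.jobs.get_job(job_id=job.job_id)
    if status.status == "completed":
        extract_result = status.result
        break
    if status.status in ["failed", "canceled"]:
        raise RuntimeError(f"Job {job.job_id} ended with status: {status.status}")
    time.sleep(2)  # avoid hammering the API; tune the interval as needed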

schema_result = client.schema(
    extraction_id=extract_result["extraction_id"],
    schema_config={"input_schema": {"type": "object", "properties": {}}}
)
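
The polling loop above can be wrapped in a reusable helper that pauses between checks and gives up after a timeout. The sketch below is one possible shape, not an SDK feature: `wait_for_job`, its `interval` and `timeout` defaults, and the injectable `sleep` parameter are all assumptions. It relies only on the `status` and `result` attributes and the status strings used in the example above.

```python
import time


def wait_for_job(get_job, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll `get_job` until the job completes, then return its result.

    `get_job` is any callable accepting `job_id=...` and returning an object
    with `.status` and `.result`, for example `client.jobs.get_job`.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_job(job_id=job_id)
        if status.status == "completed":
            return status.result
        if status.status in ("failed", "canceled"):
            raise RuntimeError(f"job {job_id} ended with status {status.status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not complete within {timeout}s")
        sleep(interval)  # pause between checks instead of busy-waiting
```

With a real client, this would be called as `extract_result = wait_for_job(client.jobs.get_job, job.job_id)`. Passing `sleep` as a parameter keeps the helper easy to test without real delays.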

Common Mistakes

A job_id is for polling. After the job completes, read the completed result and pass its extraction_id into /schema, /split, or /tables.
Downstream steps need saved extraction artifacts. Keep storage enabled when you plan to chain.
If the target pages are always known, pass pages or a table page_range. Use Split when topic location changes by document.
Schema works best for named fields. Use Tables when preserving row and column relationships is the goal.
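
The first mistake, passing a job_id where an extraction_id is expected, can be caught locally before the next call. The guard below is a hypothetical sketch (`extraction_id_from` is not an SDK function). It assumes a result is either a dict, as in the async example, or an object with attributes, as in the synchronous examples.

```python
def extraction_id_from(result):
    """Return the extraction_id of a completed result, or raise a clear error."""
    if isinstance(result, dict):
        extraction_id = result.get("extraction_id")
        job_id = result.get("job_id")
    else:
        extraction_id = getattr(result, "extraction_id", None)
        job_id = getattr(result, "job_id", None)

    if extraction_id:
        return extraction_id
    if job_id:
        # Only a job handle is available: the job still has to be polled.
        raise ValueError(
            f"got job_id {job_id!r} but no extraction_id; poll the job and "
            "use the completed result instead"
        )
    raise ValueError("result has neither an extraction_id nor a job_id")


print(extraction_id_from({"extraction_id": "ex_123"}))  # ex_123
```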

Pipeline Overview

See supported API pipeline shapes.

Moving from Platform to Production

Generate chained SDK calls from the Playground.