# Schema Extraction

> Apply schema extraction to a previously saved extraction. The mode is
inferred from the input:

**Single mode** — Provide `extraction_id` + `schema_config` (or
`schema_config_id`) to apply one schema to the entire document.

**Multi-extraction mode** — Provide a batch extract ID as `extraction_id`
(auto-detected) or an explicit `extraction_ids` list. The content from all
extractions is combined and the schema is applied to the composite. Citations
use `extraction_id-bb_id` format to disambiguate across source documents.

**Split mode** — Provide `split_id` + `split_schema_config` to apply
different schemas to different page groups from a prior `/split` call.

**Excel template mode** — Provide `excel_template` (base64 .xlsx) in
`schema_config` instead of `input_schema`. The schema is auto-generated
from the template's column headers, and a filled copy is returned as
`excel_output_url`.

Set `async: true` to receive a `202` response with a `job_id` for polling.


## Overview

<Info>
  **Pipeline Step 2 or 3** — Schema requires a prior [extraction](/api-reference/endpoint/extract). For split mode, it also requires a prior [split](/api-reference/endpoint/split). The mode is inferred from the input fields you provide.
</Info>

Apply a schema to previously extracted documents to get structured data output. This endpoint supports multiple modes, inferred from the input:

* **Single mode** — provide `extraction_id` to apply one schema to a single document
* **Multi-extraction mode** — provide a batch extract ID as `extraction_id` (auto-detected) or an explicit `extraction_ids` list to combine content from multiple documents and apply the schema to the composite
* **Split mode** — provide `split_id` to apply per-topic schemas to page groups from a prior `/split`
* **Excel template mode** — provide `excel_template` (base64 `.xlsx`) in `schema_config` instead of `input_schema` to auto-generate the schema from column headers and receive a filled Excel file
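
The inference rules listed above can be mirrored on the client to catch malformed requests before sending them. The following is a minimal sketch, not the server's implementation: `infer_mode` is a hypothetical helper, and the precedence it gives `split_id` over the extraction fields is an assumption. A batch extract ID passed as `extraction_id` is only recognized server-side, so the sketch reports it as single mode.

```python
from typing import Any, Mapping, Tuple


def infer_mode(body: Mapping[str, Any]) -> Tuple[str, bool]:
    """Return (mode, uses_excel_template) for a /schema request body.

    Hypothetical client-side helper that mirrors the documented rules.
    """
    if "split_id" in body:
        # Assumption: split_id takes precedence when several IDs are present.
        return "split", False
    if "extraction_ids" in body:
        mode = "multi-extraction"
    elif "extraction_id" in body:
        # A batch extract ID is promoted to multi-extraction by the server;
        # that cannot be detected from the request body alone.
        mode = "single"
    else:
        raise ValueError("provide extraction_id, extraction_ids, or split_id")
    schema_config = body.get("schema_config") or {}
    return mode, "excel_template" in schema_config


print(infer_mode({"extraction_id": "abc123-def456-ghi789",
                  "schema_config": {"input_schema": {"type": "object"}}}))
```

Returning the Excel flag separately reflects that Excel template mode can be combined with either single or multi-extraction input.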

<Note>
  This endpoint operates on **saved extractions** (created via `/extract` with storage enabled, which is the default).
</Note>

<Note>
  To apply schemas across many extractions or splits at once, use [Batch Schema](/api-reference/endpoint/batch-overview#batch-schema). It supports both single and split modes.
</Note>

### Async Mode

Set `async: true` to return immediately with a job ID for polling. See [Polling for Results](/api-reference/endpoint/poll) for details.

| Field   | Type    | Required | Description                                                                                                   |
| ------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------- |
| `async` | boolean | No       | If `true`, returns immediately with a `job_id` for [polling](/api-reference/endpoint/poll). Default: `false`. |

**Async Response (202)**:

| Field     | Type   | Description                                        |
| --------- | ------ | -------------------------------------------------- |
| `job_id`  | string | Job ID for [polling](/api-reference/endpoint/poll) |
| `status`  | string | `"pending"`                                        |
| `message` | string | Human-readable description                         |
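
A polling loop around the returned `job_id` might look like the sketch below. It is deliberately transport-agnostic: `fetch_status` is a caller-supplied function that wraps the polling endpoint (documented separately), and the terminal status names `completed` and `failed` are assumptions, since this page only shows `pending`.

```python
import time
from typing import Any, Callable, Mapping


def poll_until_done(
    job_id: str,
    fetch_status: Callable[[str], Mapping[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    terminal_statuses: frozenset = frozenset({"completed", "failed"}),  # assumed names
) -> Mapping[str, Any]:
    """Call fetch_status(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") in terminal_statuses:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout_s}s"
            )
        time.sleep(interval_s)
```

Passing the status fetcher as an argument keeps the loop testable without network access and avoids hard-coding a polling URL here.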

***

## Mode Reference

<Tabs>
  <Tab title="Single Mode">
    ### Request

    Apply one schema to an entire extraction.

    | Field              | Type    | Required    | Description                                                                                                                                                       |
    | ------------------ | ------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `extraction_id`    | uuid    | Conditional | ID of a saved extraction, or a batch extract job ID (auto-detected; see [Multi-Extraction Mode](#multi-extraction)). Required unless `extraction_ids` is provided |
    | `extraction_ids`   | uuid\[] | Conditional | Explicit list of extraction IDs to combine (see [Multi-Extraction Mode](#multi-extraction)). Use in place of `extraction_id`                                      |
    | `schema_config`    | object  | XOR         | Inline schema (see [Schema Config](#schema-config)). Provide exactly one of `schema_config` or `schema_config_id`                                                 |
    | `schema_config_id` | uuid    | XOR         | Reference to a saved schema configuration                                                                                                                         |
    | `async`            | boolean | No          | Default: `false`                                                                                                                                                  |

    #### Schema Config

    Provide **either** `input_schema` or `excel_template` — not both.

    | Field            | Type            | Required | Description                                                                                                                                                                |
    | ---------------- | --------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `input_schema`   | object          | XOR      | JSON Schema defining the structured data to extract                                                                                                                        |
    | `excel_template` | string (base64) | XOR      | Base64-encoded `.xlsx` template — column headers are used to auto-generate the JSON Schema and a filled copy is returned (see [Excel Template Mode](#excel-template-mode)) |
    | `schema_prompt`  | string          | No       | Natural language instructions to guide extraction                                                                                                                          |
    | `effort`         | boolean         | No       | Enable extended reasoning for complex documents                                                                                                                            |

    ### Response (200)

    | Field              | Type    | Description                                                                                      |
    | ------------------ | ------- | ------------------------------------------------------------------------------------------------ |
    | `schema_id`        | uuid    | Unique identifier for this schema version                                                        |
    | `version`          | integer | Schema version number                                                                            |
    | `schema_output`    | object  | `{ values: {...}, citations: {...} }`                                                            |
    | `extraction_ids`   | uuid\[] | Present when multiple extractions were combined — lists all source extraction IDs                |
    | `excel_output_url` | string  | API path to download the filled Excel template (only present when `excel_template` was provided) |

    ### Example — Inline Schema

    <CodeGroup>
      ```python Python theme={null}
      from pulse import Pulse

      client = Pulse(api_key="YOUR_API_KEY")

      schema_result = client.schema(
          extraction_id="abc123-def456-ghi789",
          schema_config={
              "input_schema": {
                  "type": "object",
                  "properties": {
                      "invoice_number": {"type": "string"},
                      "total_amount": {"type": "number"},
                      "vendor_name": {"type": "string"},
                      "line_items": {
                          "type": "array",
                          "items": {
                              "type": "object",
                              "properties": {
                                  "description": {"type": "string"},
                                  "amount": {"type": "number"}
                              }
                          }
                      }
                  },
                  "required": ["invoice_number", "total_amount"]
              },
              "schema_prompt": "Extract all invoice details including line items"
          }
      )

      print(schema_result.schema_output)
      ```

      ```typescript TypeScript theme={null}
      import { PulseClient } from "pulse-ts-sdk";

      const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

      const schemaResult = await client.schema({
          extraction_id: "abc123-def456-ghi789",
          schema_config: {
              input_schema: {
                  type: "object",
                  properties: {
                      invoice_number: { type: "string" },
                      total_amount: { type: "number" },
                      vendor_name: { type: "string" },
                  },
                  required: ["invoice_number", "total_amount"],
              },
              schema_prompt: "Extract all invoice details",
          },
      });

      console.log(schemaResult.schema_output);
      ```

      ```bash curl theme={null}
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "extraction_id": "abc123-def456-ghi789",
          "schema_config": {
            "input_schema": {
              "type": "object",
              "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"}
              },
              "required": ["invoice_number", "total_amount"]
            },
            "schema_prompt": "Extract invoice details"
          }
        }'
      ```
    </CodeGroup>

    ### Example — Saved Config Reference

    <CodeGroup>
      ```python Python theme={null}
      schema_result = client.schema(
          extraction_id="abc123-def456-ghi789",
          schema_config_id="config-uuid-123"
      )
      ```

      ```bash curl theme={null}
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"extraction_id": "abc123-def456-ghi789", "schema_config_id": "config-uuid-123"}'
      ```
    </CodeGroup>

    ### Example Response

    ```json theme={null}
    {
      "schema_id": "schema-uuid-456",
      "version": 2,
      "schema_output": {
        "values": {
          "invoice_number": "INV-2024-001",
          "total_amount": 1250.00,
          "vendor_name": "Acme Corp",
          "line_items": [
            {"description": "Consulting Services", "amount": 1000.00},
            {"description": "Travel Expenses", "amount": 250.00}
          ]
        },
        "citations": {
          "invoice_number": {"page": 1, "bbox": [100, 50, 200, 70]},
          "total_amount": {"page": 1, "bbox": [400, 500, 500, 520]}
        }
      }
    }
    ```

    <h3 id="multi-extraction">
      Multi-Extraction Mode
    </h3>

    Combine content from multiple documents and apply a single schema to the composite, producing **one merged result**. This is useful when the data you need spans several files (e.g., a loss summary in one file and exposure data in another).

    <Note>
      This is different from [Batch Schema](/api-reference/endpoint/batch-overview#batch-schema), which applies the same schema to each document **independently** (one result per document). Use multi-extraction when you need to cross-reference or merge data from multiple source files into a single output.
    </Note>

    There are two ways to trigger multi-extraction:

    1. **Batch extract auto-detection** — Pass a batch extract `batch_job_id` as `extraction_id`. The system detects it as a batch parent and automatically combines all completed child extractions.
    2. **Explicit list** — Pass an `extraction_ids` array with the specific extraction IDs to combine.

    Citations in multi-extraction results use the `extraction_id-bb_id` format (e.g., `abc123-txt-1`) to disambiguate bounding boxes across source documents.

    <CodeGroup>
      ```python Python theme={null}
      from pulse import Pulse
      from pulse.types.schema_config import SchemaConfig

      client = Pulse(api_key="YOUR_API_KEY")

      # Option 1: Pass a batch extract job ID (auto-detected)
      schema_result = client.schema(
          extraction_id="<batch_job_id>",
          schema_config=SchemaConfig(
              input_schema={
                  "type": "object",
                  "properties": {
                      "policy_period": {"type": "string"},
                      "exposure": {"type": "number"},
                  },
              },
              schema_prompt="Combine data from both documents",
          ),
      )

      # Option 2: Pass an explicit list of extraction IDs
      schema_result = client.schema(
          extraction_ids=["extraction-1-uuid", "extraction-2-uuid"],
          schema_config=SchemaConfig(
              input_schema={
                  "type": "object",
                  "properties": {
                      "policy_period": {"type": "string"},
                      "exposure": {"type": "number"},
                  },
              },
          ),
      )

      # Response includes the list of source extraction IDs
      print(schema_result.extraction_ids)
      ```

      ```bash curl theme={null}
      # Option 1: Batch extract ID (auto-detected)
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "extraction_id": "<batch_job_id>",
          "schema_config": {
            "input_schema": {
              "type": "object",
              "properties": {
                "policy_period": {"type": "string"},
                "exposure": {"type": "number"}
              }
            },
            "schema_prompt": "Combine data from both documents"
          }
        }'

      # Option 2: Explicit extraction IDs
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "extraction_ids": ["extraction-1-uuid", "extraction-2-uuid"],
          "schema_config": {
            "input_schema": {
              "type": "object",
              "properties": {
                "policy_period": {"type": "string"},
                "exposure": {"type": "number"}
              }
            }
          }
        }'
      ```
    </CodeGroup>
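
    Because extraction IDs are UUIDs and therefore contain hyphens themselves, splitting a multi-extraction citation ID on the first hyphen is unreliable. A minimal sketch, assuming the `extraction_ids` list from the response is at hand: match each known extraction ID as a prefix. `split_citation_id` is a hypothetical helper, not part of the SDK.

    ```python
    from typing import Sequence, Tuple


    def split_citation_id(
        citation_id: str, extraction_ids: Sequence[str]
    ) -> Tuple[str, str]:
        """Split 'extraction_id-bb_id' into (extraction_id, bb_id)."""
        # Try longer IDs first so an ID that is a prefix of another cannot shadow it.
        for extraction_id in sorted(extraction_ids, key=len, reverse=True):
            prefix = extraction_id + "-"
            if citation_id.startswith(prefix):
                return extraction_id, citation_id[len(prefix):]
        raise ValueError(f"no known extraction ID prefixes {citation_id!r}")


    print(split_citation_id("abc123-txt-1", ["abc123", "def456"]))
    ```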

    <h3 id="excel-template-mode">
      Excel Template Mode
    </h3>

    Instead of writing a JSON Schema by hand, provide an Excel template (`.xlsx`) with the column headers you want filled. The system auto-generates the JSON Schema from the template's structure, applies it to the extraction, and returns a filled copy of the original template.

    <CodeGroup>
      ```python Python theme={null}
      import base64
      from pulse import Pulse
      from pulse.types.schema_config import SchemaConfig

      client = Pulse(api_key="YOUR_API_KEY")

      with open("template.xlsx", "rb") as f:
          template_b64 = base64.b64encode(f.read()).decode()

      schema_result = client.schema(
          extraction_id="abc123-def456-ghi789",
          schema_config=SchemaConfig(
              excel_template=template_b64,
              schema_prompt="Extract policy period data into the template columns",
          ),
      )

      # Download the filled Excel file
      excel_bytes = client.download_schema_excel(schema_result.schema_id)
      with open("filled_output.xlsx", "wb") as f:
          for chunk in excel_bytes:
              f.write(chunk)
      ```

      ```bash curl theme={null}
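      # Base64-encode the template as a single line
      # (GNU base64 wraps output at 76 columns by default, which would break the JSON body)
      TEMPLATE_B64=$(base64 < template.xlsx | tr -d '\n')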

      # Apply schema with Excel template
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{
          \"extraction_id\": \"abc123-def456-ghi789\",
          \"schema_config\": {
            \"excel_template\": \"$TEMPLATE_B64\",
            \"schema_prompt\": \"Extract policy period data into the template columns\"
          }
        }"

      # Download the filled Excel (requires API key auth)
      curl -o filled_output.xlsx \
        -H "x-api-key: YOUR_API_KEY" \
        https://api.runpulse.com/schema/<schema_id>/excel
      ```
    </CodeGroup>

    The response includes `excel_output_url` (e.g., `/schema/{schema_id}/excel`) — an authenticated API path for downloading the filled template. Use `client.download_schema_excel(schema_id)` in the SDK or make an authenticated `GET` request.

    <Note>
      Excel template mode and multi-extraction mode can be combined — pass a batch extract ID or `extraction_ids` along with `excel_template` to fill a template from multiple source documents.
    </Note>
  </Tab>

  <Tab title="Split Mode">
    ### Request

    Apply different schemas to different page groups from a prior `/split` call.

    | Field                 | Type    | Required | Description                                                    |
    | --------------------- | ------- | -------- | -------------------------------------------------------------- |
    | `split_id`            | uuid    | Yes      | ID from a prior [`/split`](/api-reference/endpoint/split) call |
    | `split_schema_config` | object  | Yes      | Per-topic schema configs (keys = topic names from split)       |
    | `async`               | boolean | No       | Default: `false`                                               |

    Each topic in `split_schema_config`:

    | Field              | Type    | Required | Description                        |
    | ------------------ | ------- | -------- | ---------------------------------- |
    | `schema`           | object  | XOR      | JSON Schema for this topic         |
    | `schema_prompt`    | string  | No       | Additional extraction instructions |
    | `effort`           | boolean | No       | Enable extended reasoning          |
    | `schema_config_id` | uuid    | XOR      | Reference to a saved schema config |

    ### Response (200)

    | Field           | Type   | Description                                                           |
    | --------------- | ------ | --------------------------------------------------------------------- |
    | `schema_id`     | uuid   | Unique identifier for this schema version                             |
    | `split_id`      | uuid   | ID of the split that defined the page groups                          |
    | `results`       | object | Per-topic results: `{ "topic": { values: {...}, citations: {...} } }` |
    | `input_schemas` | object | Echo of the schemas applied, keyed by topic                           |
    | `errors`        | object | Optional: per-topic errors if any topics failed                       |

    ### Example — Per-Topic Schemas

    <CodeGroup>
      ```python Python theme={null}
      from pulse import Pulse

      client = Pulse(api_key="YOUR_API_KEY")

      schema_result = client.schema(
          split_id="split-uuid-123",
          split_schema_config={
              "financial_statements": {
                  "schema": {
                      "type": "object",
                      "properties": {
                          "revenue": {"type": "number"},
                          "expenses": {"type": "number"},
                          "net_income": {"type": "number"}
                      }
                  },
                  "schema_prompt": "Extract financial data from statements"
              },
              "signatures": {
                  "schema": {
                      "type": "object",
                      "properties": {
                          "signee_name": {"type": "string"},
                          "date_signed": {"type": "string"}
                      }
                  }
              }
          }
      )

      for topic, result in schema_result.results.items():
          print(f"{topic}: {result}")
      ```

      ```typescript TypeScript theme={null}
      import { PulseClient } from "pulse-ts-sdk";

      const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

      const schemaResult = await client.schema({
          split_id: "split-uuid-123",
          split_schema_config: {
              financial_statements: {
                  schema: {
                      type: "object",
                      properties: {
                          revenue: { type: "number" },
                          expenses: { type: "number" },
                      },
                  },
                  schema_prompt: "Extract financial data",
              },
              signatures: {
                  schema: {
                      type: "object",
                      properties: {
                          signee_name: { type: "string" },
                          date_signed: { type: "string" },
                      },
                  },
              },
          },
      });

      for (const [topic, result] of Object.entries(schemaResult.results)) {
          console.log(`${topic}:`, result);
      }
      ```

      ```bash curl theme={null}
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "split_id": "split-uuid-123",
          "split_schema_config": {
            "financial_statements": {
              "schema": {"type": "object", "properties": {"revenue": {"type": "number"}}},
              "schema_prompt": "Extract financial data"
            },
            "signatures": {
              "schema": {"type": "object", "properties": {"signee_name": {"type": "string"}}}
            }
          }
        }'
      ```
    </CodeGroup>

    ### Example Response

    ```json theme={null}
    {
      "schema_id": "schema-uuid-789",
      "split_id": "split-uuid-123",
      "results": {
        "financial_statements": {
          "values": { "revenue": 5000000, "expenses": 3200000 },
          "citations": { "revenue": {"page": 15, "bbox": [100, 200, 300, 220]} }
        },
        "signatures": {
          "values": { "signee_name": "Jane Doe", "date_signed": "2024-01-15" },
          "citations": {}
        }
      },
      "input_schemas": {
        "financial_statements": { "type": "object", "properties": { "revenue": { "type": "number" } } },
        "signatures": { "type": "object", "properties": { "signee_name": { "type": "string" } } }
      }
    }
    ```
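
    A split-mode response can carry both `results` and `errors`, so it helps to separate the topics that succeeded from those that failed before using the values. The sketch below works on the parsed JSON response and assumes `errors` is keyed by topic name, like `results`; `partition_split_results` is a hypothetical helper.

    ```python
    from typing import Any, Dict, Mapping, Tuple


    def partition_split_results(
        response: Mapping[str, Any],
    ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        """Return (values_by_topic, errors_by_topic) from a split-mode response."""
        results = response.get("results") or {}
        errors = dict(response.get("errors") or {})  # `errors` is optional
        values = {
            topic: result.get("values", {})
            for topic, result in results.items()
            if topic not in errors  # assumption: errors are keyed by topic name
        }
        return values, errors
    ```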
  </Tab>
</Tabs>

***

## Download Filled Excel — `GET /schema/{schemaId}/excel`

When a schema extraction was created with `excel_template`, download the filled output from this authenticated endpoint. It requires the same API key as the other endpoints, and the caller must belong to the organization that owns the underlying extraction.

| Status | Description                                                                                   |
| ------ | --------------------------------------------------------------------------------------------- |
| 200    | Returns the filled `.xlsx` file as a binary download                                          |
| 401    | Authentication failed or missing API key                                                      |
| 404    | Schema not found, or no Excel output (was `excel_template` provided in the original request?) |
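
For callers not using the SDK, the download can be sketched with the Python standard library. The URL and `x-api-key` header follow the curl example on this page; the function names are illustrative only.

```python
import shutil
import urllib.request

API_BASE = "https://api.runpulse.com"


def build_excel_download_request(schema_id: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated GET request for the filled template."""
    return urllib.request.Request(
        f"{API_BASE}/schema/{schema_id}/excel",
        headers={"x-api-key": api_key},
        method="GET",
    )


def download_filled_excel(schema_id: str, api_key: str, dest_path: str) -> None:
    """Stream the filled .xlsx to dest_path."""
    request = build_excel_download_request(schema_id, api_key)
    # urlopen raises urllib.error.HTTPError for the 401 and 404 cases above.
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Streaming with `shutil.copyfileobj` avoids loading the whole workbook into memory.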

***

## Error Responses

| Status | Error              | Description                                                   |
| ------ | ------------------ | ------------------------------------------------------------- |
| 400    | Invalid request    | Must provide `extraction_id`, `extraction_ids`, or `split_id` |
| 400    | Invalid schema     | Schema must follow JSON Schema / OpenAPI 3.0 format           |
| 400    | Mutually exclusive | Cannot provide both `input_schema` and `excel_template`       |
| 401    | Unauthorized       | Invalid or missing API key                                    |
| 404    | Not found          | Extraction, batch job, or split not found                     |
| 500    | Processing error   | Schema extraction failed                                      |
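
One way to act on these statuses in client code is a small lookup that says whether a retry is worthwhile. This is a sketch under assumptions: the retry decisions are not specified by the API, `advise` is a hypothetical helper, and `429` comes from the OpenAPI spec for this endpoint rather than the table above.

```python
from typing import NamedTuple


class ErrorAdvice(NamedTuple):
    retry: bool
    hint: str


# Assumed policy: only rate limiting and server-side failures are retried.
_ADVICE = {
    400: ErrorAdvice(False, "fix the request body (IDs, schema, or mutually exclusive fields)"),
    401: ErrorAdvice(False, "check the x-api-key header"),
    404: ErrorAdvice(False, "verify the extraction, batch job, or split ID"),
    429: ErrorAdvice(True, "back off, then retry"),
    500: ErrorAdvice(True, "retry, or resubmit with async: true"),
}


def advise(status: int) -> ErrorAdvice:
    """Map an HTTP status from /schema to a suggested client action."""
    return _ADVICE.get(status, ErrorAdvice(False, "unexpected status"))
```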

***

## Best Practices

<AccordionGroup>
  <Accordion title="Use effort mode for complex documents">
    Set `effort: true` for documents with complex layouts, tables, or when initial extraction quality is low.
  </Accordion>

  <Accordion title="Provide schema_prompt for context">
    Add natural language instructions to guide the extraction, especially for ambiguous fields.
  </Accordion>

  <Accordion title="Use async for large schemas">
    If your schema has many fields or the document is large, set `async: true` to avoid timeouts. See [Polling for Results](/api-reference/endpoint/poll).
  </Accordion>

  <Accordion title="For multi-section documents, use split mode">
    First call [`/split`](/api-reference/endpoint/split) to get page groups, then use this endpoint with `split_id` + `split_schema_config`.
  </Accordion>

  <Accordion title="Combine data from multiple files with multi-extraction">
    When the data you need spans multiple documents, use [Batch Extract](/api-reference/endpoint/batch-overview#batch-extract) to extract all files, then pass the `batch_job_id` as `extraction_id` to this endpoint. The system auto-detects the batch parent and combines content from all child extractions.
  </Accordion>

  <Accordion title="Use Excel templates for spreadsheet-native workflows">
    If your output is an Excel spreadsheet, skip the JSON Schema definition and provide the empty `.xlsx` template directly via `excel_template`. The column headers define the schema, and you get a filled copy back via `excel_output_url`.
  </Accordion>
</AccordionGroup>

***

## Related Endpoints

<CardGroup cols={2}>
  <Card title="Extract" icon="file-lines" href="/api-reference/endpoint/extract">
    Extract content from a document
  </Card>

  <Card title="Split Document" icon="scissors" href="/api-reference/endpoint/split">
    Split a document into topic-based page groups
  </Card>

  <Card title="Batch Processing" icon="layer-group" href="/api-reference/endpoint/batch-overview">
    Apply schema across many documents in parallel
  </Card>
</CardGroup>


## OpenAPI

````yaml POST /schema
openapi: 3.1.0
info:
  title: Pulse API
  description: >-
    Production-grade document extraction service that transforms complex
    documents into structured, AI-ready data. This specification is the single
    source of truth for the Pulse extraction APIs.
  version: 1.0.0
  contact:
    name: Pulse Support
    email: support@trypulse.ai
    url: https://docs.runpulse.com
servers:
  - url: https://api.runpulse.com
    description: Production server
security:
  - ApiKeyAuth: []
paths:
  /schema:
    post:
      tags:
        - Schema
      summary: Extract Structured Data
      description: >
        Apply schema extraction to a previously saved extraction. The mode is

        inferred from the input:


        **Single mode** — Provide `extraction_id` + `schema_config` (or

        `schema_config_id`) to apply one schema to the entire document.


        **Multi-extraction mode** — Provide a batch extract ID as
        `extraction_id`

        (auto-detected) or an explicit `extraction_ids` list. The content from
        all

        extractions is combined and the schema is applied to the composite.
        Citations

        use `extraction_id-bb_id` format to disambiguate across source
        documents.


        **Split mode** — Provide `split_id` + `split_schema_config` to apply

        different schemas to different page groups from a prior `/split` call.


        **Excel template mode** — Provide `excel_template` (base64 .xlsx) in

        `schema_config` instead of `input_schema`. The schema is auto-generated

        from the template's column headers, and a filled copy is returned as

        `excel_output_url`.


        Set `async: true` to return 202 with a job_id for polling.
      operationId: extractSchema
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SchemaInput'
      responses:
        '200':
          description: |
            Schema extraction result (when async=false or omitted).
            Shape depends on the mode used (single vs split).
          content:
            application/json:
              schema:
                oneOf:
                  - $ref: '#/components/schemas/SingleSchemaResponse'
                  - $ref: '#/components/schemas/SplitSchemaResponse'
        '202':
          description: Schema job accepted (when async=true)
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncSubmissionResponse'
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '404':
          $ref: '#/components/responses/NotFound'
        '429':
          $ref: '#/components/responses/TooManyRequests'
        '500':
          $ref: '#/components/responses/InternalServerError'
      x-codeSamples:
        - lang: python
          label: Python SDK
          source: |
            from pulse import Pulse

            client = Pulse(api_key="YOUR_API_KEY")

            # Single mode — apply schema to entire document
            response = client.schema(
                extraction_id="your-extraction-id",
                schema_config={
                    "input_schema": {
                        "type": "object",
                        "properties": {
                            "total": {"type": "number"},
                            "vendor": {"type": "string"}
                        }
                    },
                    "schema_prompt": "Extract invoice total and vendor"
                }
            )
            print(response.schema_output)

            # Split mode — different schemas per topic
            response = client.schema(
                split_id="your-split-id",
                split_schema_config={
                    "Introduction": {
                        "schema": {"type": "object", "properties": {"summary": {"type": "string"}}},
                        "schema_prompt": "Summarize the introduction"
                    },
                    "Financials": {
                        "schema": {"type": "object", "properties": {"revenue": {"type": "number"}}},
                        "schema_prompt": "Extract financial figures"
                    }
                }
            )
            print(response.results)
        - lang: typescript
          label: TypeScript SDK
          source: |
            import { PulseClient } from "pulse-ts-sdk";

            const client = new PulseClient({
                apiKey: "YOUR_API_KEY"
            });

            // Single mode — apply schema to entire document
            const response = await client.schema({
                extraction_id: "your-extraction-id",
                schema_config: {
                    input_schema: {
                        type: "object",
                        properties: {
                            total: { type: "number" },
                            vendor: { type: "string" }
                        }
                    },
                    schema_prompt: "Extract invoice total and vendor"
                }
            });
            console.log(response.schema_output);

            // Split mode — different schemas per topic
            const splitResp = await client.schema({
                split_id: "your-split-id",
                split_schema_config: {
                    Introduction: {
                        schema: { type: "object", properties: { summary: { type: "string" } } },
                        schema_prompt: "Summarize the introduction"
                    },
                    Financials: {
                        schema: { type: "object", properties: { revenue: { type: "number" } } },
                        schema_prompt: "Extract financial figures"
                    }
                }
            });
            // Split responses carry per-topic output under `results`
            console.log(splitResp.results);
        - lang: bash
          label: curl
          source: |
            # Single mode
            curl -X POST https://api.runpulse.com/schema \
              -H "x-api-key: YOUR_API_KEY" \
              -H "Content-Type: application/json" \
              -d '{
                "extraction_id": "your-extraction-id",
                "schema_config": {
                  "input_schema": {
                    "type": "object",
                    "properties": {
                      "total": {"type": "number"},
                      "vendor": {"type": "string"}
                    }
                  },
                  "schema_prompt": "Extract invoice total and vendor"
                }
              }'

            # Split mode
            curl -X POST https://api.runpulse.com/schema \
              -H "x-api-key: YOUR_API_KEY" \
              -H "Content-Type: application/json" \
              -d '{
                "split_id": "your-split-id",
                "split_schema_config": {
                  "Introduction": {
                    "schema": {"type": "object", "properties": {"summary": {"type": "string"}}},
                    "schema_prompt": "Summarize the introduction"
                  }
                }
              }'
components:
  schemas:
    SchemaInput:
      type: object
      description: |
        Request body for schema extraction. The mode is inferred from the input:

        - Provide `extraction_id` for single mode or multi-extraction mode
          (auto-detected). If the ID belongs to a batch extract, its child
          extractions are combined automatically.
        - Provide `extraction_ids` for an explicit list of extractions to
          combine.
        - Provide `split_id` + `split_schema_config` for split mode.
      properties:
        extraction_id:
          type: string
          format: uuid
          description: >-
            ID of a saved extraction OR a batch extract job. When a batch
            extract ID is provided, the system auto-detects it and combines all
            completed child extractions into a single schema application.
        extraction_ids:
          type: array
          items:
            type: string
            format: uuid
          description: >-
            Explicit list of extraction IDs to combine. The markdown and
            bounding boxes from all extractions are merged and the schema is
            applied to the composite content. Citations use
            `extraction_id-bb_id` format to disambiguate across source
            documents.
        split_id:
          type: string
          format: uuid
          description: ID of a saved split (for split mode).
        schema_config:
          type: object
          description: >-
            Inline schema configuration for single mode. Provide `input_schema`
            (JSON Schema) OR `excel_template` (base64 .xlsx) — not both.
          properties:
            input_schema:
              type: object
              description: >-
                JSON Schema defining the structured data to extract. Required
                unless `excel_template` is provided.
            excel_template:
              type: string
              format: byte
              description: >-
                Base64-encoded Excel template (.xlsx). When provided, the
                template's column headers are used to auto-generate the JSON
                Schema and a filled copy of the template is returned in the
                response as `excel_output_url`. Mutually exclusive with
                `input_schema`.
            schema_prompt:
              type: string
              description: Natural language prompt with extraction instructions.
            effort:
              type: boolean
              default: false
              description: Enable extended reasoning.
        schema_config_id:
          type: string
          format: uuid
          description: Reference to a saved schema configuration (for single mode).
        split_schema_config:
          type: object
          description: >-
            Per-topic schema configurations for split mode. Keys must match
            topic names from the split.
          additionalProperties:
            type: object
            properties:
              schema:
                type: object
                description: JSON Schema for this topic.
              schema_prompt:
                type: string
                description: Additional instructions for this topic.
              effort:
                type: boolean
                default: false
                description: Enable extended reasoning for this topic.
              schema_config_id:
                type: string
                format: uuid
                description: Reference to a saved schema config for this topic.
        async:
          type: boolean
          default: false
          description: If true, returns 202 with a job_id for polling via GET /job/{jobId}.
    SingleSchemaResponse:
      type: object
      description: Response for single schema extraction mode.
      required:
        - schema_id
        - version
        - schema_output
      properties:
        schema_id:
          type: string
          format: uuid
          description: Unique identifier for this schema version.
        version:
          type: integer
          minimum: 1
          description: Version number of this schema for the extraction.
        schema_output:
          type: object
          description: Extracted values and citations.
          properties:
            values:
              type: object
              additionalProperties: true
            citations:
              type: object
              additionalProperties: true
        extraction_ids:
          type: array
          items:
            type: string
            format: uuid
          description: >-
            Present when multiple extractions were combined (via batch extract
            auto-detection or explicit `extraction_ids` input). Lists all source
            extraction IDs that contributed to the result.
        excel_output_url:
          type: string
          description: >-
            API path to download the filled Excel template (e.g.
            `/schema/{schema_id}/excel`). Requires the same API key
            authentication. Only present when `excel_template` was provided in
            the request.
    SplitSchemaResponse:
      type: object
      description: Response for split schema extraction mode.
      required:
        - schema_id
        - split_id
        - results
      properties:
        schema_id:
          type: string
          format: uuid
          description: Unique identifier for this schema version.
        split_id:
          type: string
          format: uuid
          description: ID of the split that defined the page groups.
        results:
          type: object
          description: Per-topic extraction results. Keys match topic names.
          additionalProperties:
            type: object
            properties:
              values:
                type: object
                additionalProperties: true
              citations:
                type: object
                additionalProperties: true
        input_schemas:
          type: object
          description: Echo of the schemas applied, keyed by topic.
          additionalProperties:
            type: object
        errors:
          type: object
          description: Per-topic errors if any topics failed.
          additionalProperties:
            type: string
    AsyncSubmissionResponse:
      type: object
      description: >-
        Acknowledgement returned when a request is submitted for asynchronous
        processing. Poll GET /job/{job_id} to check status and retrieve results.
      required:
        - job_id
        - status
      properties:
        job_id:
          type: string
          description: Identifier assigned to the asynchronous job.
        status:
          type: string
          description: Initial status reported by the server.
          enum:
            - pending
            - processing
        message:
          type: string
          description: Human-readable description of the accepted job.
    ErrorResponse:
      type: object
      properties:
        error:
          type: object
          properties:
            code:
              type: string
              description: Error code (e.g., FILE_001, AUTH_002)
            message:
              type: string
              description: Human-readable error message
            details:
              type: object
              description: Additional error context
  responses:
    BadRequest:
      description: Bad request - Invalid parameters
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    Unauthorized:
      description: Unauthorized - Invalid or missing API key
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    NotFound:
      description: Resource not found
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    TooManyRequests:
      description: Rate limit exceeded
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    InternalServerError:
      description: Internal server error
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key
      description: API key for authentication

````
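
The mode-inference rules in `SchemaInput` can be mirrored client-side to catch malformed request bodies before they are sent. The following is a minimal sketch, not part of the Pulse SDK: `infer_mode` and `excel_schema_config` are hypothetical helper names. It assumes that multi-extraction mode accepts the same `schema_config` / `schema_config_id` fields as single mode, and, because the spec above does not say which selector wins when several are present, it rejects mixed input rather than guessing.

```python
import base64
from typing import Any, Dict


def infer_mode(body: Dict[str, Any]) -> str:
    """Return "single", "multi", or "split" for a /schema request body.

    Hypothetical client-side check; the server remains the source of truth.
    """
    selectors = [k for k in ("extraction_id", "extraction_ids", "split_id") if k in body]
    if len(selectors) != 1:
        # Assumption: precedence between selectors is undocumented, so refuse mixed input.
        raise ValueError(f"expected exactly one selector, got {selectors}")
    selector = selectors[0]

    if selector == "split_id":
        if "split_schema_config" not in body:
            raise ValueError("split mode requires split_schema_config")
        return "split"

    if "schema_config" not in body and "schema_config_id" not in body:
        raise ValueError("provide schema_config or schema_config_id")

    config = body.get("schema_config", {})
    if "input_schema" in config and "excel_template" in config:
        raise ValueError("input_schema and excel_template are mutually exclusive")

    # A batch extract ID passed as `extraction_id` is reported as "single" here;
    # only the server can auto-detect that it is a batch.
    return "multi" if selector == "extraction_ids" else "single"


def excel_schema_config(xlsx_bytes: bytes, prompt: str = "") -> Dict[str, Any]:
    """Build a schema_config that carries a base64-encoded .xlsx template."""
    config: Dict[str, Any] = {
        "excel_template": base64.b64encode(xlsx_bytes).decode("ascii")
    }
    if prompt:
        config["schema_prompt"] = prompt
    return config
```

For example, `infer_mode({"split_id": "your-split-id", "split_schema_config": {}})` returns `"split"`, while a body with both `input_schema` and `excel_template` inside `schema_config` raises `ValueError` before any network call is made.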