# Schema Extraction

> Apply schema extraction to a previously saved extraction. The mode is
inferred from the input:

**Single mode** — Provide `extraction_id` + `schema_config` (or
`schema_config_id`) to apply one schema to the entire document.

**Multi-extraction mode** — Provide a batch extract ID as `extraction_id`
(auto-detected) or an explicit `extraction_ids` list. The content from all
extractions is combined and the schema is applied to the composite. Citations
use `extraction_id-bb_id` format to disambiguate across source documents.

**Split mode** — Provide `split_id` + `split_schema_config` to apply
different schemas to different page groups from a prior `/split` call.
Each topic can have its own schema, prompt, and effort setting.

**Excel template mode** — Provide `excel_template` (base64 `.xlsx`) in
`schema_config` instead of `input_schema`. The schema is auto-generated
from the template's column headers, and a download path for the filled
copy is returned as `excel_output_url`.

Creates a versioned schema record that can be retrieved later.
Set `async: true` to return immediately with a `job_id` for polling.

To apply schemas across many extractions or splits at once, see
[Batch Schema](api:POST/batch/schema) or the
[Batch Processing guide](/batch).

## Overview

<Info>
  **Pipeline Step 2 or 3** — Schema requires a prior [extraction](/api-reference/endpoint/extract). For split mode, it also requires a prior [split](/api-reference/endpoint/split). The mode is inferred from the input fields you provide.
</Info>

Apply a schema to previously extracted documents to get structured data output. This endpoint supports multiple modes, inferred from the input:

* **Single mode** — provide `extraction_id` with `schema_config` (or `schema_config_id`) to apply one schema to a single document
* **Multi-extraction mode** — provide a batch extract ID as `extraction_id` (auto-detected) or an explicit `extraction_ids` list to combine content from multiple documents and apply the schema to the composite
* **Split mode** — provide `split_id` to apply per-topic schemas to page groups from a prior `/split`
* **Excel template mode** — provide `excel_template` (base64 `.xlsx`) in `schema_config` instead of `input_schema` to auto-generate the schema from column headers and receive a filled Excel file
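
The inference rules above can be mirrored client-side, for example to log which mode a request will use. The sketch below is hypothetical (`infer_mode` is not part of the Pulse SDK) and only approximates the documented rules: the server alone can tell that an `extraction_id` is really a batch extract ID, and the precedence it gives `split_id` when several ID fields are present is an assumption.

```python
from typing import Any


def infer_mode(body: dict[str, Any]) -> tuple[str, bool]:
    """Hypothetical helper: approximate which /schema mode a request selects.

    Returns (mode, uses_excel_template). Only the server can detect that an
    extraction_id is really a batch extract ID (multi-extraction mode).
    """
    if "split_id" in body:
        # Assumption: split_id takes precedence if other ID fields are present.
        return "split", False
    if "extraction_ids" in body:
        mode = "multi-extraction"
    elif "extraction_id" in body:
        mode = "single"  # or multi-extraction, if the server detects a batch ID
    else:
        raise ValueError("provide extraction_id, extraction_ids, or split_id")
    # Excel template mode combines with single or multi-extraction mode.
    config = body.get("schema_config") or {}
    return mode, "excel_template" in config


print(infer_mode({"extraction_id": "abc123-def456-ghi789",
                  "schema_config": {"input_schema": {"type": "object"}}}))
# ('single', False)
```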

<Note>
  This endpoint operates on **saved extractions** (created via `/extract` with storage enabled, which is the default).
</Note>

<Note>
  To apply schemas across many extractions or splits at once, use [Batch Schema](/api-reference/endpoint/batch-overview#batch-schema). It supports both single and split modes.
</Note>

### Async Mode

Set `async: true` to return immediately with a job ID for polling. See [Polling for Results](/api-reference/endpoint/poll) for details.

| Field   | Type    | Required | Description                                                                                                   |
| ------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------- |
| `async` | boolean | No       | If `true`, returns immediately with a `job_id` for [polling](/api-reference/endpoint/poll). Default: `false`. |

**Async Response (202)**:

| Field     | Type   | Description                                        |
| --------- | ------ | -------------------------------------------------- |
| `job_id`  | string | Job ID for [polling](/api-reference/endpoint/poll) |
| `status`  | string | `"pending"`                                        |
| `message` | string | Human-readable description                         |
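
A polling loop might look like the following minimal sketch. It assumes the job is polled with `GET /job/{jobId}`, as the OpenAPI description of `async` states, and that `"pending"` (documented above) and `"processing"` (an assumption) are the only in-progress statuses. `wait_for_job` is a hypothetical helper and `fetch_job` is a placeholder for whatever HTTP client or SDK call you use; see Polling for Results for the actual status values and result shape.

```python
import time
from typing import Any, Callable

# "pending" is documented; "processing" is an assumed in-progress status.
IN_PROGRESS = {"pending", "processing"}


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a job until it leaves the in-progress states or the timeout expires.

    fetch_job(job_id) should perform GET /job/{job_id} and return parsed JSON;
    it is a placeholder for your own HTTP client or SDK call.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") not in IN_PROGRESS:
            return job  # finished: inspect job["status"] for success or failure
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.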

***

## Mode Reference

<Tabs>
  <Tab title="Single Mode">
    ### Request

    Apply one schema to an entire extraction.

    | Field              | Type    | Required    | Description                                                                                                                                                       |
    | ------------------ | ------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `extraction_id`    | uuid    | Conditional | ID of a saved extraction, or a batch extract job ID (auto-detected; see [Multi-Extraction Mode](#multi-extraction)). Required unless `extraction_ids` is provided. |
    | `extraction_ids`   | uuid\[] | Conditional | Explicit list of extraction IDs to combine (see [Multi-Extraction Mode](#multi-extraction)). Use instead of `extraction_id`.                                      |
    | `schema_config`    | object  | XOR         | Inline schema (see [Schema Config](#schema-config))                                                                                                               |
    | `schema_config_id` | uuid    | XOR         | Reference to a saved schema configuration                                                                                                                         |
    | `async`            | boolean | No          | If `true`, returns a `job_id` for polling. Default: `false`                                                                                                       |

    **XOR**: provide exactly one of `schema_config` or `schema_config_id`.

    #### Schema Config

    Provide **either** `input_schema` or `excel_template` — not both.

    | Field            | Type            | Required | Description                                                                                                                                                                |
    | ---------------- | --------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `input_schema`   | object          | XOR      | JSON Schema defining the structured data to extract                                                                                                                        |
    | `excel_template` | string (base64) | XOR      | Base64-encoded `.xlsx` template — column headers are used to auto-generate the JSON Schema and a filled copy is returned (see [Excel Template Mode](#excel-template-mode)) |
    | `schema_prompt`  | string          | No       | Natural language instructions to guide extraction                                                                                                                          |
    | `effort`         | boolean         | No       | Enable extended reasoning for complex documents                                                                                                                            |
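
    The required-field rules in the two tables above (an extraction source, exactly one of `schema_config` or `schema_config_id`, and exactly one of `input_schema` or `excel_template`) can be checked before a request is sent. A minimal sketch with a hypothetical `build_schema_request` helper; treating `extraction_id` and `extraction_ids` as mutually exclusive is an assumption this page does not state outright.

    ```python
    from typing import Any, Optional


    def build_schema_request(
        *,
        extraction_id: Optional[str] = None,
        extraction_ids: Optional[list[str]] = None,
        schema_config: Optional[dict[str, Any]] = None,
        schema_config_id: Optional[str] = None,
        run_async: bool = False,
    ) -> dict[str, Any]:
        """Hypothetical helper: build a POST /schema body for single or
        multi-extraction mode, enforcing the documented XOR rules locally."""
        # Assumption: extraction_id and extraction_ids are alternatives.
        if (extraction_id is None) == (extraction_ids is None):
            raise ValueError("provide exactly one of extraction_id or extraction_ids")
        if (schema_config is None) == (schema_config_id is None):
            raise ValueError("provide exactly one of schema_config or schema_config_id")
        if schema_config is not None and (
            ("input_schema" in schema_config) == ("excel_template" in schema_config)
        ):
            raise ValueError("schema_config needs exactly one of input_schema or excel_template")

        body: dict[str, Any] = {}
        if extraction_id is not None:
            body["extraction_id"] = extraction_id
        else:
            body["extraction_ids"] = extraction_ids
        if schema_config is not None:
            body["schema_config"] = schema_config
        else:
            body["schema_config_id"] = schema_config_id
        if run_async:
            body["async"] = True  # "async" is a Python keyword, hence run_async
        return body
    ```

    The returned dict can be serialized with `json.dumps` and sent as the JSON body, as in the curl examples that follow.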

    ### Response (200)

    | Field              | Type    | Description                                                                                      |
    | ------------------ | ------- | ------------------------------------------------------------------------------------------------ |
    | `schema_id`        | uuid    | Unique identifier for this schema version                                                        |
    | `version`          | integer | Schema version number                                                                            |
    | `schema_output`    | object  | `{ values: {...}, citations: {...} }`                                                            |
    | `extraction_ids`   | uuid\[] | Present when multiple extractions were combined — lists all source extraction IDs                |
    | `excel_output_url` | string  | API path to download the filled Excel template (only present when `excel_template` was provided) |

    ### Example — Inline Schema

    <CodeGroup>
      ```python Python theme={null}
      from pulse import Pulse

      client = Pulse(api_key="YOUR_API_KEY")

      schema_result = client.schema(
          extraction_id="abc123-def456-ghi789",
          schema_config={
              "input_schema": {
                  "type": "object",
                  "properties": {
                      "invoice_number": {"type": "string"},
                      "total_amount": {"type": "number"},
                      "vendor_name": {"type": "string"},
                      "line_items": {
                          "type": "array",
                          "items": {
                              "type": "object",
                              "properties": {
                                  "description": {"type": "string"},
                                  "amount": {"type": "number"}
                              }
                          }
                      }
                  },
                  "required": ["invoice_number", "total_amount"]
              },
              "schema_prompt": "Extract all invoice details including line items"
          }
      )

      print(schema_result.schema_output)
      ```

      ```typescript TypeScript theme={null}
      import { PulseClient } from "pulse-ts-sdk";

      const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

      const schemaResult = await client.schema({
          extraction_id: "abc123-def456-ghi789",
          schema_config: {
              input_schema: {
                  type: "object",
                  properties: {
                      invoice_number: { type: "string" },
                      total_amount: { type: "number" },
                      vendor_name: { type: "string" },
                  },
                  required: ["invoice_number", "total_amount"],
              },
              schema_prompt: "Extract all invoice details",
          },
      });

      console.log(schemaResult.schema_output);
      ```

      ```bash curl theme={null}
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "extraction_id": "abc123-def456-ghi789",
          "schema_config": {
            "input_schema": {
              "type": "object",
              "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"}
              },
              "required": ["invoice_number", "total_amount"]
            },
            "schema_prompt": "Extract invoice details"
          }
        }'
      ```
    </CodeGroup>

    ### Example — Saved Config Reference

    <CodeGroup>
      ```python Python theme={null}
      # `client` is the Pulse client created in the inline-schema example above
      schema_result = client.schema(
          extraction_id="abc123-def456-ghi789",
          schema_config_id="config-uuid-123"
      )
      ```

      ```bash curl theme={null}
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"extraction_id": "abc123-def456-ghi789", "schema_config_id": "config-uuid-123"}'
      ```
    </CodeGroup>

    ### Example Response

    ```json theme={null}
    {
      "schema_id": "schema-uuid-456",
      "version": 2,
      "schema_output": {
        "values": {
          "invoice_number": "INV-2024-001",
          "total_amount": 1250.00,
          "vendor_name": "Acme Corp",
          "line_items": [
            {"description": "Consulting Services", "amount": 1000.00},
            {"description": "Travel Expenses", "amount": 250.00}
          ]
        },
        "citations": {
          "invoice_number": {"page": 1, "bbox": [100, 50, 200, 70]},
          "total_amount": {"page": 1, "bbox": [400, 500, 500, 520]}
        }
      }
    }
    ```
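
    To review which values are backed by a source location, pair each field in `values` with its entry in `citations`. A minimal sketch, assuming the citation shape shown in the example response (`{"page": ..., "bbox": [...]}`) and looking only at top-level fields; the hypothetical `cited_fields` helper reports `None` as the page for uncited fields such as `vendor_name` in the response above.

    ```python
    from typing import Any, Optional


    def cited_fields(schema_output: dict[str, Any]) -> list[tuple[str, Any, Optional[int]]]:
        """Pair each top-level value with the page of its citation, if any."""
        values = schema_output.get("values", {})
        citations = schema_output.get("citations", {})
        rows = []
        for field, value in values.items():
            cite = citations.get(field)
            page = cite.get("page") if isinstance(cite, dict) else None
            rows.append((field, value, page))
        return rows


    # Flag values that have no supporting location in the document.
    example = {
        "values": {"invoice_number": "INV-2024-001", "vendor_name": "Acme Corp"},
        "citations": {"invoice_number": {"page": 1, "bbox": [100, 50, 200, 70]}},
    }
    for field, value, page in cited_fields(example):
        print(field, value, f"page {page}" if page is not None else "(no citation)")
    ```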

    <h3 id="multi-extraction">
      Multi-Extraction Mode
    </h3>

    Combine content from multiple documents and apply a single schema to the composite, producing **one merged result**. This is useful when the data you need spans several files (e.g., a loss summary in one file and exposure data in another).

    <Note>
      This is different from [Batch Schema](/api-reference/endpoint/batch-overview#batch-schema), which applies the same schema to each document **independently** (one result per document). Use multi-extraction when you need to cross-reference or merge data from multiple source files into a single output.
    </Note>

    There are two ways to trigger multi-extraction:

    1. **Batch extract auto-detection** — Pass a batch extract `batch_job_id` as `extraction_id`. The system detects it as a batch parent and automatically combines all completed child extractions.
    2. **Explicit list** — Pass an `extraction_ids` array with the specific extraction IDs to combine.

    Citations in multi-extraction results use the `extraction_id-bb_id` format (e.g., `abc123-txt-1`) to disambiguate bounding boxes across source documents.

    <CodeGroup>
      ```python Python theme={null}
      from pulse import Pulse
      from pulse.types.schema_config import SchemaConfig

      client = Pulse(api_key="YOUR_API_KEY")

      # Option 1: Pass a batch extract job ID (auto-detected)
      schema_result = client.schema(
          extraction_id="<batch_job_id>",
          schema_config=SchemaConfig(
              input_schema={
                  "type": "object",
                  "properties": {
                      "policy_period": {"type": "string"},
                      "exposure": {"type": "number"},
                  },
              },
              schema_prompt="Combine data from both documents",
          ),
      )

      # Option 2: Pass an explicit list of extraction IDs
      schema_result = client.schema(
          extraction_ids=["extraction-1-uuid", "extraction-2-uuid"],
          schema_config=SchemaConfig(
              input_schema={
                  "type": "object",
                  "properties": {
                      "policy_period": {"type": "string"},
                      "exposure": {"type": "number"},
                  },
              },
          ),
      )

      # Response includes the list of source extraction IDs
      print(schema_result.extraction_ids)
      ```

      ```bash curl theme={null}
      # Option 1: Batch extract ID (auto-detected)
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "extraction_id": "<batch_job_id>",
          "schema_config": {
            "input_schema": {
              "type": "object",
              "properties": {
                "policy_period": {"type": "string"},
                "exposure": {"type": "number"}
              }
            },
            "schema_prompt": "Combine data from both documents"
          }
        }'

      # Option 2: Explicit extraction IDs
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "extraction_ids": ["extraction-1-uuid", "extraction-2-uuid"],
          "schema_config": {
            "input_schema": {
              "type": "object",
              "properties": {
                "policy_period": {"type": "string"},
                "exposure": {"type": "number"}
              }
            }
          }
        }'
      ```
    </CodeGroup>
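
    Multi-extraction citation references combine an extraction ID and a bounding-box ID such as `txt-1`, and both can contain hyphens (UUIDs always do), so a reference cannot be split on the first or last `-`. A safer approach, sketched with a hypothetical `split_citation_ref` helper, is to match the reference against the known source IDs, such as the `extraction_ids` list returned in the response. Where these references appear inside `schema_output` is not shown on this page, so the sketch starts from the reference string.

    ```python
    from typing import Iterable


    def split_citation_ref(ref: str, extraction_ids: Iterable[str]) -> tuple[str, str]:
        """Split an `extraction_id-bb_id` reference into (extraction_id, bb_id)."""
        # Try longer IDs first so an ID that is a prefix of another cannot win.
        for eid in sorted(extraction_ids, key=len, reverse=True):
            prefix = eid + "-"
            if ref.startswith(prefix):
                return eid, ref[len(prefix):]
        raise ValueError(f"{ref!r} does not start with any known extraction ID")


    print(split_citation_ref("abc123-txt-1", ["abc123"]))
    # ('abc123', 'txt-1')
    ```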

    <h3 id="excel-template-mode">
      Excel Template Mode
    </h3>

    Instead of writing a JSON Schema by hand, provide an Excel template (`.xlsx`) with the column headers you want filled. The system auto-generates the JSON Schema from the template's structure, applies it to the extraction, and returns a filled copy of the original template.

    <CodeGroup>
      ```python Python theme={null}
      import base64
      from pulse import Pulse
      from pulse.types.schema_config import SchemaConfig

      client = Pulse(api_key="YOUR_API_KEY")

      with open("template.xlsx", "rb") as f:
          template_b64 = base64.b64encode(f.read()).decode()

      schema_result = client.schema(
          extraction_id="abc123-def456-ghi789",
          schema_config=SchemaConfig(
              excel_template=template_b64,
              schema_prompt="Extract policy period data into the template columns",
          ),
      )

      # Download the filled Excel file
      excel_bytes = client.download_schema_excel(schema_result.schema_id)
      with open("filled_output.xlsx", "wb") as f:
          for chunk in excel_bytes:
              f.write(chunk)
      ```

      ```bash curl theme={null}
      # Base64-encode the template on a single line (GNU base64 wraps at 76 columns by default)
      TEMPLATE_B64=$(base64 < template.xlsx | tr -d '\n')

      # Apply schema with Excel template
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{
          \"extraction_id\": \"abc123-def456-ghi789\",
          \"schema_config\": {
            \"excel_template\": \"$TEMPLATE_B64\",
            \"schema_prompt\": \"Extract policy period data into the template columns\"
          }
        }"

      # Download the filled Excel (requires API key auth)
      curl -o filled_output.xlsx \
        -H "x-api-key: YOUR_API_KEY" \
        https://api.runpulse.com/schema/<schema_id>/excel
      ```
    </CodeGroup>

    The response includes `excel_output_url` (e.g., `/schema/{schema_id}/excel`), a path relative to the API base URL (`https://api.runpulse.com`) that requires the same `x-api-key` authentication as other requests. Download the filled template with `client.download_schema_excel(schema_id)` in the SDK, or send an authenticated `GET` request to that path.

    <Note>
      Excel template mode and multi-extraction mode can be combined — pass a batch extract ID or `extraction_ids` along with `excel_template` to fill a template from multiple source documents.
    </Note>
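
    Before base64-encoding a template, a cheap local check can catch files that are not `.xlsx` workbooks, such as a legacy `.xls` or a CSV. This is a client-side convenience, not an API requirement. The hypothetical `encode_xlsx_template` helper relies on the fact that an `.xlsx` file is an Office Open XML ZIP package whose workbook part Excel conventionally stores at `xl/workbook.xml`, and it produces unwrapped base64 suitable for the `excel_template` JSON string.

    ```python
    import base64
    import io
    import zipfile


    def encode_xlsx_template(data: bytes) -> str:
        """Base64-encode .xlsx bytes after a lightweight format check."""
        # .xlsx files are ZIP packages; legacy .xls and CSV files are not.
        if not zipfile.is_zipfile(io.BytesIO(data)):
            raise ValueError("not a ZIP package, so not an .xlsx workbook")
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Excel conventionally stores the workbook part at this path.
            if "xl/workbook.xml" not in zf.namelist():
                raise ValueError("ZIP package has no xl/workbook.xml part")
        # b64encode never inserts line breaks, so the result is JSON-safe as-is.
        return base64.b64encode(data).decode("ascii")


    # Usage:
    # with open("template.xlsx", "rb") as f:
    #     template_b64 = encode_xlsx_template(f.read())
    ```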
  </Tab>

  <Tab title="Split Mode">
    ### Request

    Apply different schemas to different page groups from a prior `/split` call.

    | Field                 | Type    | Required | Description                                                    |
    | --------------------- | ------- | -------- | -------------------------------------------------------------- |
    | `split_id`            | uuid    | Yes      | ID from a prior [`/split`](/api-reference/endpoint/split) call |
    | `split_schema_config` | object  | Yes      | Per-topic schema configs (keys = topic names from split)       |
    | `async`               | boolean | No       | Default: `false`                                               |

    Each topic in `split_schema_config`:

    | Field              | Type    | Required | Description                                              |
    | ------------------ | ------- | -------- | -------------------------------------------------------- |
    | `schema`           | object  | XOR      | JSON Schema for this topic                               |
    | `schema_config_id` | uuid    | XOR      | Reference to a saved schema config (instead of `schema`) |
    | `schema_prompt`    | string  | No       | Additional extraction instructions                       |
    | `effort`           | boolean | No       | Enable extended reasoning                                |

    Each topic must provide exactly one of `schema` or `schema_config_id` (marked **XOR**).
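
    A quick pre-flight check for `split_schema_config`, sketched with a hypothetical `check_split_schema_config` helper: it verifies that every key is a topic name from the split and that each topic supplies exactly one of `schema` or `schema_config_id`. `split_topics` stands for the topic names you read from your `/split` result; the field that carries them is documented on the Split page, not here. Whether every split topic must appear in the config is not stated, so the sketch does not check for missing topics.

    ```python
    from typing import Any, Iterable


    def check_split_schema_config(
        config: dict[str, dict[str, Any]], split_topics: Iterable[str]
    ) -> list[str]:
        """Return a list of problems with a split_schema_config (empty if none)."""
        topics = set(split_topics)
        problems = []
        for topic, entry in config.items():
            # Keys must be topic names produced by the prior /split call.
            if topic not in topics:
                problems.append(f"unknown topic {topic!r}")
            # Each topic needs an inline schema or a saved config, not both.
            if ("schema" in entry) == ("schema_config_id" in entry):
                problems.append(f"{topic!r}: provide exactly one of schema or schema_config_id")
        return problems
    ```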

    ### Response (200)

    | Field           | Type   | Description                                                           |
    | --------------- | ------ | --------------------------------------------------------------------- |
    | `schema_id`     | uuid   | Unique identifier for this schema version                             |
    | `split_id`      | uuid   | ID of the split that defined the page groups                          |
    | `results`       | object | Per-topic results: `{ "topic": { values: {...}, citations: {...} } }` |
    | `input_schemas` | object | Echo of the schemas applied, keyed by topic                           |
    | `errors`        | object | Optional: per-topic errors if any topics failed                       |

    ### Example — Per-Topic Schemas

    <CodeGroup>
      ```python Python theme={null}
      from pulse import Pulse

      client = Pulse(api_key="YOUR_API_KEY")

      schema_result = client.schema(
          split_id="split-uuid-123",
          split_schema_config={
              "financial_statements": {
                  "schema": {
                      "type": "object",
                      "properties": {
                          "revenue": {"type": "number"},
                          "expenses": {"type": "number"},
                          "net_income": {"type": "number"}
                      }
                  },
                  "schema_prompt": "Extract financial data from statements"
              },
              "signatures": {
                  "schema": {
                      "type": "object",
                      "properties": {
                          "signee_name": {"type": "string"},
                          "date_signed": {"type": "string"}
                      }
                  }
              }
          }
      )

      for topic, result in schema_result.results.items():
          print(f"{topic}: {result}")
      ```

      ```typescript TypeScript theme={null}
      import { PulseClient } from "pulse-ts-sdk";

      const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

      const schemaResult = await client.schema({
          split_id: "split-uuid-123",
          split_schema_config: {
              financial_statements: {
                  schema: {
                      type: "object",
                      properties: {
                          revenue: { type: "number" },
                          expenses: { type: "number" },
                      },
                  },
                  schema_prompt: "Extract financial data",
              },
              signatures: {
                  schema: {
                      type: "object",
                      properties: {
                          signee_name: { type: "string" },
                          date_signed: { type: "string" },
                      },
                  },
              },
          },
      });

      for (const [topic, result] of Object.entries(schemaResult.results)) {
          console.log(`${topic}:`, result);
      }
      ```

      ```bash curl theme={null}
      curl -X POST https://api.runpulse.com/schema \
        -H "x-api-key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "split_id": "split-uuid-123",
          "split_schema_config": {
            "financial_statements": {
              "schema": {"type": "object", "properties": {"revenue": {"type": "number"}}},
              "schema_prompt": "Extract financial data"
            },
            "signatures": {
              "schema": {"type": "object", "properties": {"signee_name": {"type": "string"}}}
            }
          }
        }'
      ```
    </CodeGroup>

    ### Example Response

    ```json theme={null}
    {
      "schema_id": "schema-uuid-789",
      "split_id": "split-uuid-123",
      "results": {
        "financial_statements": {
          "values": { "revenue": 5000000, "expenses": 3200000 },
          "citations": { "revenue": {"page": 15, "bbox": [100, 200, 300, 220]} }
        },
        "signatures": {
          "values": { "signee_name": "Jane Doe", "date_signed": "2024-01-15" },
          "citations": {}
        }
      },
      "input_schemas": {
        "financial_statements": { "type": "object", "properties": { "revenue": { "type": "number" } } },
        "signatures": { "type": "object", "properties": { "signee_name": { "type": "string" } } }
      }
    }
    ```
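
    The examples above print `results` but ignore `errors`, which the response table lists as optional per-topic errors when some topics fail. A minimal sketch that separates the two; the structure of each error entry is not specified on this page, so the hypothetical `summarize_split_result` helper passes entries through unchanged.

    ```python
    from typing import Any


    def summarize_split_result(response: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
        """Return (values per topic, errors per topic) from a split-mode response."""
        values = {
            topic: result.get("values", {})
            for topic, result in response.get("results", {}).items()
        }
        # "errors" is optional; each entry is returned as-is because its
        # structure is not specified on this page.
        errors = response.get("errors") or {}
        return values, errors


    # Usage with a parsed JSON response:
    # values, errors = summarize_split_result(response_json)
    # for topic, err in errors.items():
    #     print(f"{topic} failed: {err}")
    ```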
  </Tab>
</Tabs>

***

## Download Filled Excel — `GET /schema/{schemaId}/excel`

When a schema extraction was created with `excel_template`, the filled output can be downloaded from this endpoint. It uses the same `x-api-key` authentication as the other endpoints, and the caller must belong to the organization that owns the underlying extraction.

| Status | Description                                                                                   |
| ------ | --------------------------------------------------------------------------------------------- |
| 200    | Returns the filled `.xlsx` file as a binary download                                          |
| 401    | Authentication failed or missing API key                                                      |
| 404    | Schema not found, or no Excel output (was `excel_template` provided in the original request?) |
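
Without the SDK, the download is a plain authenticated `GET`. A stdlib-only sketch, assuming `excel_output_url` is a path to be resolved against the `https://api.runpulse.com` base URL from the OpenAPI `servers` entry; `excel_download_request` and `download_filled_excel` are hypothetical names, and the 404 handling mirrors the table above.

```python
import shutil
import urllib.error
import urllib.parse
import urllib.request

API_BASE = "https://api.runpulse.com"


def excel_download_request(excel_output_url: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET for the path returned in excel_output_url."""
    # excel_output_url is a path such as /schema/{schema_id}/excel.
    url = urllib.parse.urljoin(API_BASE, excel_output_url)
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def download_filled_excel(excel_output_url: str, api_key: str, dest: str) -> None:
    """Stream the filled .xlsx to dest, mapping 404 to a clearer error."""
    req = excel_download_request(excel_output_url, api_key)
    try:
        # urlopen raises before dest is opened, so failures leave no empty file.
        with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise FileNotFoundError(
                "schema not found or no Excel output "
                "(was excel_template provided in the original request?)"
            ) from err
        raise
```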

***

## Error Responses

| Status | Error              | Description                                                   |
| ------ | ------------------ | ------------------------------------------------------------- |
| 400    | Invalid request    | Must provide `extraction_id`, `extraction_ids`, or `split_id` |
| 400    | Invalid schema     | Schema must follow JSON Schema / OpenAPI 3.0 format           |
| 400    | Mutually exclusive | Cannot provide both `input_schema` and `excel_template`       |
| 401    | Unauthorized       | Invalid or missing API key                                    |
| 404    | Not found          | Extraction, batch job, or split not found                     |
| 429    | Rate limited       | Rate limit exceeded                                           |
| 500    | Processing error   | Schema extraction failed                                      |
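
Of the errors above, 400, 401, and 404 indicate a request that must be changed before it can succeed, so retrying them does not help. A 429 (rate limit exceeded, listed in the OpenAPI responses below) is usually worth retrying with backoff. Whether a 500 is transient is not stated here; the sketch below assumes it may be and retries it too. `call_with_retries` is hypothetical, and `send` stands for any function that issues the request and returns `(status, body)`.

```python
import random
import time
from typing import Any, Callable

# 429 is a documented rate-limit response; treating 500 as transient is an assumption.
RETRYABLE = {429, 500}


def call_with_retries(
    send: Callable[[], tuple[int, Any]],
    max_retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Call send() and retry retryable statuses with full-jitter exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body
        # Wait a random time in [0, min(cap, base * 2**attempt)] seconds.
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")  # the final attempt always returns
```

Full jitter spreads retries from many clients over the backoff window instead of having them all retry at the same moment.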

***

## Best Practices

<AccordionGroup>
  <Accordion title="Use effort mode for complex documents">
    Set `effort: true` (in `schema_config`, or per topic in `split_schema_config`) for documents with complex layouts or dense tables, or when a first pass returns low-quality results.
  </Accordion>

  <Accordion title="Provide schema_prompt for context">
    Add natural language instructions to guide the extraction, especially for ambiguous fields.
  </Accordion>

  <Accordion title="Use async for large schemas">
    If your schema has many fields or the document is large, set `async: true` to avoid timeouts. See [Polling for Results](/api-reference/endpoint/poll).
  </Accordion>

  <Accordion title="For multi-section documents, use split mode">
    First call [`/split`](/api-reference/endpoint/split) to get page groups, then use this endpoint with `split_id` + `split_schema_config`.
  </Accordion>

  <Accordion title="Combine data from multiple files with multi-extraction">
    When the data you need spans multiple documents, use [Batch Extract](/api-reference/endpoint/batch-overview#batch-extract) to extract all files, then pass the `batch_job_id` as `extraction_id` to this endpoint. The system auto-detects the batch parent and combines content from all child extractions.
  </Accordion>

  <Accordion title="Use Excel templates for spreadsheet-native workflows">
    If your output is an Excel spreadsheet, skip the JSON Schema definition and provide the empty `.xlsx` template, base64-encoded, via `excel_template`. The column headers define the schema, and `excel_output_url` points to the filled copy.
  </Accordion>
</AccordionGroup>

***

## Related Endpoints

<CardGroup cols={2}>
  <Card title="Extract" icon="file-lines" href="/api-reference/endpoint/extract">
    Extract content from a document
  </Card>

  <Card title="Split Document" icon="scissors" href="/api-reference/endpoint/split">
    Split a document into topic-based page groups
  </Card>

  <Card title="Batch Processing" icon="layer-group" href="/api-reference/endpoint/batch-overview">
    Apply schema across many documents in parallel
  </Card>
</CardGroup>


## OpenAPI

````yaml POST /schema
openapi: 3.1.0
info:
  title: Pulse API Structure
  version: 0.1.0
  description: >-
    Canonical contract for the Pulse extraction APIs. This specification is the
    single source of truth for shared request/response models that client and
    server packages consume.
servers:
  - url: https://api.runpulse.com
    description: Default Pulse API base URL
security:
  - ApiKey: []
paths:
  /schema:
    post:
      tags:
        - Schema
      summary: Extract structured data from a saved extraction or split
      description: |-
        Apply schema extraction to a previously saved extraction. The mode is
        inferred from the input:

        **Single mode** — Provide `extraction_id` + `schema_config` (or
        `schema_config_id`) to apply one schema to the entire document.

        **Multi-extraction mode** — Provide a batch extract ID as `extraction_id`
        (auto-detected) or an explicit `extraction_ids` list. The content from all
        extractions is combined and the schema is applied to the composite. Citations
        use `extraction_id-bb_id` format to disambiguate across source documents.

        **Split mode** — Provide `split_id` + `split_schema_config` to apply
        different schemas to different page groups from a prior `/split` call.
        Each topic can have its own schema, prompt, and effort setting.

        **Excel template mode** — Provide `excel_template` (base64 .xlsx) in
        `schema_config` instead of `input_schema`. The schema is auto-generated
        from the template's column headers, and a filled copy is returned as
        `excel_output_url`.

        Creates a versioned schema record that can be retrieved later.
        Set `async: true` to return immediately with a job_id for polling.

        To apply schemas across many extractions or splits at once, see
        [Batch Schema](api:POST/batch/schema) or the
        [Batch Processing guide](/batch).
      operationId: extractSchema
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SchemaInput'
      responses:
        '200':
          description: >-
            Schema extraction result (when async=false or omitted). Shape
            depends on the mode used.
          content:
            application/json:
              schema:
                oneOf:
                  - $ref: '#/components/schemas/SingleSchemaResponse'
                  - $ref: '#/components/schemas/SplitSchemaResponse'
        '202':
          description: Schema job accepted (when async=true)
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncSubmissionResponse'
        '400':
          description: Invalid request parameters or schema format
        '401':
          description: Authentication failed or missing API key
        '404':
          description: Extraction or split not found
        '429':
          description: Rate limit exceeded
        '500':
          description: Internal server error
components:
  schemas:
    SchemaInput:
      type: object
      description: |-
        Request body for schema extraction. Mode is inferred from the input:

        - Provide `extraction_id` for single-mode or multi-extraction (auto-detected).
          If the ID belongs to a batch extract, its child extractions are combined automatically.
        - Provide `extraction_ids` for an explicit list of extractions to combine.
        - Provide `split_id` + `split_schema_config` for split-mode extraction.
      properties:
        extraction_id:
          type: string
          format: uuid
          description: >-
            ID of a saved extraction OR a batch extract job. When a batch
            extract ID is provided, the system auto-detects it and combines all
            completed child extractions into a single schema application.
        extraction_ids:
          type: array
          items:
            type: string
            format: uuid
          description: >-
            Explicit list of extraction IDs to combine. The markdown and
            bounding boxes from all extractions are merged and the schema is
            applied to the composite content. Citations use
            `extraction_id-bb_id` format to disambiguate across source
            documents.
        split_id:
          type: string
          format: uuid
          description: >-
            ID of saved split (from a prior `/split` call). Use for split-mode
            schema extraction.
        schema_config:
          description: >-
            Inline schema configuration for single mode. Required (with
            extraction_id) if schema_config_id is not provided.
          allOf:
            - $ref: '#/components/schemas/SchemaConfig'
        schema_config_id:
          type: string
          format: uuid
          description: >-
            Reference to a saved schema configuration for single mode. Use this
            instead of providing `schema_config` inline.
        split_schema_config:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/TopicSchemaConfig'
          description: >-
            Per-topic schema configurations for split mode. Keys must match the
            topic names from the split. Each topic provides either an inline
            `schema` or a `schema_config_id`.
        async:
          type: boolean
          default: false
          description: >-
            If true, returns immediately with a `job_id` for polling via
            `GET /job/{jobId}`. Otherwise, the request is processed
            synchronously.
    SingleSchemaResponse:
      type: object
      description: Response for single schema extraction mode.
      required:
        - schema_id
        - version
        - schema_output
      properties:
        schema_id:
          type: string
          format: uuid
          description: Unique identifier for this schema version.
        version:
          type: integer
          minimum: 1
          description: Version number of this schema for the extraction.
        schema_output:
          description: Extracted values and citations.
          allOf:
            - $ref: '#/components/schemas/StructuredOutputResult'
        extraction_ids:
          type: array
          items:
            type: string
            format: uuid
          description: >-
            Present when multiple extractions were combined (via batch extract
            auto-detection or explicit `extraction_ids` input). Lists all source
            extraction IDs that contributed to the result.
        excel_output_url:
          type: string
          description: >-
            API path to download the filled Excel template (e.g.
            `/schema/{schema_id}/excel`). Requires the same API key
            authentication. Only present when `excel_template` was provided in
            the request.
        credits_used:
          type: number
          format: float
          nullable: true
          description: >-
            Number of credits consumed by this request. Only present when the
            organization has the credit billing system enabled.
        plan_info:
          allOf:
            - $ref: '#/components/schemas/PlanInfo'
          description: >-
            Billing tier and cumulative usage information for the calling org,
            including this schema run.
    SplitSchemaResponse:
      type: object
      description: Response for split schema extraction mode.
      required:
        - schema_id
        - split_id
        - results
      properties:
        schema_id:
          type: string
          format: uuid
          description: Unique identifier for this schema version.
        split_id:
          type: string
          format: uuid
          description: ID of the split that defined the page groups.
        results:
          type: object
          description: >-
            Per-topic extraction results. Keys match the topic names from the
            split. Each value contains `values` and `citations`.
          additionalProperties:
            $ref: '#/components/schemas/StructuredOutputResult'
        input_schemas:
          type: object
          description: Echo of the schemas that were applied, keyed by topic.
          additionalProperties:
            type: object
        errors:
          type: object
          description: >-
            Per-topic errors if any topics failed to process. Keys are topic
            names, values are error messages.
          additionalProperties:
            type: string
        credits_used:
          type: number
          format: float
          nullable: true
          description: >-
            Number of credits consumed by this request. Only present when the
            organization has the credit billing system enabled.
        plan_info:
          allOf:
            - $ref: '#/components/schemas/PlanInfo'
          description: >-
            Billing tier and cumulative usage information for the calling org,
            including this split-schema run.
    AsyncSubmissionResponse:
      type: object
      description: >-
        Acknowledgement returned when a request is submitted for asynchronous
        processing. Poll `GET /job/{jobId}` to check status and retrieve
        results.
      required:
        - job_id
        - status
      properties:
        job_id:
          type: string
          description: Identifier assigned to the asynchronous job.
        status:
          type: string
          description: Initial status reported by the server.
          enum:
            - pending
            - processing
            - completed
            - failed
            - canceled
        message:
          type: string
          description: Human-readable description of the accepted job.
        queuedAt:
          type: string
          format: date-time
          deprecated: true
          description: >-
            **Deprecated** — Timestamp indicating when the job was accepted.
            Retained for backward compatibility. Use `GET /job/{jobId}` for
            timing details.
        credits_used:
          type: number
          format: float
          nullable: true
          description: >-
            Number of credits consumed by this request. Only present when the
            organization has the credit billing system enabled.
    SchemaConfig:
      type: object
      description: >-
        Inline schema configuration. Provide `input_schema` (JSON Schema) OR
        `excel_template` (base64 .xlsx) — not both. When `excel_template` is
        provided, the JSON Schema is auto-generated from the spreadsheet's
        column headers.
      properties:
        input_schema:
          type: object
          description: >-
            JSON Schema defining the structured data to extract. Required unless
            `excel_template` is provided.
        excel_template:
          type: string
          format: byte
          description: >-
            Base64-encoded Excel template (.xlsx). When provided, the template's
            column headers are used to auto-generate the JSON Schema and a
            filled copy of the template is returned in the response as
            `excel_output_url`. Mutually exclusive with `input_schema`.
        schema_prompt:
          type: string
          description: Natural language prompt with additional extraction instructions.
        effort:
          type: boolean
          default: false
          description: Enable extended reasoning for complex extractions.
    TopicSchemaConfig:
      type: object
      description: |-
        Per-topic schema configuration.
        Provide EITHER inline schema fields (`schema`, plus optional
        `schema_prompt` and `effort`) OR a `schema_config_id` reference.
      properties:
        schema:
          type: object
          description: JSON Schema for this topic.
        schema_prompt:
          type: string
          description: Additional instructions for this topic.
        effort:
          type: boolean
          default: false
          description: Enable extended reasoning for this topic.
        schema_config_id:
          type: string
          format: uuid
          description: Reference to a saved schema configuration for this topic.
    StructuredOutputResult:
      type: object
      description: Result of schema extraction with values and citations.
      properties:
        values:
          type: object
          description: Extracted values matching the provided schema.
          additionalProperties: true
        citations:
          type: object
          description: Citation references linking extracted values to source locations.
          additionalProperties: true
    PlanInfo:
      type: object
      description: >-
        Cumulative billing snapshot for the calling organization. Sourced from
        the `pulse-org-stats` aggregate table, which the org-stats Lambda
        maintains asynchronously. The in-flight request's contribution is
        added on top, so every response reflects the post-request state.
        Returned by every endpoint that consumes credits (extract, schema,
        tables, split, form, and their batch / pipeline equivalents).
      properties:
        tier:
          type: string
          description: Billing tier, e.g. `"trial"`, `"growth"`, `"pulse_ultra_2"`.
        total_credits_used:
          type: number
          format: float
          description: >-
            Total credits consumed by the organization to date, including this
            request. The primary billing metric going forward.
        pages_used:
          type: integer
          minimum: 0
          description: >-
            Total pages processed by the organization to date, including this
            request. Kept for backward compatibility with clients that haven't
            migrated to `total_credits_used`.
        note:
          type: string
          description: >-
            Optional human-readable note about billing state for this response
            (e.g. trial credits remaining). Omitted when no note applies.
  securitySchemes:
    ApiKey:
      type: apiKey
      in: header
      name: x-api-key
      x-fern-header:
        name: apiKey
        env: PULSE_API_KEY

````
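
The mode-inference rules in `SchemaInput` can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official SDK. The helper names `infer_mode` and `submit_schema`, the `/schema` request path, and the placeholder base URL are all assumptions made for illustration. Two behaviors are this sketch's own choices rather than documented server behavior: it rejects bodies that mix source fields, and it rejects bodies that supply both `schema_config` and `schema_config_id`. Only the `x-api-key` header comes directly from the `ApiKey` security scheme.

```python
import json
import urllib.request
from typing import Any

# Placeholder (assumption): substitute the real server URL for your account.
BASE_URL = "https://api.example.com"


def infer_mode(body: dict[str, Any]) -> str:
    """Classify a SchemaInput body following the documented mode rules."""
    sources = [k for k in ("extraction_id", "extraction_ids", "split_id") if k in body]
    if len(sources) != 1:
        # Sketch's choice: reject missing or mixed inputs instead of guessing precedence.
        raise ValueError(f"expected exactly one source field, got {sources}")

    if sources[0] == "split_id":
        if not body.get("split_schema_config"):
            raise ValueError("split mode requires a non-empty split_schema_config")
        return "split"

    has_inline = "schema_config" in body
    if has_inline == ("schema_config_id" in body):
        raise ValueError("provide exactly one of schema_config or schema_config_id")

    if has_inline:
        config = body["schema_config"]
        # input_schema and excel_template are mutually exclusive; one is required.
        if ("input_schema" in config) == ("excel_template" in config):
            raise ValueError("schema_config needs exactly one of input_schema or excel_template")
        if "excel_template" in config:
            return "excel_template"

    if sources[0] == "extraction_ids":
        return "multi"
    # A batch extract ID passed as extraction_id is detected server-side, so the
    # client cannot distinguish it from a single extraction ID.
    return "single"


def submit_schema(body: dict[str, Any], api_key: str, base_url: str = BASE_URL) -> dict[str, Any]:
    """POST the body to the assumed /schema path and return the decoded JSON."""
    infer_mode(body)  # fail fast on malformed bodies before any network call
    request = urllib.request.Request(
        f"{base_url}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, the sketch classifies a body with `extraction_id`, `schema_config.input_schema`, and `"async": true` as `"single"`. Because `async` is true, the server would answer with an `AsyncSubmissionResponse` to poll rather than with the extracted values.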
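
The three response schemas can be told apart by their required fields. `AsyncSubmissionResponse` always carries `job_id` and `status`, `SplitSchemaResponse` carries `split_id` and `results`, and `SingleSchemaResponse` carries `schema_output`. A minimal Python sketch of that dispatch follows. The function name `summarize_response` and the shape of the dictionary it returns are hypothetical conveniences, not part of the API.

```python
from typing import Any


def summarize_response(resp: dict[str, Any]) -> dict[str, Any]:
    """Classify a schema response by its documented required fields."""
    if "job_id" in resp and "status" in resp:
        # Async submission: poll GET /job/{jobId} for the eventual result.
        return {"kind": "async", "job_id": resp["job_id"], "status": resp["status"]}

    if "split_id" in resp and "results" in resp:
        # One StructuredOutputResult per topic; failed topics appear in `errors`.
        values = {topic: result.get("values", {}) for topic, result in resp["results"].items()}
        return {"kind": "split", "values": values, "errors": resp.get("errors", {})}

    if "schema_output" in resp:
        return {
            "kind": "single",
            "values": resp["schema_output"].get("values", {}),
            # extraction_ids is only present when several extractions were combined.
            "sources": resp.get("extraction_ids", []),
            # Only present when an excel_template was supplied in the request.
            "excel_output_url": resp.get("excel_output_url"),
        }

    raise ValueError("unrecognized schema response shape")
```

Checking `job_id` first keeps the dispatch unambiguous, because neither synchronous response schema defines that field.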
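
In multi-extraction mode, citation references take the form `extraction_id-bb_id`. Because `extraction_id` is itself a hyphenated UUID, splitting on the first `-` would cut the ID apart. The sketch below assumes the ID appears in canonical 36-character form and splits immediately after it. The helper name `split_citation_ref` is hypothetical. The exact format of `bb_id` is not specified here, so the sketch returns everything after the separator unchanged.

```python
import uuid


def split_citation_ref(ref: str) -> tuple[str, str]:
    """Split an `extraction_id-bb_id` reference into (extraction_id, bb_id).

    Assumes extraction_id is a canonical 36-character hyphenated UUID.
    """
    head, sep, bb_id = ref[:36], ref[36:37], ref[37:]
    if sep != "-" or not bb_id:
        raise ValueError(f"not an extraction_id-bb_id reference: {ref!r}")
    # uuid.UUID raises ValueError for malformed input; the comparison also
    # rejects 32-hex strings whose hyphens are in non-canonical positions.
    if str(uuid.UUID(head)) != head.lower():
        raise ValueError(f"prefix is not a canonical UUID: {head!r}")
    return head, bb_id
```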