# Extract File Async (Deprecated)

Starts an asynchronous extraction job. The request mirrors the
synchronous options but returns immediately with a job identifier that
clients can poll for completion status.

<Warning>
  **Deprecated**: This endpoint is deprecated. Use [`/extract`](/api-reference/endpoint/extract) with `async: true` instead.
</Warning>

## Overview

The asynchronous extraction endpoint accepts the same input parameters as the synchronous `/extract` endpoint but returns immediately with a job identifier. Asynchronous extraction (now available through `/extract` with `async: true`) is suited to:

* Large documents that may take longer to process
* Batch processing workflows
* Non-blocking integrations

### Migration

Replace calls to `/extract_async` with `/extract` and add `async: true`:

```diff theme={null}
- POST /extract_async
- {"file_url": "https://example.com/doc.pdf"}

+ POST /extract
+ {"file_url": "https://example.com/doc.pdf", "async": true}
```

The submission response (`job_id`, `status`) is identical, and results are retrieved the same way, by polling `GET /job/{job_id}`.
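
For callers using plain HTTP instead of an SDK, the migration amounts to a new path plus one field. The sketch below uses only Python's standard library to build (but not send) the replacement request. The base URL and the `x-api-key` header are taken from the curl examples on this page; `build_async_extract_request` and `YOUR_API_KEY` are placeholders, not part of any Pulse SDK.

```python
import json
import urllib.request

API_BASE = "https://api.runpulse.com"


def build_async_extract_request(file_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST /extract request that replaces the deprecated POST /extract_async."""
    # Same body as the old /extract_async call, plus the "async" flag.
    body = {"file_url": file_url, "async": True}
    return urllib.request.Request(
        url=f"{API_BASE}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_async_extract_request("https://example.com/doc.pdf", "YOUR_API_KEY")
# Sending it with urllib.request.urlopen(req) should return the job_id to poll.
```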

## Request

### Document Source

Provide the document using exactly one of these fields:

| Field      | Type   | Description                                                    |
| ---------- | ------ | -------------------------------------------------------------- |
| `file`     | binary | Document file to upload directly (multipart/form-data).        |
| `file_url` | string | Public or pre-signed URL that Pulse will download and extract. |

### Extraction Options

| Field               | Type          | Default   | Description                                                                                                                                                                                                                                                     |
| ------------------- | ------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`             | string (enum) | `default` | Extraction model to use. One of `default` or `pulse-ultra-2`. `pulse-ultra-2` uses Pulse's vision-language model with built-in refinement, figure/chart extraction, and word-level bounding boxes.                                                              |
| `pages`             | string        | -         | Page range filter, 1-indexed (page 1 is the first page). Accepts a single range such as `1-2` or a comma-separated mix of ranges and single pages such as `1-2,5`.                                                                                                |
| `figure_processing` | object        | -         | Settings that control how figures in the document are processed. These modify the **markdown output** and the `bounding_boxes.Images[]` entries directly instead of adding a separate result under `response.extensions.*`. See [Figure Processing](#figure-processing). |
| `extensions`        | object        | -         | Settings that enable additional processing or alternate output formats. Each enabled extension produces a corresponding result under `response.extensions.*`. See [Extensions](#extensions).                                                                    |
| `spreadsheet`       | object        | -         | Settings for Excel/spreadsheet extraction. Controls hidden rows, columns, sheets, raw values, phantom-cell trimming, and whether table `cell_data` is included. Applies to `.xlsx`, `.xlsm`, and `.xls` files. See [Spreadsheet Options](#spreadsheet-options). |
| `storage`           | object        | -         | Options for persisting extraction artifacts. See [Storage Options](#storage-options).                                                                                                                                                                           |
| `async`             | boolean       | `false`   | If `true`, returns immediately with a `job_id` for polling via `GET /job/{jobId}`.                                                                                                                                                                              |
| `force_url`         | boolean       | `false`   | When `true`, return the complete extraction result as a URL even if it is small. Spreadsheet responses are URL-backed by default; set `force_url: false` to request inline spreadsheet output. URL delivery changes only the transport, not the result shape.   |
| `structured_output` | object        | -         | **⚠️ Deprecated** — Use the [`/schema`](/api-reference/endpoint/schema) endpoint after extraction instead. Still works for backward compatibility.                                                                                                              |
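
As a client-side sanity check before submitting, a `pages` string can be expanded into the 1-indexed page numbers it names. This is a sketch that assumes the comma-separated `start-end` / single-page grammar shown in the table; Pulse's own parser may accept or reject inputs differently, and `expand_pages` is a hypothetical helper.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a page-range string such as "1-2,5" into sorted 1-indexed page numbers."""
    pages: set[int] = set()
    for segment in spec.split(","):
        segment = segment.strip()
        if not segment:
            raise ValueError(f"empty segment in {spec!r}")
        if "-" in segment:
            start_text, end_text = segment.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(segment)
        # Pages are 1-indexed, so 0 is invalid, and ranges must not run backwards.
        if start < 1 or end < start:
            raise ValueError(f"invalid segment {segment!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(expand_pages("1-2,5"))  # [1, 2, 5]
```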

### Figure Processing

Settings under `figure_processing` control how figures (images, charts, diagrams) and embedded visuals are processed. Applies to both PDFs/images (figures detected from layout) and spreadsheets (charts and embedded images read directly from the workbook). Affects the markdown output and the `bounding_boxes.Images[]` array.

| Field                           | Type    | Default | Description                                                                                                                                                                                                                                                                                |
| ------------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `figure_processing.description` | boolean | `false` | Generate descriptive captions for extracted visuals. Captions appear under `bounding_boxes.Images[].description` and inline in the markdown output. Applies to both detected charts and non-chart images.                                                                                  |
| `figure_processing.show_images` | boolean | `false` | Return image URLs for extracted visuals. URLs appear under `bounding_boxes.Images[].image_url` and resolve to a Pulse-hosted PNG/JPEG served from [`GET /results/{jobId}/images/{filename}`](/api-reference/endpoint/results-image). Applies to both detected charts and non-chart images. |

<Note>
  For spreadsheets specifically, `show_images: true` collects every embedded chart and image in the workbook and emits one entry per visual under `bounding_boxes.Images`, with chart-specific fields like `chart_type`, `chart_title`, and `source_ranges` populated. See [Bounding Boxes](/api-reference/bounding-boxes#images-array) for the full field list.
</Note>
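
Once a job completes, the visuals described above can be read from `bounding_boxes.Images[]`. The sketch below assumes the completed result has been parsed into a Python dict with a top-level `bounding_boxes` key. Only the `description`, `image_url`, and (spreadsheet-only) `chart_type` field names come from this page; the sample values are hypothetical and do not represent real API output.

```python
from typing import Any


def summarize_visuals(result: dict[str, Any]) -> list[str]:
    """Return one line per visual in bounding_boxes.Images[] of an extraction result."""
    lines = []
    for image in result.get("bounding_boxes", {}).get("Images", []):
        # description is populated only with figure_processing.description: true,
        # image_url only with figure_processing.show_images: true.
        kind = image.get("chart_type") or "image"
        description = image.get("description") or "(no description)"
        url = image.get("image_url") or "(no URL)"
        lines.append(f"{kind}: {description} -> {url}")
    return lines


# Illustrative result shape (hypothetical values).
sample = {
    "bounding_boxes": {
        "Images": [
            {"chart_type": "bar", "description": "Quarterly revenue", "image_url": "https://example.com/a.png"},
            {"description": "Company logo"},
        ]
    }
}
for line in summarize_visuals(sample):
    print(line)
# bar: Quarterly revenue -> https://example.com/a.png
# image: Company logo -> (no URL)
```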

### Spreadsheet Options

Settings under `spreadsheet` control how Excel workbooks (`.xlsx`, `.xlsm`, `.xls`) are processed. By default, hidden rows, columns, and sheets are excluded from extraction output, cell values are rendered the way Excel displays them, and table cell metadata is included. Phantom-cell trimming is opt-in. Spreadsheet responses are returned as full-result URLs by default because workbook `cell_data` can make the payload large even for modest `.xlsx` files.

| Field                               | Type    | Default | Description                                                                                                                                                                                                                                                                                                                                                                                                          |
| ----------------------------------- | ------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `spreadsheet.include_hidden_rows`   | boolean | `false` | Include rows that are hidden in the Excel workbook.                                                                                                                                                                                                                                                                                                                                                                  |
| `spreadsheet.include_hidden_cols`   | boolean | `false` | Include columns that are hidden in the Excel workbook.                                                                                                                                                                                                                                                                                                                                                               |
| `spreadsheet.include_hidden_sheets` | boolean | `false` | Include sheets that are hidden in the Excel workbook.                                                                                                                                                                                                                                                                                                                                                                |
| `spreadsheet.use_raw_values`        | boolean | `false` | Emit the underlying numeric value for number cells instead of the Excel display-formatted text — e.g. `1201.67` rather than `$1,202` when the cell uses a rounded currency format. Useful when downstream processing needs exact amounts (cent-level precision) rather than what the workbook shows visually. Percent-formatted cells and dates keep their display rendering. Does not apply to legacy `.xls` files. |
| `spreadsheet.only_data_rows`        | boolean | `false` | When `true`, trim trailing empty rows past the last cell carrying a value or formula. See [Phantom-cell trimming](#phantom-cell-trimming-only_data_rows--only_data_cols) below.                                                                                                                                                                                                                                      |
| `spreadsheet.only_data_cols`        | boolean | `false` | When `true`, trim trailing empty columns past the last cell carrying a value or formula. Same rationale as `only_data_rows`.                                                                                                                                                                                                                                                                                         |
| `spreadsheet.cell_data`             | boolean | `true`  | Include cell-level table metadata under `bounding_boxes.Tables[].cell_data`. Set to `false` to omit this metadata and reduce output size.                                                                                                                                                                                                                                                                            |
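
The difference `use_raw_values` makes can be reproduced with Python's own number formatting. The snippet only illustrates display rounding versus the stored value; it assumes a whole-dollar currency format and does not reproduce Excel's formatting engine.

```python
raw_value = 1201.67  # the number stored in the cell

# A rounded currency format shows whole dollars with thousands separators.
display_text = f"${raw_value:,.0f}"

print(display_text)  # $1,202  (use_raw_values: false)
print(raw_value)     # 1201.67 (use_raw_values: true)
```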

<Note>
  These settings accept both camelCase (`includeHiddenRows`, `onlyDataRows`, `cellData`) and snake\_case (`include_hidden_rows`, `only_data_rows`, `cell_data`) formats.
</Note>
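
Because both spellings are accepted, a client that assembles options from several sources may want to normalize them before sending, so one setting is not sent twice under two names. A minimal sketch, assuming simple lowerCamelCase keys; `to_snake_case` and `normalize_spreadsheet_options` are hypothetical helpers.

```python
import re


def to_snake_case(key: str) -> str:
    """Convert lowerCamelCase (e.g. includeHiddenRows) to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def normalize_spreadsheet_options(options: dict) -> dict:
    """Return spreadsheet options with every key in snake_case."""
    normalized = {}
    for key, value in options.items():
        snake = to_snake_case(key)
        if snake in normalized and normalized[snake] != value:
            # The same setting was given twice with conflicting values.
            raise ValueError(f"conflicting values for {snake!r}")
        normalized[snake] = value
    return normalized


print(normalize_spreadsheet_options({"includeHiddenRows": True, "cell_data": False}))
# {'include_hidden_rows': True, 'cell_data': False}
```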

#### Phantom-cell trimming (`only_data_rows` / `only_data_cols`)

Excel files exported from claims systems, ERPs, and other automated pipelines routinely declare a "used range" that extends hundreds of thousands of rows past where the data actually ends. A typical case: a 57 MB workbook with only \~500 rows of real data, where the other \~1,000,000 rows are empty cells that exist only because they were once selected and styled. These phantom cells inflate file size by orders of magnitude and can exhaust parser memory on the extraction pipeline.

Set `only_data_rows: true` and `only_data_cols: true` to have Pulse scan each sheet once before parsing, find the largest row and column containing a value or formula, and ignore everything beyond that extent. Surviving cells keep their **original A1 coordinates** (e.g., a value at `B7` in the source is still `B7` in the output), so any citation or bounding box that references a specific cell remains stable. The trim only kicks in on large sheets (≥5 MB of XML per sheet), so small, well-formed workbooks pay no overhead either way.

Both flags default to `false`.
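
The trimming rule above (find the largest row and column holding a value or formula, drop everything past it, keep original A1 coordinates) can be sketched on an in-memory sheet. This illustrates the idea only and is not Pulse's implementation: it represents a sheet as a hypothetical dict from 1-indexed `(row, col)` to cell content, treats `None` and `""` as empty, and ignores the 5 MB size threshold.

```python
from typing import Optional


def a1(row: int, col: int) -> str:
    """Convert 1-indexed (row, col) to an A1 reference, e.g. (7, 2) -> 'B7'."""
    letters = ""
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row}"


def trim_phantom_cells(cells: dict[tuple[int, int], Optional[str]],
                       only_data_rows: bool = True,
                       only_data_cols: bool = True) -> dict[str, Optional[str]]:
    """Drop cells past the last row/column that holds a value or formula."""
    filled = [(r, c) for (r, c), v in cells.items() if v not in (None, "")]
    max_row = max((r for r, _ in filled), default=0)
    max_col = max((c for _, c in filled), default=0)
    kept = {}
    for (r, c), v in cells.items():
        if only_data_rows and r > max_row:
            continue
        if only_data_cols and c > max_col:
            continue
        # Surviving cells keep their original coordinates.
        kept[a1(r, c)] = v
    return kept


sheet = {
    (1, 1): "Claim", (1, 2): "Amount",
    (7, 2): "=SUM(B2:B6)",
    (9, 1): "", (1_000_000, 5): None,  # styled but empty "phantom" cells
}
print(trim_phantom_cells(sheet))
# {'A1': 'Claim', 'B1': 'Amount', 'B7': '=SUM(B2:B6)'}
```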

### Pulse Ultra 2 Options

These options are available only when `model: pulse-ultra-2` is set. Passing any of them with the default model returns a 400 error listing the offending fields.

| Field                       | Type    | Default | Description                                                                                                                                                                                                                                                           |
| --------------------------- | ------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `refine`                    | boolean | `false` | Run a full-page OCR and formatting correction pass after extraction. Improves accuracy on dense layouts, numerical values, and table structure. Adds \~1–2s per page. Overridden by `refine_options` if both are provided.                                            |
| `refine_options`            | object  | -       | Granular refinement targets. Takes precedence over the boolean `refine` flag. See below.                                                                                                                                                                              |
| `refine_options.tables`     | boolean | `false` | Fix table cell values, structure, and headers against the source image.                                                                                                                                                                                               |
| `refine_options.text`       | boolean | `false` | Fix OCR errors, missing or extra content, and numerical accuracy (tables untouched).                                                                                                                                                                                  |
| `refine_options.formatting` | boolean | `false` | Add strikethrough, italic, bold, super/subscript, and LaTeX formatting (tables untouched).                                                                                                                                                                            |
| `extract_figure`            | boolean | `false` | Convert charts and data visualizations into HTML `<table>` blocks, wrapped in `<figure-table>` tags. Useful for financial decks, dashboards, and scientific charts.                                                                                                   |
| `figure_description`        | boolean | `false` | Generate a 1–2 paragraph natural-language description of each picture, wrapped in `<figure-description>` tags. Combines well with `extract_figure`.                                                                                                                   |
| `detect_selections`         | boolean | `true`  | Detect selected and unselected marks with a specialized selection-mark model. Improves accuracy on forms, checkboxes, radio buttons, handwritten checkmarks, X marks, and similar controls. Enabled by default for `pulse-ultra-2`; set to `false` to skip this pass. |
| `additional_prompt`         | string  | `""`    | Extra context injected into the extraction prompt. Use to steer extraction toward a specific domain or attention focus. Max 4000 characters.                                                                                                                          |
| `custom_image_prompt`       | string  | `""`    | Extra context appended to the prompt used by `figure_description` and `extract_figure`. Tunes image and chart interpretation. Max 2000 characters.                                                                                                                    |
| `custom_refine_prompt`      | string  | `""`    | Extra context appended to the refinement prompt. Only applies when `refine: true` or `refine_options` is set. Max 2000 characters.                                                                                                                                    |
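
Because these fields are rejected with a 400 unless `model` is `pulse-ultra-2`, a client can catch the mistake before sending. The sketch mirrors only the rules stated in the table (model gating and prompt length limits); `check_ultra_options` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
ULTRA_ONLY_FIELDS = {
    "refine", "refine_options", "extract_figure", "figure_description",
    "detect_selections", "additional_prompt", "custom_image_prompt",
    "custom_refine_prompt",
}
PROMPT_LIMITS = {
    "additional_prompt": 4000,
    "custom_image_prompt": 2000,
    "custom_refine_prompt": 2000,
}


def check_ultra_options(request: dict) -> list[str]:
    """Return human-readable problems; an empty list means the request looks valid."""
    problems = []
    if request.get("model", "default") != "pulse-ultra-2":
        offending = sorted(field for field in request if field in ULTRA_ONLY_FIELDS)
        if offending:
            problems.append(f"requires model 'pulse-ultra-2': {', '.join(offending)}")
    for field, limit in PROMPT_LIMITS.items():
        if len(request.get(field, "")) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    return problems


print(check_ultra_options({"model": "default", "refine": True}))
# ["requires model 'pulse-ultra-2': refine"]
print(check_ultra_options({"model": "pulse-ultra-2", "refine_options": {"tables": True}}))
# []
```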

#### Selection mark detection

With `model: pulse-ultra-2`, `detect_selections` is enabled by default; keep it enabled when a document contains forms, checkboxes, radio buttons, handwritten selection marks, or other marked-choice controls. Pulse runs a specialized detection pass for these marks so selected and unselected states are less likely to be missed or confused with nearby text, boxes, or handwriting. When available, the detected state is returned on the relevant bounding-box items as `selected`.

#### Markdown output additions

When `extract_figure` or `figure_description` is enabled, figures in `response.markdown` include additional tags:

```html theme={null}
<figure data-page="1">
  <figure-table>...HTML table for the chart...</figure-table>
  <figure-description>...1–2 paragraph description...</figure-description>
</figure>
```

When `refine` (or `refine_options`) is set, markdown content is post-processed page-by-page; output is cleaner but typically grows \~1.5–3x in size for dense documents. No new tags are introduced.
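
To pull chart tables and descriptions back out of `response.markdown`, it is enough to match the `<figure-table>` and `<figure-description>` tags shown in the example above. A minimal regex-based sketch, assuming each figure opens exactly as `<figure data-page="N">` and the tags are not nested inside each other; for arbitrary HTML an HTML parser is more robust. `extract_figures` and the sample markdown are hypothetical.

```python
import re

FIGURE_RE = re.compile(r'<figure data-page="(\d+)">(.*?)</figure>', re.DOTALL)
TABLE_RE = re.compile(r"<figure-table>(.*?)</figure-table>", re.DOTALL)
DESCRIPTION_RE = re.compile(r"<figure-description>(.*?)</figure-description>", re.DOTALL)


def extract_figures(markdown: str) -> list[dict]:
    """Collect page number, chart table HTML, and description for each <figure>."""
    figures = []
    for match in FIGURE_RE.finditer(markdown):
        body = match.group(2)
        table = TABLE_RE.search(body)
        description = DESCRIPTION_RE.search(body)
        figures.append({
            "page": int(match.group(1)),
            "table_html": table.group(1).strip() if table else None,
            "description": description.group(1).strip() if description else None,
        })
    return figures


sample_markdown = """Intro text.
<figure data-page="2">
  <figure-table><table><tr><td>Q1</td><td>10</td></tr></table></figure-table>
  <figure-description>Revenue by quarter.</figure-description>
</figure>
"""
for fig in extract_figures(sample_markdown):
    print(fig["page"], fig["description"])  # 2 Revenue by quarter.
```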

### Extensions

Settings under `extensions` enable additional processing passes or alternate output formats. Each enabled extension produces a **corresponding output field** under `response.extensions.*`. For example, enabling `extensions.chunking` produces `response.extensions.chunking`, and enabling `extensions.alt_outputs.return_html` produces `response.extensions.alt_outputs.html`.

| Field                                | Type      | Default | Description                                                                                                           |
| ------------------------------------ | --------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
| `extensions.footnote_references`     | boolean   | `false` | Link footnote markers to their corresponding footnote text.                                                           |
| `extensions.chunking`                | object    | -       | Chunking configuration. See below.                                                                                    |
| `extensions.chunking.chunk_types`    | string\[] | -       | List of chunking strategies: `semantic`, `header`, `page`, `recursive`.                                               |
| `extensions.chunking.chunk_size`     | integer   | -       | Maximum characters per chunk.                                                                                         |
| `extensions.alt_outputs`             | object    | -       | Alternate output formats. See below.                                                                                  |
| `extensions.alt_outputs.wlbb`        | boolean   | `false` | Enable word-level bounding boxes (PDF only). Results in `response.extensions.alt_outputs.wlbb`.                       |
| `extensions.alt_outputs.return_html` | boolean   | `false` | Include HTML representation. `response.markdown` is still present; HTML is at `response.extensions.alt_outputs.html`. |
| `extensions.alt_outputs.return_xml`  | boolean   | `false` | Include XML representation (work in progress).                                                                        |
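
The request-to-response mapping described above can be sketched as a payload builder plus a reader for the matching `response.extensions.*` fields. Field names come from the table; the strategy and chunk size are example values, and treating the completed result as a plain dict is an assumption. `read_extension_outputs` is a hypothetical helper.

```python
import json

# Request: enable header-based chunking and an HTML copy of the output.
extensions = {
    "chunking": {"chunk_types": ["header"], "chunk_size": 2000},
    "alt_outputs": {"return_html": True},
}
request_body = {"file_url": "https://example.com/doc.pdf", "async": True, "extensions": extensions}
print(json.dumps(request_body, indent=2))


def read_extension_outputs(response: dict) -> dict:
    """Pick out the outputs that correspond to the enabled extensions."""
    ext = response.get("extensions", {})
    return {
        "chunking": ext.get("chunking"),
        "html": ext.get("alt_outputs", {}).get("html"),
        "markdown": response.get("markdown"),  # still present alongside HTML
    }
```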

### Storage Options

Control whether extractions are saved to your extraction library:

| Field                 | Type          | Default | Description                                                                           |
| --------------------- | ------------- | ------- | ------------------------------------------------------------------------------------- |
| `storage.enabled`     | boolean       | `true`  | Whether to persist extraction artifacts. Set to `false` for temporary extractions.    |
| `storage.folder_name` | string        | -       | Target folder name to save the extraction to. Creates the folder if it doesn't exist. |
| `storage.folder_id`   | string (uuid) | -       | Target folder ID to save the extraction to. Takes precedence over `folder_name`.      |

### Deprecated Fields

The following input fields are deprecated and will be removed in a future version. They are still accepted for backward compatibility.

| Field               | Replacement                                                                                                                                                         |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `show_images`       | Use `figure_processing.show_images`                                                                                                                                 |
| `chunking`          | Use `extensions.chunking.chunk_types` (array instead of comma-separated string)                                                                                     |
| `chunk_size`        | Use `extensions.chunking.chunk_size`                                                                                                                                |
| `return_html`       | Use `extensions.alt_outputs.return_html`                                                                                                                            |
| `structured_output` | Use [`/schema`](/api-reference/endpoint/schema) endpoint after extraction. Pass `extraction_id` + `schema_config`. Accepts `schema`, `schema_prompt`, and `effort`. |
| `schema`            | Use [`/schema`](/api-reference/endpoint/schema) endpoint after extraction                                                                                           |
| `schema_prompt`     | Use [`/schema`](/api-reference/endpoint/schema) endpoint with `schema_config.schema_prompt`                                                                         |
| `custom_prompt`     | No replacement                                                                                                                                                      |
| `thinking`          | No replacement                                                                                                                                                      |

<Note>
  When legacy input fields are used, the API returns a deprecation warning in the `warnings` array directing you to the updated field names. See the [latest documentation](https://docs.runpulse.com/api-reference/endpoint/extract) for details.
</Note>
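
The one-to-one renames in the table are mechanical, so existing request bodies can be upgraded in code. The sketch handles only `show_images`, `chunking`, `chunk_size`, and `return_html`; fields that move to `/schema` or have no replacement are reported rather than translated. `migrate_legacy_fields` is a hypothetical helper.

```python
def migrate_legacy_fields(body: dict) -> tuple[dict, list[str]]:
    """Rewrite deprecated top-level fields into their nested replacements.

    Returns the new body and a list of fields that need manual attention.
    """
    body = dict(body)  # shallow copy; nested dicts are rebuilt below

    if "show_images" in body:
        figure = dict(body.get("figure_processing", {}))
        figure["show_images"] = body.pop("show_images")
        body["figure_processing"] = figure

    extensions = dict(body.get("extensions", {}))
    chunking = dict(extensions.get("chunking", {}))
    if "chunking" in body:
        # The legacy value is a comma-separated string; the new field is an array.
        chunking["chunk_types"] = [s.strip() for s in body.pop("chunking").split(",") if s.strip()]
    if "chunk_size" in body:
        chunking["chunk_size"] = body.pop("chunk_size")
    if chunking:
        extensions["chunking"] = chunking
    if "return_html" in body:
        alt = dict(extensions.get("alt_outputs", {}))
        alt["return_html"] = body.pop("return_html")
        extensions["alt_outputs"] = alt
    if extensions:
        body["extensions"] = extensions

    manual = ("structured_output", "schema", "schema_prompt", "custom_prompt", "thinking")
    unresolved = [field for field in manual if field in body]
    return body, unresolved


new_body, todo = migrate_legacy_fields(
    {"file_url": "https://example.com/doc.pdf", "chunking": "semantic, page", "return_html": True, "thinking": True}
)
print(todo)  # ['thinking']
```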

## Response

When you submit a document for async extraction, you'll receive a response containing the job metadata:

```json theme={null}
{
  "job_id": "abc123-def456-ghi789",
  "status": "pending",
  "queuedAt": "2025-01-15T10:30:00Z"
}
```

### Response Fields

| Field      | Type   | Description                                                                                                                        |
| ---------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| `job_id`   | string | Unique identifier for the extraction job. Use this to poll for results with the [Poll Job](/api-reference/endpoint/poll) endpoint. |
| `status`   | string | Initial job status. Typically `pending` when first submitted.                                                                      |
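| `message`  | string | Optional human-readable description of the accepted job.                                                                           |
| `queuedAt` | string | **Deprecated.** ISO 8601 timestamp indicating when the job was accepted. Retained for backward compatibility; use `GET /job/{job_id}` for timing details. |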
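
When handling the acknowledgement in code, it is safest to require only the fields the OpenAPI schema marks as required (`job_id`, `status`) and treat the rest as optional. A minimal sketch with a dataclass; `SubmissionAck` and `parse_submission` are hypothetical names, not SDK types.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmissionAck:
    job_id: str
    status: str
    message: Optional[str] = None
    queued_at: Optional[str] = None  # from the deprecated queuedAt field


def parse_submission(raw: str) -> SubmissionAck:
    """Parse the JSON returned when an async extraction is accepted."""
    data = json.loads(raw)
    return SubmissionAck(
        job_id=data["job_id"],  # required
        status=data["status"],  # required
        message=data.get("message"),
        queued_at=data.get("queuedAt"),
    )


ack = parse_submission('{"job_id": "abc123-def456-ghi789", "status": "pending", "queuedAt": "2025-01-15T10:30:00Z"}')
print(ack.job_id, ack.status)  # abc123-def456-ghi789 pending
```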

## Retrieving Results

After submitting an async extraction, poll the job status endpoint to retrieve results:

```http theme={null}
GET /job/{job_id}
```

The job status endpoint will return the extraction results once the job is completed. See the [Poll Job](/api-reference/endpoint/poll) documentation for details on the response structure.

<Note>
  For detailed information on the extraction output format (markdown, bounding boxes, chunks, etc.), see the [Extract](/api-reference/endpoint/extract) documentation.
</Note>
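
Rather than polling forever, a client can cap the total wait and back off between requests. The sketch is transport-agnostic: `fetch_status` is a hypothetical callable you supply (for example, one that calls `GET /job/{job_id}` and returns the parsed JSON as a dict), and the terminal statuses `completed`, `failed`, and `canceled` come from the status enum in the OpenAPI schema on this page. The timeout and delay defaults are arbitrary example values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def poll_job(fetch_status: Callable[[], dict],
             timeout_s: float = 600.0,
             initial_delay_s: float = 1.0,
             max_delay_s: float = 30.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


# Usage with a stand-in for the HTTP call: pending, then processing, then completed.
responses = iter([{"status": "pending"}, {"status": "processing"}, {"status": "completed", "result": {}}])
job = poll_job(lambda: next(responses), sleep=lambda seconds: None)
print(job["status"])  # completed
```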

## Example Usage

### Submit Async Extraction

<CodeGroup>
  ```python Python theme={null}
  import time
  from pulse import Pulse

  client = Pulse(api_key="YOUR_API_KEY")

  # Submit async extraction
  submission = client.extract_async(
      file_url="https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf"
  )

  print(f"Job ID: {submission.job_id}")
  print(f"Status: {submission.status}")

  # Poll for completion, giving up after 10 minutes (adjust for large documents)
  job_id = submission.job_id
  deadline = time.monotonic() + 600
  while True:
      job_status = client.jobs.get_job(job_id=job_id)
      print(f"Status: {job_status.status}")

      if job_status.status == "completed":
          print("Extraction complete!")
          print(f"Result: {job_status.result}")
          break
      elif job_status.status in ["failed", "canceled"]:
          print(f"Job ended: {job_status.status}")
          if job_status.error:
              print(f"Error: {job_status.error}")
          break
      elif time.monotonic() > deadline:
          raise TimeoutError(f"Job {job_id} did not finish within 10 minutes")

      time.sleep(2)
  ```

  ```typescript TypeScript theme={null}
  import { PulseClient } from 'pulse-ts-sdk';

  const client = new PulseClient({ 
      apiKey: 'YOUR_API_KEY'
  });

  // Submit async extraction
  const submission = await client.extract({
      fileUrl: "https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf",
      async: true
  });

  console.log(`Job ID: ${submission.job_id}`);
  console.log(`Status: ${submission.status}`);

  // Poll for completion, giving up after 10 minutes (adjust for large documents)
  const jobId = submission.job_id;
  const deadline = Date.now() + 10 * 60 * 1000;
  while (true) {
      const jobStatus = await client.jobs.getJob({ jobId });
      console.log(`Status: ${jobStatus.status}`);

      if (jobStatus.status === 'completed') {
          console.log('Extraction complete!');
          console.log(`Result: ${JSON.stringify(jobStatus.result)}`);
          break;
      } else if (jobStatus.status === 'failed' || jobStatus.status === 'canceled') {
          console.log(`Job ended: ${jobStatus.status}`);
          if (jobStatus.error) {
              console.log(`Error: ${jobStatus.error}`);
          }
          break;
      } else if (Date.now() > deadline) {
          throw new Error(`Job ${jobId} did not finish within 10 minutes`);
      }

      await new Promise(resolve => setTimeout(resolve, 2000));
  }
  ```

  ```bash curl theme={null}
  # Submit async extraction with file upload
  curl -X POST https://api.runpulse.com/extract_async \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@document.pdf"

  # Submit async extraction with URL
  curl -X POST https://api.runpulse.com/extract_async \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"file_url": "https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf"}'

  # Response
  # {"job_id": "abc123", "status": "pending", "queuedAt": "2025-01-15T10:30:00Z"}

  # Poll for results
  curl https://api.runpulse.com/job/abc123 \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

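### With Structured Output

`structured_output` is deprecated; these examples are kept for existing integrations. New code should run the extraction first and then call the [`/schema`](/api-reference/endpoint/schema) endpoint (see [Deprecated Fields](#deprecated-fields)).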

<CodeGroup>
  ```python Python theme={null}
  schema = {
      "type": "object",
      "properties": {
          "total": {"type": "number"},
          "vendor": {"type": "string"}
      }
  }

  submission = client.extract_async(
      file_url="https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf",
      structured_output={
          "schema": schema,
          "schema_prompt": "Extract the invoice total"
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const submission = await client.extract({
      fileUrl: "https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf",
      async: true,
      structuredOutput: {
          schema: {
              type: "object",
              properties: {
                  total: { type: "number" },
                  vendor: { type: "string" }
              }
          },
          schemaPrompt: "Extract the invoice total"
      }
  });
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/extract_async \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@invoice.pdf" \
    -F 'structured_output={"schema": {"type": "object", "properties": {"total": {"type": "number"}}}, "schema_prompt": "Extract the invoice total"}'
  ```
</CodeGroup>
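
A request to [`/schema`](/api-reference/endpoint/schema) replaces the `structured_output` examples above. According to the Deprecated Fields table, it takes an `extraction_id` plus a `schema_config` holding `schema` and `schema_prompt` (and optionally `effort`, omitted here). The sketch only builds such a body; where the `extraction_id` comes from and the full request shape should be confirmed against the `/schema` reference. `build_schema_request` and `EXTRACTION_ID` are hypothetical placeholders.

```python
import json
from typing import Optional


def build_schema_request(extraction_id: str, schema: dict, schema_prompt: Optional[str] = None) -> dict:
    """Build a /schema request body that replaces structured_output on /extract_async."""
    schema_config = {"schema": schema}
    if schema_prompt:
        schema_config["schema_prompt"] = schema_prompt
    return {"extraction_id": extraction_id, "schema_config": schema_config}


invoice_schema = {
    "type": "object",
    "properties": {"total": {"type": "number"}, "vendor": {"type": "string"}},
}
body = build_schema_request("EXTRACTION_ID", invoice_schema, "Extract the invoice total")
print(json.dumps(body, indent=2))
```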

### Cancel a Job

<CodeGroup>
  ```python Python theme={null}
  # Cancel a running job
  cancellation = client.jobs.cancel_job(job_id=job_id)
  print(f"Cancelled: {cancellation.message}")

  # Verify cancellation
  status = client.jobs.get_job(job_id=job_id)
  print(f"Status: {status.status}")  # Should be "canceled"
  ```

  ```typescript TypeScript theme={null}
  // Cancel a running job
  const cancellation = await client.jobs.cancelJob({ jobId });
  console.log(`Cancelled: ${cancellation.message}`);

  // Verify cancellation
  const status = await client.jobs.getJob({ jobId });
  console.log(`Status: ${status.status}`);  // Should be "canceled"
  ```

  ```bash curl theme={null}
  # Cancel a job
  curl -X DELETE https://api.runpulse.com/job/abc123 \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>


## OpenAPI

````yaml POST /extract_async
openapi: 3.1.0
info:
  title: Pulse API Structure
  version: 0.1.0
  description: >-
    Canonical contract for the Pulse extraction APIs. This specification is the
    single source of truth for shared request/response models that client and
    server packages consume.
servers:
  - url: https://api.runpulse.com
    description: Default Pulse API base URL
security:
  - ApiKey: []
paths:
  /extract_async:
    post:
      tags:
        - Extract
      summary: Submit an asynchronous extraction job
      description: |-
        **Deprecated**: Use `/extract` with `async: true` instead.

        Starts an asynchronous extraction job. The request mirrors the
        synchronous options but returns immediately with a job identifier that
        clients can poll for completion status.
      operationId: submitExtractJob
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/ExtractInput'
            encoding:
              figureProcessing:
                contentType: application/json
              extensions:
                contentType: application/json
              spreadsheet:
                contentType: application/json
              storage:
                contentType: application/json
              structuredOutput:
                contentType: application/json
              schema:
                contentType: application/json
      responses:
        '200':
          description: Asynchronous extraction job accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncSubmissionResponse'
        '400':
          description: Invalid request parameters
        '401':
          description: Authentication failed or missing API key
        '429':
          description: Rate limit exceeded
      deprecated: true
components:
  schemas:
    ExtractInput:
      description: >-
        Input schema for extraction requests. Provide exactly one of `file`
        (direct upload) or `fileUrl` (remote URL).
      allOf:
        - $ref: '#/components/schemas/ExtractSourceMultipart'
        - $ref: '#/components/schemas/ExtractOptions'
    AsyncSubmissionResponse:
      type: object
      description: >-
        Acknowledgement returned when a request is submitted for asynchronous
        processing. Poll `GET /job/{jobId}` to check status and retrieve
        results.
      required:
        - job_id
        - status
      properties:
        job_id:
          type: string
          description: Identifier assigned to the asynchronous job.
        status:
          type: string
          description: Initial status reported by the server.
          enum:
            - pending
            - processing
            - completed
            - failed
            - canceled
        message:
          type: string
          description: Human-readable description of the accepted job.
        queuedAt:
          type: string
          format: date-time
          deprecated: true
          description: >-
            **Deprecated** — Timestamp indicating when the job was accepted.
            Retained for backward compatibility. Use `GET /job/{jobId}` for
            timing details.
        credits_used:
          type:
            - number
            - 'null'
          format: float
          description: >-
            Number of credits consumed by this request. Only present when the
            organization has the credit billing system enabled.
    ExtractSourceMultipart:
      type: object
      description: >-
        Document source definition for multipart/form-data requests. Provide
        exactly one of `file` (direct upload) or `fileUrl` (remote URL).
      properties:
        file:
          type: string
          format: binary
          description: Document to upload directly. Required unless `fileUrl` is provided.
        fileUrl:
          type: string
          format: uri
          x-fern-property-name: file_url
          description: >-
            Public or pre-signed URL that Pulse will download and extract.
            Required unless `file` is provided.
    ExtractOptions:
      type: object
      description: Common extraction options shared by sync and async extraction requests.
      properties:
        model:
          type: string
          description: >-
            Extraction model to use. When set to `pulse-ultra-2`, routes the
            request through Pulse Ultra 2 (self-hosted VPC model) instead of the
            default cloud-based service. If omitted or set to `default`, the
            default model is used.
          enum:
            - default
            - pulse-ultra-2
        detectSelections:
          type: boolean
          x-fern-property-name: detect_selections
          description: >-
            Pulse Ultra 2 only. Enables a specialized selection-mark detection
            pass that improves selected/unselected state accuracy for forms,
            checkboxes, radio buttons, handwritten checkmarks, X marks, and
            similar controls. Enabled by default when `model` is
            `pulse-ultra-2`; set to false to skip this pass. Passing true
            without `model: pulse-ultra-2` returns a validation error.
        extractionConfigId:
          type: string
          format: uuid
          x-fern-property-name: extraction_config_id
          description: >-
            UUID of a saved extraction configuration (a "preset"). When
            provided, the server loads the saved configuration and applies its
            options on top of any inline parameters supplied in this request.
            Inline parameters always take precedence over preset values for the
            same field. Saved configs are managed via the platform UI or the
            `input_extractions` admin endpoints.
        pages:
          type: string
          description: >-
            Page range filter supporting segments such as `1-2` or mixed ranges
            like `1-2,5`.
          pattern: ^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$
        forceUrl:
          type: boolean
          x-fern-property-name: force_url
          default: false
          description: >-
            When true, return the complete extraction result as a URL even if it
            is small. Spreadsheet extractions use URL delivery by default; set
            `force_url: false` to request inline spreadsheet output. URL
            delivery changes only the transport, not the result shape.
        figureProcessing:
          type: object
          x-fern-property-name: figure_processing
          description: >-
            Settings that control how figures and embedded visuals are
            processed. Applies to both PDFs/images (where figures are detected
            from layout) and spreadsheets (where charts and embedded images are
            read directly from the workbook). These options affect the markdown
            output and the `bounding_boxes.Images[]` array; they do not produce
            additional output fields elsewhere in the response.
          properties:
            description:
              type: boolean
              default: false
              description: >-
                Generate descriptive captions for extracted visuals. When
                `true`, applies to both detected charts and non-chart images.
                Captions appear under `bounding_boxes.Images[].description` and
                inline in the markdown output where applicable.
            showImages:
              type: boolean
              x-fern-property-name: show_images
              default: false
              description: >-
                Return image URLs for extracted visuals. When `true`, applies to
                both charts and non-chart images. URLs are emitted under
                `bounding_boxes.Images[].image_url` — typically a Pulse-hosted
                proxy URL served from `GET /results/{jobId}/images/{filename}`.
                Spreadsheet charts and embedded images are read directly from
                the workbook; PDF/image inputs use detected figure regions.
        extensions:
          type: object
          description: >-
            Settings that enable additional processing passes or alternate
            output formats. Each enabled extension produces a corresponding
            output field under `response.extensions.*`.
          properties:
            footnoteReferences:
              type: boolean
              x-fern-property-name: footnote_references
              default: false
              description: Link footnote markers to their corresponding footnote text.
            chunking:
              type: object
              description: >-
                Chunking configuration. When provided, the document is split
                into chunks using the specified strategies. Results appear in
                `response.extensions.chunking`.
              properties:
                chunkTypes:
                  type: array
                  x-fern-property-name: chunk_types
                  items:
                    type: string
                    enum:
                      - semantic
                      - header
                      - page
                      - recursive
                  description: >-
                    List of chunking strategies to apply (e.g. `["semantic",
                    "header", "page", "recursive"]`).
                chunkSize:
                  type: integer
                  minimum: 1
                  x-fern-property-name: chunk_size
                  description: Maximum characters per chunk.
            altOutputs:
              type: object
              x-fern-property-name: alt_outputs
              description: >-
                Alternate output format options. Each enabled format produces a
                corresponding field under `response.extensions.altOutputs`.
              properties:
                wlbb:
                  type: boolean
                  default: false
                  description: >-
                    Enable word-level bounding boxes. Runs an additional OCR
                    model to derive bounding boxes for each word. Only applies
                    to PDFs. Results in `response.extensions.altOutputs.wlbb`.
                returnHtml:
                  type: boolean
                  x-fern-property-name: return_html
                  default: false
                  description: >-
                    Include an HTML representation of the document. When
                    enabled, `response.markdown` is still present and the HTML
                    is available at `response.extensions.altOutputs.html`.
                returnXml:
                  type: boolean
                  x-fern-property-name: return_xml
                  default: false
                  description: >-
                    Include an XML representation of the document. Results in
                    `response.extensions.altOutputs.xml`. (Work in progress.)
        spreadsheet:
          type: object
          description: >-
            Settings for Excel/spreadsheet extraction. Controls handling of
            hidden rows, columns, and sheets, whether numeric cells are rendered
            using their display format or underlying raw value, whether table
            cell metadata is captured, and optional trimming of empty phantom
            rows/columns past the last data-bearing cell. Applies to `.xlsx`,
            `.xlsm`, and `.xls` files. Accepts both camelCase and snake_case
            field names.
          properties:
            includeHiddenRows:
              type: boolean
              x-fern-property-name: include_hidden_rows
              default: false
              description: Include rows that are hidden in the Excel workbook.
            includeHiddenCols:
              type: boolean
              x-fern-property-name: include_hidden_cols
              default: false
              description: Include columns that are hidden in the Excel workbook.
            includeHiddenSheets:
              type: boolean
              x-fern-property-name: include_hidden_sheets
              default: false
              description: Include sheets that are hidden in the Excel workbook.
            useRawValues:
              type: boolean
              x-fern-property-name: use_raw_values
              default: false
              description: >-
                Emit the underlying numeric value for number cells instead of
                the Excel display-formatted text (e.g. `1201.67` rather than
                `$1,202` when the cell uses a rounded currency format).
                Percent-formatted cells and dates keep their display rendering.
                Does not apply to legacy `.xls` files.
            onlyDataRows:
              type: boolean
              x-fern-property-name: only_data_rows
              default: false
              description: >-
                When true, trim trailing empty rows past the last cell carrying
                a value or formula before parsing. Excel exports from claims
                systems and ERPs routinely declare a used range with hundreds of
                thousands of empty-but-styled phantom rows that inflate file
                size and exhaust parser memory; enabling this strips them out
                without touching any cell that actually has data. Surviving
                cells keep their original A1 coordinates so citations that
                reference a specific cell remain stable. Defaults to false.
            onlyDataCols:
              type: boolean
              x-fern-property-name: only_data_cols
              default: false
              description: >-
                When true, trim trailing empty columns past the last cell
                carrying a value or formula. Same rationale and
                coordinate-stability guarantee as `onlyDataRows`. Defaults to
                false.
            cellData:
              type: boolean
              x-fern-property-name: cell_data
              default: true
              description: >-
                Include cell-level table metadata under
                `bounding_boxes.Tables[].cell_data`. Set to false to omit this
                metadata and reduce output size.
        storage:
          type: object
          description: >-
            Options for persisting extraction artifacts. When enabled (default),
            artifacts are saved to storage and a database record is created.
          properties:
            enabled:
              type: boolean
              description: >-
                Whether to persist extraction artifacts. Set to false for
                temporary extractions with no storage or database record.
              default: true
            folderName:
              type: string
              x-fern-property-name: folder_name
              description: >-
                Target folder name to save the extraction to. Creates the folder
                if it doesn't exist.
            folderId:
              type: string
              format: uuid
              x-fern-property-name: folder_id
              description: >-
                Target folder ID to save the extraction to. Takes precedence
                over `folderName` if both are provided.
        async:
          type: boolean
          default: false
          description: >-
            If true, returns immediately with a `job_id` for polling via
            `GET /job/{jobId}`. Otherwise, the request is processed synchronously.
        structuredOutput:
          type: object
          x-fern-property-name: structured_output
          deprecated: true
          description: >-
            **⚠️ DEPRECATED** — Use the `/schema` endpoint after extraction
            instead. Pass the `extraction_id` from the extract response to
            `/schema` with your `schema_config`. This parameter still works for
            backward compatibility but will be removed in a future version.
          properties:
            schema:
              type: object
              description: JSON schema describing the structured data to extract.
            schemaPrompt:
              type: string
              x-fern-property-name: schema_prompt
              description: Natural language prompt with additional extraction instructions.
            effort:
              type: boolean
              default: false
              description: >-
                Use higher quality model for better results. When true, uses a
                more capable model at the cost of higher latency.
        schema:
          description: >-
            (Deprecated) JSON schema describing structured data to extract.
            Superseded by `structuredOutput.schema`, which is itself deprecated
            in favor of the `/schema` endpoint. Accepts either a JSON object or
            a stringified JSON representation.
          oneOf:
            - type: object
            - type: string
          deprecated: true
        schemaPrompt:
          type: string
          x-fern-property-name: schema_prompt
          description: >-
            (Deprecated) Natural language prompt for schema-guided extraction.
            Superseded by `structuredOutput.schemaPrompt`, which is itself
            deprecated in favor of the `/schema` endpoint.
          deprecated: true
        customPrompt:
          type: string
          x-fern-property-name: custom_prompt
          description: >-
            (Deprecated) Custom instructions that augment the default extraction
            behavior. Use `figureProcessing` or `extensions` instead.
          deprecated: true
        chunking:
          type: string
          deprecated: true
          description: >-
            **⚠️ DEPRECATED** — Use `extensions.chunking.chunkTypes` instead.
            Comma-separated list of chunking strategies to apply (for example
            `semantic,header,page,recursive`). Still accepted for backward
            compatibility.
        chunkSize:
          type: integer
          minimum: 1
          x-fern-property-name: chunk_size
          deprecated: true
          description: >-
            **⚠️ DEPRECATED** — Use `extensions.chunking.chunkSize` instead.
            Override for maximum characters per chunk when chunking is enabled.
        extractFigure:
          type: boolean
          x-fern-property-name: extract_figure
          deprecated: true
          description: '**⚠️ DEPRECATED** — Toggle to enable figure extraction in results.'
          default: false
        figureDescription:
          type: boolean
          x-fern-property-name: figure_description
          deprecated: true
          description: >-
            **⚠️ DEPRECATED** — Use `figureProcessing.description` instead.
            Toggle to generate descriptive captions for extracted figures.
          default: false
        showImages:
          type: boolean
          x-fern-property-name: show_images
          deprecated: true
          description: >-
            **⚠️ DEPRECATED** — Use `figureProcessing.showImages` instead. Embed
            base64-encoded images inline in figure tags in the output. Increases
            response size.
          default: false
        returnHtml:
          type: boolean
          x-fern-property-name: return_html
          deprecated: true
          description: >-
            **⚠️ DEPRECATED** — Use `extensions.altOutputs.returnHtml` instead.
            Whether to include HTML representation alongside markdown in the
            response.
          default: false
        thinking:
          type: boolean
          description: (Deprecated) Enables expanded rationale output for debugging.
          default: false
          deprecated: true
  securitySchemes:
    ApiKey:
      type: apiKey
      in: header
      name: x-api-key
      x-fern-header:
        name: apiKey
        env: PULSE_API_KEY

````
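Several deprecated top-level fields in `ExtractOptions` name a nested replacement:

* `chunking` and `chunkSize` move under `extensions.chunking`.
* `figureDescription` and `showImages` move under `figureProcessing`.
* `returnHtml` moves under `extensions.altOutputs`.

Below is a minimal client-side migration sketch. It assumes the options are a plain dict keyed by the spec's camelCase property names, and `migrate_deprecated_options` is hypothetical. As a design choice, a value already set at the new location wins over the deprecated one. Some fields have no one-to-one replacement field: `schema`, `schemaPrompt`, `structuredOutput`, `customPrompt`, `extractFigure`, and `thinking`. The helper leaves these in place and reports them, since the spec points schema-guided extraction to the `/schema` endpoint rather than to another request field. Note that `figureProcessing.showImages` returns image URLs under `bounding_boxes.Images[].image_url` instead of inline base64 images, so downstream parsing may need to change.

```python
import copy
from typing import List, Tuple

# Deprecated field -> path of its replacement, per the spec's "Use ... instead" notes.
RENAMES: List[Tuple[str, Tuple[str, ...]]] = [
    ("chunkSize", ("extensions", "chunking", "chunkSize")),
    ("figureDescription", ("figureProcessing", "description")),
    # The new flag returns image URLs instead of inline base64 images.
    ("showImages", ("figureProcessing", "showImages")),
    ("returnHtml", ("extensions", "altOutputs", "returnHtml")),
]

# Deprecated fields with no one-to-one replacement field; reported, not moved.
UNMAPPED = ("schema", "schemaPrompt", "structuredOutput",
            "customPrompt", "extractFigure", "thinking")


def _nested(opts: dict, path: Tuple[str, ...]) -> dict:
    """Walk (creating if missing) nested dicts along `path`."""
    node = opts
    for key in path:
        node = node.setdefault(key, {})
    return node


def migrate_deprecated_options(options: dict) -> Tuple[dict, List[str]]:
    """Hypothetical helper: return (migrated copy, deprecated keys left over)."""
    opts = copy.deepcopy(options)  # never mutate the caller's dict

    if "chunking" in opts:
        # Comma-separated string -> list for extensions.chunking.chunkTypes.
        types = [t.strip() for t in opts.pop("chunking").split(",") if t.strip()]
        _nested(opts, ("extensions", "chunking")).setdefault("chunkTypes", types)

    for old, path in RENAMES:
        if old in opts:
            value = opts.pop(old)
            # setdefault: a value already present at the new path wins.
            _nested(opts, path[:-1]).setdefault(path[-1], value)

    leftover = [key for key in UNMAPPED if key in opts]
    return opts, leftover
```

For example, `{"chunking": "semantic,page", "chunkSize": 500}` becomes `{"extensions": {"chunking": {"chunkTypes": ["semantic", "page"], "chunkSize": 500}}}` with no leftover keys.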