# Extract File Async (Deprecated)

Starts an asynchronous extraction job. The request mirrors the
synchronous options but returns immediately with a job identifier that
clients can poll for completion status.


<Warning>
  **Deprecated**: This endpoint is deprecated. Use [`/extract`](/api-reference/endpoint/extract) with `async: true` instead.
</Warning>

## Overview

The asynchronous extraction endpoint accepts the same input parameters as the synchronous `/extract` endpoint but returns immediately with a job identifier. Asynchronous extraction is suited to:

* Large documents that may take longer to process
* Batch processing workflows
* Non-blocking integrations

### Migration

Replace calls to `/extract_async` with `/extract` and add `async: true`:

```diff theme={null}
- POST /extract_async
- {"file_url": "https://example.com/doc.pdf"}

+ POST /extract
+ {"file_url": "https://example.com/doc.pdf", "async": true}
```

The response format is identical.
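
As a minimal sketch, the migration can be expressed as a pure function over a `(path, body)` pair. The helper name `migrate_request` is hypothetical and is not part of any Pulse SDK; it only illustrates the rewrite shown in the diff above.

```python
from typing import Any, Dict, Tuple


def migrate_request(path: str, body: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Rewrite a legacy /extract_async call as /extract with async: true.

    Hypothetical helper for illustration; other paths pass through unchanged.
    """
    if path.rstrip("/") != "/extract_async":
        return path, dict(body)
    migrated = dict(body)      # copy so the caller's dict is not mutated
    migrated["async"] = True   # the flag that replaces the dedicated endpoint
    return "/extract", migrated


new_path, new_body = migrate_request(
    "/extract_async", {"file_url": "https://example.com/doc.pdf"}
)
print(new_path, new_body)
```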

## Request

### Document Source

Provide the document using one of these methods:

| Field      | Type   | Description                                                    |
| ---------- | ------ | -------------------------------------------------------------- |
| `file`     | binary | Document file to upload directly (multipart/form-data).        |
| `file_url` | string | Public or pre-signed URL that Pulse will download and extract. |

### Extraction Options

| Field               | Type          | Default   | Description                                                                                                                                                                                              |
| ------------------- | ------------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`             | string (enum) | `default` | Extraction model to use. One of `default` or `pulse-ultra-2`. `pulse-ultra-2` uses Pulse's vision-language model with built-in refinement, figure/chart extraction, and word-level bounding boxes.       |
| `pages`             | string        | -         | Page range filter (1-indexed). Supports segments like `1-2` or mixed ranges like `1-2,5`. Page 1 is the first page.                                                                                      |
| `figure_processing` | object        | -         | Settings that control how figures in the document are processed. These affect the **markdown output directly** and do not produce additional output fields. See [Figure Processing](#figure-processing). |
| `extensions`        | object        | -         | Settings that enable additional processing or alternate output formats. Each enabled extension produces a corresponding result under `response.extensions.*`. See [Extensions](#extensions).             |
| `spreadsheet`       | object        | -         | Settings for Excel/spreadsheet extraction. Controls handling of hidden rows, columns, and sheets. Only applies to `.xlsx` and `.xls` files. See [Spreadsheet Options](#spreadsheet-options).             |
| `storage`           | object        | -         | Options for persisting extraction artifacts. See [Storage Options](#storage-options).                                                                                                                    |
| `async`             | boolean       | `false`   | If `true`, returns immediately with a `job_id` for polling via `GET /job/{jobId}`.                                                                                                                       |
| `structured_output` | object        | -         | **⚠️ Deprecated** — Use the [`/schema`](/api-reference/endpoint/schema) endpoint after extraction instead. Still works for backward compatibility.                                                       |
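
The `pages` filter can be checked client-side before a job is submitted. The sketch below reuses the pattern that the OpenAPI schema on this page declares for `pages`; the function name `expand_pages` is hypothetical, and rejecting page 0 or descending ranges is an assumption, since the API's behavior for those inputs is not documented here.

```python
import re
from typing import List

# Same pattern the OpenAPI schema on this page declares for `pages`.
PAGES_PATTERN = re.compile(r"^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$")


def expand_pages(pages: str) -> List[int]:
    """Expand a `pages` filter such as "1-2,5" into sorted page numbers.

    Rejecting page 0 and descending ranges is a client-side assumption.
    """
    if not PAGES_PATTERN.fullmatch(pages):
        raise ValueError(f"invalid pages filter: {pages!r}")
    selected = set()
    for segment in pages.split(","):
        start_text, _, end_text = segment.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start
        if start < 1 or end < start:
            raise ValueError(f"invalid segment: {segment!r}")
        selected.update(range(start, end + 1))
    return sorted(selected)


print(expand_pages("1-2,5"))  # [1, 2, 5]
```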

### Figure Processing

Settings under `figure_processing` control how figures (images, charts, diagrams) in the document are processed. These settings affect the **markdown output directly** — for example, adding descriptive captions to figures or converting charts into markdown tables. They do **not** create additional output fields in the response.

| Field                           | Type    | Default | Description                                                                 |
| ------------------------------- | ------- | ------- | --------------------------------------------------------------------------- |
| `figure_processing.description` | boolean | `false` | Generate descriptive captions for extracted figures.                        |
| `figure_processing.show_images` | boolean | `false` | Embed base64-encoded images inline in figure tags. Increases response size. |

### Spreadsheet Options

Settings under `spreadsheet` control how Excel workbooks (`.xlsx`, `.xls`) are processed. By default, hidden rows, columns, and sheets are excluded from extraction output.

| Field                               | Type    | Default | Description                                            |
| ----------------------------------- | ------- | ------- | ------------------------------------------------------ |
| `spreadsheet.include_hidden_rows`   | boolean | `false` | Include rows that are hidden in the Excel workbook.    |
| `spreadsheet.include_hidden_cols`   | boolean | `false` | Include columns that are hidden in the Excel workbook. |
| `spreadsheet.include_hidden_sheets` | boolean | `false` | Include sheets that are hidden in the Excel workbook.  |

<Note>
  These settings accept both camelCase (`includeHiddenRows`) and snake\_case (`include_hidden_rows`) formats.
</Note>

### Pulse Ultra 2 Options

These options are available only when `model: pulse-ultra-2` is set. Passing any of them with the default model returns a 400 error listing the offending fields.

| Field                       | Type    | Default | Description                                                                                                                                                                                                                |
| --------------------------- | ------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `refine`                    | boolean | `false` | Run a full-page OCR and formatting correction pass after extraction. Improves accuracy on dense layouts, numerical values, and table structure. Adds \~1–2s per page. Overridden by `refine_options` if both are provided. |
| `refine_options`            | object  | -       | Granular refinement targets. Takes precedence over the boolean `refine` flag. See below.                                                                                                                                   |
| `refine_options.tables`     | boolean | `false` | Fix table cell values, structure, and headers against the source image.                                                                                                                                                    |
| `refine_options.text`       | boolean | `false` | Fix OCR errors, missing or extra content, and numerical accuracy (tables untouched).                                                                                                                                       |
| `refine_options.formatting` | boolean | `false` | Add strikethrough, italic, bold, super/subscript, and LaTeX formatting (tables untouched).                                                                                                                                 |
| `extract_figure`            | boolean | `false` | Convert charts and data visualizations into HTML `<table>` blocks, wrapped in `<figure-table>` tags. Useful for financial decks, dashboards, and scientific charts.                                                        |
| `figure_description`        | boolean | `false` | Generate a 1–2 paragraph natural-language description of each picture, wrapped in `<figure-description>` tags. Combines well with `extract_figure`.                                                                        |
| `additional_prompt`         | string  | `""`    | Extra context injected into the extraction prompt. Use to steer extraction toward a specific domain or attention focus. Max 4000 characters.                                                                               |
| `custom_image_prompt`       | string  | `""`    | Extra context appended to the prompt used by `figure_description` and `extract_figure`. Tunes image and chart interpretation. Max 2000 characters.                                                                         |
| `custom_refine_prompt`      | string  | `""`    | Extra context appended to the refinement prompt. Only applies when `refine: true` or `refine_options` is set. Max 2000 characters.                                                                                         |
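
To catch the 400 described above before sending a request, a client can run a check like the following sketch. The function name `check_ultra_options` is hypothetical. It assumes that the mere presence of a field counts as "passing" it, and it reads the stated maximums as limits on Python string length; both are assumptions about server behavior.

```python
from typing import Any, Dict, List

# Top-level fields this page lists as pulse-ultra-2 only.
ULTRA_ONLY_FIELDS = (
    "refine", "refine_options", "extract_figure", "figure_description",
    "additional_prompt", "custom_image_prompt", "custom_refine_prompt",
)
# Maximum prompt lengths from the table above.
PROMPT_LIMITS = {
    "additional_prompt": 4000,
    "custom_image_prompt": 2000,
    "custom_refine_prompt": 2000,
}


def check_ultra_options(options: Dict[str, Any]) -> List[str]:
    """Return problems found client-side; an empty list means none."""
    problems = []
    if options.get("model", "default") != "pulse-ultra-2":
        offending = [name for name in ULTRA_ONLY_FIELDS if name in options]
        if offending:
            problems.append("requires model pulse-ultra-2: " + ", ".join(offending))
    for name, limit in PROMPT_LIMITS.items():
        if len(options.get(name, "")) > limit:
            problems.append(f"{name} exceeds {limit} characters")
    return problems


print(check_ultra_options({"refine": True, "extract_figure": True}))
```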

#### Markdown output additions

When `extract_figure` or `figure_description` is enabled, figures in `response.markdown` include additional tags:

```html theme={null}
<figure data-page="1">
  <figure-table>...HTML table for the chart...</figure-table>
  <figure-description>...1–2 paragraph description...</figure-description>
</figure>
```

When `refine` (or `refine_options`) is set, markdown content is post-processed page-by-page; output is cleaner but typically grows \~1.5–3x in size for dense documents. No new tags are introduced.
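
A rough sketch of pulling these tags back out of `response.markdown` follows. It assumes figures have exactly the shape shown in the example above (a `data-page` attribute and no nested `<figure>` elements); the function name `extract_figures` is hypothetical, and regular expressions are used only for brevity, so an HTML parser may be a better fit for production code.

```python
import re
from typing import Any, Dict, List

FIGURE_RE = re.compile(r'<figure data-page="(\d+)">(.*?)</figure>', re.DOTALL)
TABLE_RE = re.compile(r"<figure-table>(.*?)</figure-table>", re.DOTALL)
DESCRIPTION_RE = re.compile(
    r"<figure-description>(.*?)</figure-description>", re.DOTALL
)


def extract_figures(markdown: str) -> List[Dict[str, Any]]:
    """Collect page number, chart table, and description for each figure."""
    figures = []
    for match in FIGURE_RE.finditer(markdown):
        inner = match.group(2)
        table = TABLE_RE.search(inner)
        description = DESCRIPTION_RE.search(inner)
        figures.append({
            "page": int(match.group(1)),
            "table_html": table.group(1).strip() if table else None,
            "description": description.group(1).strip() if description else None,
        })
    return figures


sample = (
    '<figure data-page="1">\n'
    "  <figure-table><table><tr><td>Q1</td></tr></table></figure-table>\n"
    "  <figure-description>Quarterly revenue by region.</figure-description>\n"
    "</figure>"
)
figures = extract_figures(sample)
print(figures)
```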

### Extensions

Settings under `extensions` enable additional processing passes or alternate output formats. Each enabled extension produces a **corresponding output field** under `response.extensions.*`. For example, enabling `extensions.chunking` produces `response.extensions.chunking`, and enabling `extensions.alt_outputs.return_html` produces `response.extensions.alt_outputs.html`.

| Field                                | Type      | Default | Description                                                                                                           |
| ------------------------------------ | --------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
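| `extensions.merge_tables`            | boolean   | `false` | Merge tables that span multiple pages into a single table.                                                            |
| `extensions.footnote_references`     | boolean   | `false` | Link footnote markers to their corresponding footnote text.                                                           |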
| `extensions.chunking`                | object    | -       | Chunking configuration. See below.                                                                                    |
| `extensions.chunking.chunk_types`    | string\[] | -       | List of chunking strategies: `semantic`, `header`, `page`, `recursive`.                                               |
| `extensions.chunking.chunk_size`     | integer   | -       | Maximum characters per chunk.                                                                                         |
| `extensions.alt_outputs`             | object    | -       | Alternate output formats. See below.                                                                                  |
| `extensions.alt_outputs.wlbb`        | boolean   | `false` | Enable word-level bounding boxes (PDF only). Results in `response.extensions.alt_outputs.wlbb`.                       |
| `extensions.alt_outputs.return_html` | boolean   | `false` | Include HTML representation. `response.markdown` is still present; HTML is at `response.extensions.alt_outputs.html`. |
| `extensions.alt_outputs.return_xml`  | boolean   | `false` | Include XML representation (work in progress).                                                                        |
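
The mapping from request settings to response fields can be sketched as below. The function name `expected_extension_fields` is hypothetical, and only the output paths named on this page are covered; the `xml` path comes from the OpenAPI schema and is marked there as work in progress.

```python
from typing import Any, Dict, List


def expected_extension_fields(extensions: Dict[str, Any]) -> List[str]:
    """List the response paths this page documents for the enabled extensions."""
    paths = []
    if extensions.get("chunking"):
        paths.append("response.extensions.chunking")
    alt_outputs = extensions.get("alt_outputs") or {}
    if alt_outputs.get("wlbb"):
        paths.append("response.extensions.alt_outputs.wlbb")
    if alt_outputs.get("return_html"):
        paths.append("response.extensions.alt_outputs.html")
    if alt_outputs.get("return_xml"):
        paths.append("response.extensions.alt_outputs.xml")
    return paths


# Example request body combining chunking with an HTML alternate output.
request_body = {
    "file_url": "https://example.com/doc.pdf",
    "extensions": {
        "chunking": {"chunk_types": ["semantic", "page"], "chunk_size": 1000},
        "alt_outputs": {"return_html": True},
    },
}
print(expected_extension_fields(request_body["extensions"]))
```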

### Storage Options

Control whether extractions are saved to your extraction library:

| Field                 | Type          | Default | Description                                                                           |
| --------------------- | ------------- | ------- | ------------------------------------------------------------------------------------- |
| `storage.enabled`     | boolean       | `true`  | Whether to persist extraction artifacts. Set to `false` for temporary extractions.    |
| `storage.folder_name` | string        | -       | Target folder name to save the extraction to. Creates the folder if it doesn't exist. |
| `storage.folder_id`   | string (uuid) | -       | Target folder ID to save the extraction to. Takes precedence over `folder_name`.      |

### Deprecated Fields

The following input fields are deprecated and will be removed in a future version. They are still accepted for backward compatibility.

| Field               | Replacement                                                                                                                                                         |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `show_images`       | Use `figure_processing.show_images`                                                                                                                                 |
| `chunking`          | Use `extensions.chunking.chunk_types` (array instead of comma-separated string)                                                                                     |
| `chunk_size`        | Use `extensions.chunking.chunk_size`                                                                                                                                |
| `return_html`       | Use `extensions.alt_outputs.return_html`                                                                                                                            |
| `structured_output` | Use [`/schema`](/api-reference/endpoint/schema) endpoint after extraction. Pass `extraction_id` + `schema_config`. Accepts `schema`, `schema_prompt`, and `effort`. |
| `schema`            | Use [`/schema`](/api-reference/endpoint/schema) endpoint after extraction                                                                                           |
| `schema_prompt`     | Use [`/schema`](/api-reference/endpoint/schema) endpoint with `schema_config.schema_prompt`                                                                         |
| `custom_prompt`     | No replacement                                                                                                                                                      |
| `thinking`          | No replacement                                                                                                                                                      |

<Note>
  When legacy input fields are used, the API returns a deprecation warning in the `warnings` array directing you to the updated field names. See the [latest documentation](https://docs.runpulse.com/api-reference/endpoint/extract) for details.
</Note>
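
A minimal sketch of rewriting a legacy request body into the current field layout, following the replacement table above. The function name `migrate_legacy_fields` is hypothetical. Schema-related fields are left untouched because their replacement is a separate `/schema` call rather than a renamed field.

```python
import copy
from typing import Any, Dict


def migrate_legacy_fields(body: Dict[str, Any]) -> Dict[str, Any]:
    """Move deprecated top-level fields to their documented replacements."""
    migrated = copy.deepcopy(body)  # never mutate the caller's nested dicts
    if "show_images" in migrated:
        figures = migrated.setdefault("figure_processing", {})
        figures["show_images"] = migrated.pop("show_images")
    if "chunking" in migrated:
        # Legacy value is a comma-separated string; the new field is an array.
        legacy = migrated.pop("chunking")
        chunking = migrated.setdefault("extensions", {}).setdefault("chunking", {})
        chunking["chunk_types"] = [p.strip() for p in legacy.split(",") if p.strip()]
    if "chunk_size" in migrated:
        chunking = migrated.setdefault("extensions", {}).setdefault("chunking", {})
        chunking["chunk_size"] = migrated.pop("chunk_size")
    if "return_html" in migrated:
        alt = migrated.setdefault("extensions", {}).setdefault("alt_outputs", {})
        alt["return_html"] = migrated.pop("return_html")
    return migrated


legacy_body = {
    "file_url": "https://example.com/doc.pdf",
    "show_images": True,
    "chunking": "semantic, page",
    "chunk_size": 500,
    "return_html": True,
}
print(migrate_legacy_fields(legacy_body))
```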

## Response

When you submit a document for async extraction, you'll receive a response containing the job metadata:

```json theme={null}
{
  "job_id": "abc123-def456-ghi789",
  "status": "pending",
  "queuedAt": "2025-01-15T10:30:00Z"
}
```

### Response Fields

| Field      | Type   | Description                                                                                                                        |
| ---------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| `job_id`   | string | Unique identifier for the extraction job. Use this to poll for results with the [Poll Job](/api-reference/endpoint/poll) endpoint. |
| `status`   | string | Initial job status, `pending` or `processing`. Typically `pending` when first submitted.                                           |
| `queuedAt` | string | ISO 8601 timestamp indicating when the job was accepted.                                                                           |
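
The sketch below parses a submission response into a typed record. The `AsyncSubmission` class and `parse_submission` function are hypothetical names. `queuedAt` is treated as optional because the OpenAPI schema on this page marks only `job_id` and `status` as required.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AsyncSubmission:
    job_id: str
    status: str
    queued_at: Optional[datetime]


def parse_submission(payload: Dict[str, Any]) -> AsyncSubmission:
    """Convert the JSON submission response into an AsyncSubmission."""
    raw = payload.get("queuedAt")
    queued_at = None
    if raw is not None:
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
        # so normalize it to an explicit UTC offset first.
        queued_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return AsyncSubmission(payload["job_id"], payload["status"], queued_at)


submission = parse_submission({
    "job_id": "abc123-def456-ghi789",
    "status": "pending",
    "queuedAt": "2025-01-15T10:30:00Z",
})
print(submission.job_id, submission.queued_at.isoformat())
```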

## Retrieving Results

After submitting an async extraction, poll the job status endpoint to retrieve results:

```bash theme={null}
GET /job/{job_id}
```

The job status endpoint will return the extraction results once the job is completed. See the [Poll Job](/api-reference/endpoint/poll) documentation for details on the response structure.

<Note>
  For detailed information on the extraction output format (markdown, bounding boxes, chunks, etc.), see the [Extract](/api-reference/endpoint/extract) documentation.
</Note>
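
A polling loop usually benefits from a timeout and a growing delay between requests. The sketch below is one way to do that. `wait_for_job` and its parameters are hypothetical, the status fetch is injected as a callable so that no particular SDK or HTTP client is assumed, and the terminal statuses are the ones used in the examples on this page.

```python
import time
from typing import Any, Callable, Dict

# Terminal statuses as used in the examples on this page.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


# Demo with canned responses standing in for GET /job/{job_id}.
responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "completed", "result": {}},
])
final = wait_for_job(lambda: next(responses), sleep=lambda _seconds: None)
print(final["status"])
```

In real use, `fetch_status` would wrap whatever call retrieves the job, for example a function that issues `GET /job/{job_id}` and returns the decoded JSON.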

## Example Usage

### Submit Async Extraction

<CodeGroup>
  ```python Python theme={null}
  import time
  from pulse import Pulse

  client = Pulse(api_key="YOUR_API_KEY")

  # Submit async extraction
  submission = client.extract_async(
      file_url="https://www.impact-bank.com/user/file/dummy_statement.pdf"
  )

  print(f"Job ID: {submission.job_id}")
  print(f"Status: {submission.status}")

  # Poll for completion
  job_id = submission.job_id
  while True:
      job_status = client.jobs.get_job(job_id=job_id)
      print(f"Status: {job_status.status}")
      
      if job_status.status == "completed":
          print("Extraction complete!")
          print(f"Result: {job_status.result}")
          break
      elif job_status.status in ["failed", "canceled"]:
          print(f"Job ended: {job_status.status}")
          if job_status.error:
              print(f"Error: {job_status.error}")
          break
      
      time.sleep(2)
  ```

  ```typescript TypeScript theme={null}
  import { PulseClient } from 'pulse-ts-sdk';

  const client = new PulseClient({ 
      apiKey: 'YOUR_API_KEY'
  });

  // Submit async extraction
  const submission = await client.extract({
      fileUrl: "https://www.impact-bank.com/user/file/dummy_statement.pdf",
      async: true
  });

  console.log(`Job ID: ${submission.job_id}`);
  console.log(`Status: ${submission.status}`);

  // Poll for completion
  const jobId = submission.job_id;
  while (true) {
      const jobStatus = await client.jobs.getJob({ jobId });
      console.log(`Status: ${jobStatus.status}`);
      
      if (jobStatus.status === 'completed') {
          console.log('Extraction complete!');
          console.log(`Result: ${JSON.stringify(jobStatus.result)}`);
          break;
      } else if (jobStatus.status === 'failed' || jobStatus.status === 'canceled') {
          console.log(`Job ended: ${jobStatus.status}`);
          if (jobStatus.error) {
              console.log(`Error: ${jobStatus.error}`);
          }
          break;
      }
      
      await new Promise(resolve => setTimeout(resolve, 2000));
  }
  ```

  ```bash curl theme={null}
  # Submit async extraction with file upload
  curl -X POST https://api.runpulse.com/extract_async \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@document.pdf"

  # Submit async extraction with URL
  curl -X POST https://api.runpulse.com/extract_async \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"file_url": "https://www.impact-bank.com/user/file/dummy_statement.pdf"}'

  # Response
  # {"job_id": "abc123", "status": "pending", "queuedAt": "2025-01-15T10:30:00Z"}

  # Poll for results
  curl https://api.runpulse.com/job/abc123 \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

### With Structured Output

<CodeGroup>
  ```python Python theme={null}
  schema = {
      "type": "object",
      "properties": {
          "total": {"type": "number"},
          "vendor": {"type": "string"}
      }
  }

  submission = client.extract_async(
      file_url="https://www.impact-bank.com/user/file/dummy_statement.pdf",
      structured_output={
          "schema": schema,
          "schema_prompt": "Extract the invoice total"
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const submission = await client.extract({
      fileUrl: "https://www.impact-bank.com/user/file/dummy_statement.pdf",
      async: true,
      structuredOutput: {
          schema: {
              type: "object",
              properties: {
                  total: { type: "number" },
                  vendor: { type: "string" }
              }
          },
          schemaPrompt: "Extract the invoice total"
      }
  });
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/extract_async \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@invoice.pdf" \
    -F 'structured_output={"schema": {"type": "object", "properties": {"total": {"type": "number"}}}, "schema_prompt": "Extract the invoice total"}'
  ```
</CodeGroup>

### Cancel a Job

<CodeGroup>
  ```python Python theme={null}
  # Cancel a running job
  cancellation = client.jobs.cancel_job(job_id=job_id)
  print(f"Canceled: {cancellation.message}")

  # Verify cancellation
  status = client.jobs.get_job(job_id=job_id)
  print(f"Status: {status.status}")  # Should be "canceled"
  ```

  ```typescript TypeScript theme={null}
  // Cancel a running job
  const cancellation = await client.jobs.cancelJob({ jobId });
  console.log(`Canceled: ${cancellation.message}`);

  // Verify cancellation
  const status = await client.jobs.getJob({ jobId });
  console.log(`Status: ${status.status}`);  // Should be "canceled"
  ```

  ```bash curl theme={null}
  # Cancel a job
  curl -X DELETE https://api.runpulse.com/job/abc123 \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>


## OpenAPI

````yaml POST /extract_async
openapi: 3.1.0
info:
  title: Pulse API
  description: >-
    Production-grade document extraction service that transforms complex
    documents into structured, AI-ready data. This specification is the single
    source of truth for the Pulse extraction APIs.
  version: 1.0.0
  contact:
    name: Pulse Support
    email: support@trypulse.ai
    url: https://docs.runpulse.com
servers:
  - url: https://api.runpulse.com
    description: Production server
security:
  - ApiKeyAuth: []
paths:
  /extract_async:
    post:
      tags:
        - Extract
      summary: Extract Document Async (Deprecated)
      description: |
        **Deprecated:** Use `/extract` with `async: true` instead.

        Starts an asynchronous extraction job. The request mirrors the
        synchronous options but returns immediately with a job identifier that
        clients can poll for completion status.
      operationId: submitExtractJob
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/ExtractMultipartInput'
          application/json:
            schema:
              $ref: '#/components/schemas/ExtractJsonInput'
      responses:
        '200':
          description: Asynchronous extraction job accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncSubmissionResponse'
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '429':
          $ref: '#/components/responses/TooManyRequests'
        '500':
          $ref: '#/components/responses/InternalServerError'
      deprecated: true
      x-codeSamples:
        - lang: python
          label: Python SDK
          source: |
            # DEPRECATED — use client.extract(..., async_=True) instead
            from pulse import Pulse

            client = Pulse(api_key="YOUR_API_KEY")
            response = client.extract(
                file_url="https://example.com/document.pdf",
                async_=True
            )
            print(response.job_id)
        - lang: typescript
          label: TypeScript SDK
          source: |
            // DEPRECATED — use client.extract({ ..., async: true }) instead
            import { PulseClient } from "pulse-ts-sdk";

            const client = new PulseClient({
                apiKey: "YOUR_API_KEY"
            });
            const response = await client.extract({
                fileUrl: "https://example.com/document.pdf",
                async: true
            });
            console.log(response.job_id);
components:
  schemas:
    ExtractMultipartInput:
      description: Input schema for multipart/form-data requests (file upload or file_url).
      allOf:
        - $ref: '#/components/schemas/ExtractSourceMultipart'
        - $ref: '#/components/schemas/ExtractOptions'
    ExtractJsonInput:
      description: Input schema for JSON requests (file_url only).
      allOf:
        - $ref: '#/components/schemas/ExtractSourceJson'
        - $ref: '#/components/schemas/ExtractOptions'
    AsyncSubmissionResponse:
      type: object
      description: >-
        Acknowledgement returned when a request is submitted for asynchronous
        processing. Poll GET /job/{job_id} to check status and retrieve results.
      required:
        - job_id
        - status
      properties:
        job_id:
          type: string
          description: Identifier assigned to the asynchronous job.
        status:
          type: string
          description: Initial status reported by the server.
          enum:
            - pending
            - processing
        message:
          type: string
          description: Human-readable description of the accepted job.
    ExtractSourceMultipart:
      type: object
      description: Document source definition for multipart/form-data requests.
      properties:
        file:
          type: string
          format: binary
          description: Document to upload directly. Required unless file_url is specified.
        file_url:
          type: string
          format: uri
          description: Public or pre-signed URL that Pulse will download and extract.
      required:
        - file
    ExtractOptions:
      type: object
      description: >-
        Common extraction options shared by synchronous and asynchronous
        endpoints.
      properties:
        model:
          type: string
          enum:
            - default
            - pulse-ultra-2
          description: >-
            Extraction model to use. `pulse-ultra-2` uses Pulse's
            vision-language model with built-in refinement, figure/chart
            extraction, and word-level bounding boxes. Omit or pass `default`
            for standard extraction.
        pages:
          type: string
          description: >-
            Page range filter (1-indexed, where page 1 is the first page).
            Supports segments such as `1-2` or mixed ranges like `1-2,5`.
          pattern: ^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$
        figure_processing:
          type: object
          description: >-
            Settings that control how figures in the document are processed.
            These affect the markdown output directly (e.g. figure descriptions,
            chart-to-table conversion, image embedding) and do not produce
            additional output fields in the response.
          properties:
            description:
              type: boolean
              default: false
              description: Generate descriptive captions for extracted figures.
            show_images:
              type: boolean
              default: false
              description: >-
                Embed base64-encoded images inline in figure tags in the output.
                Increases response size.
        extensions:
          type: object
          description: >-
            Settings that enable additional processing passes or alternate
            output formats. Each enabled extension produces a corresponding
            output field under `response.extensions.*`.
          properties:
            merge_tables:
              type: boolean
              default: false
              description: Merge tables that span multiple pages into a single table.
            footnote_references:
              type: boolean
              default: false
              description: Link footnote markers to their corresponding footnote text.
            chunking:
              type: object
              description: >-
                Chunking configuration. When provided, the document is split
                into chunks using the specified strategies. Results appear in
                `response.extensions.chunking`.
              properties:
                chunk_types:
                  type: array
                  items:
                    type: string
                    enum:
                      - semantic
                      - header
                      - page
                      - recursive
                  description: >-
                    List of chunking strategies to apply (e.g. `["semantic",
                    "header", "page", "recursive"]`).
                chunk_size:
                  type: integer
                  minimum: 1
                  description: Maximum characters per chunk.
            alt_outputs:
              type: object
              description: >-
                Alternate output format options. Each enabled format produces a
                corresponding field under `response.extensions.alt_outputs`.
              properties:
                wlbb:
                  type: boolean
                  default: false
                  description: >-
                    Enable word-level bounding boxes. Runs an additional OCR
                    model to derive bounding boxes for each word. Only applies
                    to PDFs. Results in `response.extensions.alt_outputs.wlbb`.
                return_html:
                  type: boolean
                  default: false
                  description: >-
                    Include an HTML representation of the document. When
                    enabled, `response.markdown` is still present and the HTML
                    is available at `response.extensions.alt_outputs.html`.
                return_xml:
                  type: boolean
                  default: false
                  description: >-
                    Include an XML representation of the document. Results in
                    `response.extensions.alt_outputs.xml`. (Work in progress.)
        storage:
          type: object
          description: >-
            Options for persisting extraction artifacts. When enabled (default),
            artifacts are saved to storage and a database record is created.
          properties:
            enabled:
              type: boolean
              description: >-
                Whether to persist extraction artifacts. Set to false for
                temporary extractions with no storage or database record.
              default: true
            folder_name:
              type: string
              description: >-
                Target folder name to save the extraction to. Creates the folder
                if it doesn't exist.
            folder_id:
              type: string
              format: uuid
              description: >-
                Target folder ID to save the extraction to. Takes precedence
                over folder_name if both are provided.
        async:
          type: boolean
          description: >-
            If true, returns immediately with a job_id for polling via GET
            /job/{jobId}. Otherwise processes synchronously.
          default: false
        refine:
          type: boolean
          default: false
          description: >-
            Run a full-page OCR and formatting correction pass after extraction.
            Improves accuracy on dense layouts, numerical values, and table
            structure. Adds ~1-2s per page. Overridden by `refine_options` if
            both are provided. Requires `model: pulse-ultra-2`.
        refine_options:
          type: object
          description: >-
            Granular refinement targets. Takes precedence over the boolean
            `refine` flag. Requires `model: pulse-ultra-2`.
          properties:
            tables:
              type: boolean
              default: false
              description: >-
                Fix table cell values, structure, and headers against the source
                image.
            text:
              type: boolean
              default: false
              description: >-
                Fix OCR errors, missing or extra content, and numerical
                inaccuracies. Tables are left untouched.
            formatting:
              type: boolean
              default: false
              description: >-
                Add strikethrough, italic, bold, super/subscript, and LaTeX
                formatting. Tables are left untouched.
        extract_figure:
          type: boolean
          default: false
          description: >-
            Convert charts and data visualizations into HTML `<table>` blocks,
            wrapped in `<figure-table>` tags inside `response.markdown`. Useful
            for financial decks, dashboards, and scientific charts. Requires
            `model: pulse-ultra-2`.
        figure_description:
          type: boolean
          default: false
          description: >-
            Generate a 1-2 paragraph natural-language description of each
            picture, wrapped in `<figure-description>` tags inside
            `response.markdown`. Combines well with `extract_figure`. Requires
            `model: pulse-ultra-2`.
        additional_prompt:
          type: string
          default: ''
          maxLength: 4000
          description: >-
            Extra context injected into the extraction prompt. Use to steer
            extraction toward a specific domain or attention focus. Requires
            `model: pulse-ultra-2`.
        custom_image_prompt:
          type: string
          default: ''
          maxLength: 2000
          description: >-
            Extra context appended to the prompt used by `figure_description`
            and `extract_figure`. Tunes image and chart interpretation for your
            domain. Requires `model: pulse-ultra-2`.
        custom_refine_prompt:
          type: string
          default: ''
          maxLength: 2000
          description: >-
            Extra context appended to the refinement prompt. Only applies when
            `refine: true` or `refine_options` is set. Requires `model:
            pulse-ultra-2`.
        chunking:
          type: string
          deprecated: true
          description: >-
            **Deprecated** -- Use `extensions.chunking.chunk_types` instead.
            Comma-separated list of chunking strategies.
        chunk_size:
          type: integer
          minimum: 1
          deprecated: true
          description: '**Deprecated** -- Use `extensions.chunking.chunk_size` instead.'
        show_images:
          type: boolean
          deprecated: true
          default: false
          description: '**Deprecated** -- Use `figure_processing.show_images` instead.'
        return_html:
          type: boolean
          deprecated: true
          default: false
          description: '**Deprecated** -- Use `extensions.alt_outputs.return_html` instead.'
    ExtractSourceJson:
      type: object
      description: Document source definition for JSON requests.
      properties:
        file_url:
          type: string
          format: uri
          description: Public or pre-signed URL that Pulse will download and extract.
      required:
        - file_url
    ErrorResponse:
      type: object
      properties:
        error:
          type: object
          properties:
            code:
              type: string
              description: Error code (e.g., FILE_001, AUTH_002)
            message:
              type: string
              description: Human-readable error message
            details:
              type: object
              description: Additional error context
  responses:
    BadRequest:
      description: Bad request - Invalid parameters
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    Unauthorized:
      description: Unauthorized - Invalid or missing API key
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    TooManyRequests:
      description: Rate limit exceeded
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    InternalServerError:
      description: Internal server error
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key
      description: API key for authentication

````
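
The schema above carries several constraints that are easy to trip over: the `maxLength` limits on the prompt fields, the `model: pulse-ultra-2` requirement on the refinement and figure options, the `x-api-key` header, and the `error.code` / `error.message` shape of `ErrorResponse`. The following is a minimal sketch of client-side helpers that mirror those constraints before a request is sent. The helper names (`build_extract_payload`, `auth_headers`, `describe_error`) and the `run_async` parameter are hypothetical and not part of any Pulse SDK, and the local checks are an assumption about what is useful to validate; how the server responds to a violation is not stated in the schema.

```python
import json

ULTRA_MODEL = "pulse-ultra-2"

# Fields whose descriptions in the schema say "Requires `model: pulse-ultra-2`".
ULTRA_ONLY_FIELDS = {
    "refine", "refine_options", "extract_figure", "figure_description",
    "additional_prompt", "custom_image_prompt", "custom_refine_prompt",
}

# maxLength values copied from the schema.
MAX_LENGTHS = {
    "additional_prompt": 4000,
    "custom_image_prompt": 2000,
    "custom_refine_prompt": 2000,
}


def auth_headers(api_key):
    """Headers for a JSON request, using the ApiKeyAuth scheme (x-api-key)."""
    return {"x-api-key": api_key, "Content-Type": "application/json"}


def build_extract_payload(file_url, model=None, run_async=False, **options):
    """Build a JSON-ready request body, checking schema limits locally.

    `run_async` (hypothetical name) maps to the `async` field, because
    `async` is a reserved word in Python and cannot be a keyword argument.
    """
    payload = {"file_url": file_url, "async": run_async}
    if model is not None:
        payload["model"] = model
    for name, value in options.items():
        limit = MAX_LENGTHS.get(name)
        if limit is not None and len(value) > limit:
            raise ValueError(f"{name} exceeds maxLength {limit}")
        # Falsy values (False, '') match the schema defaults, so only
        # truthy values are treated as turning a feature on.
        if name in ULTRA_ONLY_FIELDS and value and model != ULTRA_MODEL:
            raise ValueError(f"{name} requires model: {ULTRA_MODEL}")
        payload[name] = value
    return payload


def describe_error(body):
    """Format an ErrorResponse JSON body as 'CODE: message'."""
    err = json.loads(body).get("error", {})
    return f"{err.get('code', 'UNKNOWN')}: {err.get('message', '')}"


if __name__ == "__main__":
    body = build_extract_payload(
        "https://example.com/doc.pdf",
        model=ULTRA_MODEL,
        run_async=True,
        refine_options={"tables": True, "text": True},
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent with any HTTP client together with `auth_headers(...)`. With `async` set to true, the schema says the response contains a `job_id` to poll via `GET /job/{jobId}`.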