> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runpulse.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Tables

> Extract tables from a previously completed extraction. Processes the
extraction's document content and returns structured table data.

Requires the `tables_endpoint` feature flag to be enabled for your
organization.

Set `async: true` to return immediately with a `tables_id` for
polling via `GET /job/{tables_id}`.

To extract tables from many extractions at once, see
[Batch Tables](api:POST/batch/tables) or the
[Batch Processing guide](/batch).

## Overview

<Info>
  **Pipeline Step 2 (terminal)** — Tables requires a prior [extraction](/api-reference/endpoint/extract). This is a terminal step — no further pipeline steps can be chained after it.
</Info>

Extract structured tables from a saved extraction using Pulse's semantic and table-structure algorithms. The `/tables` endpoint detects and reconstructs tables from your document, handling:

* **Span tables** — cells that merge across rows or columns (e.g., "Year Ended December 31" spanning three columns)
* **Multi-level header hierarchies** — nested spans like period → segment → line item
* **Cross-page tables** — tables that continue across page breaks, automatically merged with row-continuity tracking

This is particularly valuable for financial documents (10-Ks, 10-Qs, proxy statements) where span tables encode hierarchy visually rather than explicitly, causing most extraction tools to silently misalign values with the wrong columns.

<Note>
  This endpoint operates on **saved extractions** (created via `/extract` with storage enabled, which is the default).
</Note>

<Note>
  To extract tables from many extractions at once, use [Batch Tables](/api-reference/endpoint/batch-overview#batch-tables).
</Note>

### Async Mode

Set `async: true` to return immediately with a job ID for polling. See [Polling for Results](/api-reference/endpoint/poll) for details.

```json theme={null}
{
  "extraction_id": "abc123-def456",
  "async": true
}
```

***

## Request

### Request Body

| Field           | Type          | Required | Description                                                                                                      |
| --------------- | ------------- | -------- | ---------------------------------------------------------------------------------------------------------------- |
| `extraction_id` | string (uuid) | Yes      | ID of the saved extraction to process                                                                            |
| `tables_config` | object        | No       | Configuration options for table processing                                                                       |
| `async`         | boolean       | No       | If `true`, returns immediately with a `tables_id` for [polling](/api-reference/endpoint/poll). Default: `false`. |

### Tables Config (`tables_config`)

| Field              | Type    | Default  | Description                                                                                                             |
| ------------------ | ------- | -------- | ----------------------------------------------------------------------------------------------------------------------- |
| `merge`            | boolean | `false`  | Merge tables that continue across pages into a single table                                                             |
| `table_format`     | string  | `"html"` | Output format for table content. Use `"html"` for an HTML `<table>` string or `"json"` for structured headers and rows. |
| `charts_to_tables` | boolean | `false`  | Convert figures and charts into tables using LLM processing. Resulting tables have `from_chart: true` in the response.  |

***

## Response

### Synchronous Response (200)

| Field                  | Type          | Description                              |
| ---------------------- | ------------- | ---------------------------------------- |
| `tables_id`            | string (uuid) | Unique identifier for this tables result |
| `tables_output`        | object        | Contains the extracted tables            |
| `tables_output.tables` | array         | List of extracted table objects          |

Each table object:

| Field           | Type             | Description                                                                                        |
| --------------- | ---------------- | -------------------------------------------------------------------------------------------------- |
| `citations`     | array of strings | Bounding box table IDs for the table (e.g., `["tbl-1"]` or `["tbl-1", "tbl-2"]` for merged tables) |
| `table_content` | string           | The table content in HTML format                                                                   |
| `from_chart`    | boolean          | Whether the table was derived from a chart/figure rather than a native table                       |

```json theme={null}
{
  "tables_id": "uuid-123",
  "tables_output": {
    "tables": [
      {
        "citations": ["tbl-1", "tbl-2"],
        "table_content": "<table data-bb-table-id=\"tbl-1\" data-merged-from=\"tbl-1,tbl-2\">...</table>",
        "from_chart": false
      }
    ]
  }
}
```

### Async Response (200)

| Field       | Type          | Description                                        |
| ----------- | ------------- | -------------------------------------------------- |
| `tables_id` | string (uuid) | Job ID for [polling](/api-reference/endpoint/poll) |
| `status`    | string        | `"pending"`                                        |
| `message`   | string        | Human-readable status message                      |

```json theme={null}
{
  "tables_id": "uuid-123",
  "status": "pending",
  "message": "Table processing started. Poll GET /job/{tables_id} for results."
}
```

***

## Example Usage

### Basic Table Extraction

<CodeGroup>
  ```python Python theme={null}
  from pulse import Pulse

  client = Pulse(api_key="YOUR_API_KEY")

  # Step 1: Extract the document
  extract_result = client.extract(
      file=open("10k-filing.pdf", "rb")
  )

  # Step 2: Extract tables
  tables_result = client.tables(
      extraction_id=extract_result.extraction_id
  )

  for table in tables_result.tables_output.tables:
      print(f"Citations: {table.citations}")
      print(f"From chart: {table.from_chart}")
      print(table.table_content)
  ```

  ```typescript TypeScript theme={null}
  import { PulseClient } from "pulse-ts-sdk";
  import * as fs from "fs";

  const client = new PulseClient({
    apiKey: "YOUR_API_KEY",
  });

  // Step 1: Extract the document
  const extractResult = await client.extract({
    file: fs.createReadStream("10k-filing.pdf"),
  });

  // Step 2: Extract tables
  const tablesResult = await client.tables({
    extraction_id: extractResult.extraction_id,
  });

  for (const table of tablesResult.tables_output.tables) {
    console.log("Citations:", table.citations);
    console.log(table.table_content);
  }
  ```

  ```bash curl theme={null}
  # Step 1: Extract the document
  curl -X POST https://api.runpulse.com/extract \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@10k-filing.pdf"

  # Response includes extraction_id: "abc123-..."

  # Step 2: Extract tables
  curl -X POST https://api.runpulse.com/tables \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "extraction_id": "abc123-..."
    }'
  ```
</CodeGroup>

### With Cross-Page Table Merging

<CodeGroup>
  ```python Python theme={null}
  tables_result = client.tables(
      extraction_id=extract_result.extraction_id,
      tables_config={
          "merge": True,
          "table_format": "html"
      }
  )
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/tables \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "extraction_id": "abc123-...",
      "tables_config": {
        "merge": true,
        "table_format": "html"
      }
    }'
  ```
</CodeGroup>

### With Chart-to-Table Conversion

Convert figures and charts into structured tables using LLM processing. Chart-derived tables are marked with `from_chart: true` in the response.

<CodeGroup>
  ```python Python theme={null}
  tables_result = client.tables(
      extraction_id=extract_result.extraction_id,
      tables_config={
          "merge": True,
          "charts_to_tables": True
      }
  )

  for table in tables_result.tables_output.tables:
      if table.from_chart:
          print("Chart-derived table:")
      print(table.table_content)
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/tables \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "extraction_id": "abc123-...",
      "tables_config": {
        "merge": true,
        "charts_to_tables": true
      }
    }'
  ```
</CodeGroup>

### Async Processing

```python theme={null}
# Start async table extraction
job = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={"merge": True},
    async_=True
)

# Poll for results
result = client.jobs.get_job(job.tables_id)  # Repeat until status is "completed"
```

***

## Error Responses

| Status | Error                | Description                                                |
| ------ | -------------------- | ---------------------------------------------------------- |
| 400    | Invalid request      | Missing required fields or invalid configuration           |
| 401    | Unauthorized         | Invalid or missing API key                                 |
| 404    | Extraction not found | The `extraction_id` doesn't exist or you don't have access |
| 429    | Rate limit exceeded  | Too many requests                                          |
| 500    | Processing error     | Table processing failed                                    |

***

## When to Use Tables vs. Basic Extraction

Basic extraction via `/extract` already returns tables in the markdown output. Use the `/tables` endpoint when you need:

* **Span-aware table parsing** — correct handling of merged cells, multi-level headers, and column/row spans
* **Cross-page table merging** — tables that continue across page breaks reconstructed into a single table
* **Financial document accuracy** — SEC filings, annual reports, and other documents where misaligned columns mean wrong data
* **Dedicated table output** — clean HTML tables with citation tracking, separated from the rest of the document content


## OpenAPI

````yaml POST /tables
openapi: 3.1.0
info:
  title: Pulse API Structure
  version: 0.1.0
  description: >-
    Canonical contract for the Pulse extraction APIs. This specification is the
    single source of truth for shared request/response models that client and
    server packages consume.
servers:
  - url: https://api.runpulse.com
    description: Default Pulse API base URL
security:
  - ApiKey: []
paths:
  /tables:
    post:
      tags:
        - Tables
      summary: Extract tables from a saved extraction
      description: |-
        Extract tables from a previously completed extraction. Processes the
        extraction's document content and returns structured table data.

        Requires the `tables_endpoint` feature flag to be enabled for your
        organization.

        Set `async: true` to return immediately with a `tables_id` for
        polling via `GET /job/{tables_id}`.

        To extract tables from many extractions at once, see
        [Batch Tables](api:POST/batch/tables) or the
        [Batch Processing guide](/batch).
      operationId: extractTables
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TablesInput'
      responses:
        '200':
          description: Table extraction result (when async=false or omitted).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TablesResponse'
        '202':
          description: Tables job accepted (when async=true)
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncSubmissionResponse'
        '400':
          description: Invalid request parameters or configuration
        '401':
          description: Authentication failed or missing API key
        '403':
          description: Tables endpoint not enabled for your organization
        '404':
          description: Extraction not found or access denied
        '429':
          description: Rate limit exceeded
        '500':
          description: Internal server error
components:
  schemas:
    TablesInput:
      type: object
      description: Input for the `/tables` endpoint.
      required:
        - extraction_id
      properties:
        extraction_id:
          type: string
          format: uuid
          description: ID of a completed extraction to extract tables from.
        tables_config:
          description: >-
            Table extraction configuration. If omitted, defaults are used
            (`merge: false`, `table_format: "html"`).
          allOf:
            - $ref: '#/components/schemas/TablesConfig'
        async:
          type: boolean
          default: false
          description: >-
            When true, returns immediately with a job ID. Poll `GET
            /job/{tables_id}` for the result.
    TablesResponse:
      type: object
      description: Result of table extraction.
      required:
        - tables_id
        - tables_output
      properties:
        tables_id:
          type: string
          format: uuid
          description: >-
            Persisted tables version ID. Can be used to retrieve the tables
            result later.
        tables_output:
          type: object
          description: The extracted tables data.
          properties:
            tables:
              type: array
              description: >-
                Array of extracted table objects. Each table includes its
                content, citations, and whether it was derived from a chart.
              items:
                type: object
                properties:
                  table_content:
                    description: >-
                      The table content. When `table_format` is `html`
                      (default), this is an HTML string. When `json`, this is an
                      object with `headers` (array of column names) and `rows`
                      (array of objects keyed by header name).
                  citations:
                    type: array
                    description: >-
                      Bounding box table IDs indicating where this table was
                      found (e.g. "tbl-1"). Merged tables list all source IDs.
                    items:
                      type: string
                  from_chart:
                    type: boolean
                    description: >-
                      Whether this table was extracted from a chart or figure
                      rather than a native table.
        credits_used:
          type: number
          format: float
          nullable: true
          description: >-
            Number of credits consumed by this request. Only present when the
            organization has the credit billing system enabled.
        plan_info:
          allOf:
            - $ref: '#/components/schemas/PlanInfo'
          description: >-
            Billing tier and cumulative usage information for the calling org,
            including this tables run.
    AsyncSubmissionResponse:
      type: object
      description: >-
        Acknowledgement returned when a request is submitted for asynchronous
        processing. Poll `GET /job/{job_id}` to check status and retrieve
        results.
      required:
        - job_id
        - status
      properties:
        job_id:
          type: string
          description: Identifier assigned to the asynchronous job.
        status:
          type: string
          description: Initial status reported by the server.
          enum:
            - pending
            - processing
            - completed
            - failed
            - canceled
        message:
          type: string
          description: Human-readable description of the accepted job.
        queuedAt:
          type: string
          format: date-time
          deprecated: true
          description: >-
            **Deprecated** — Timestamp indicating when the job was accepted.
            Retained for backward compatibility. Use `GET /job/{jobId}` for
            timing details.
        credits_used:
          type: number
          format: float
          nullable: true
          description: >-
            Number of credits consumed by this request. Only present when the
            organization has the credit billing system enabled.
    TablesConfig:
      type: object
      description: Configuration for table extraction.
      properties:
        merge:
          type: boolean
          default: false
          description: >-
            When true, adjacent tables that appear to be continuations of each
            other are merged into a single table.
        table_format:
          type: string
          default: html
          description: >-
            Output format for table content. `html` returns an HTML `<table>`
            string. `json` returns a structured object with `headers` (array of
            column names) and `rows` (array of objects keyed by header name).
          enum:
            - html
            - json
        charts_to_tables:
          type: boolean
          default: false
          description: >-
            Convert figures and charts into tables using LLM processing.
            Resulting tables have `from_chart: true` in the response.
    PlanInfo:
      type: object
      description: >-
        Cumulative billing snapshot for the calling organization. Sourced from
        the `pulse-org-stats` aggregate table maintained asynchronously by the
        org-stats Lambda; the in-flight request's contribution is added on top
        so every response reflects post-request state. Returned by every
        endpoint that consumes credits (extract, schema, tables, split, form,
        and their batch / pipeline equivalents).
      properties:
        tier:
          type: string
          description: Billing tier, e.g. `"trial"`, `"growth"`, `"pulse_ultra_2"`.
        total_credits_used:
          type: number
          format: float
          description: >-
            Total credits consumed by the organization to date, including this
            request. The primary billing metric going forward.
        pages_used:
          type: integer
          minimum: 0
          description: >-
            Total pages processed by the organization to date, including this
            request. Kept for backward compatibility with clients that haven't
            migrated to `total_credits_used`.
        note:
          type: string
          description: >-
            Optional human-readable note about billing state for this response
            (e.g. trial credits remaining). Omitted when no note applies.
  securitySchemes:
    ApiKey:
      type: apiKey
      in: header
      name: x-api-key
      x-fern-header:
        name: apiKey
        env: PULSE_API_KEY

````