Overview

Pipeline Step 2 (terminal): Tables requires a prior extraction, and no further pipeline steps can be chained after it.
Extract structured tables from a saved extraction using Pulse’s semantic and table-structure algorithms. The /tables endpoint detects and reconstructs tables from your document, handling:
  • Span tables: cells that merge across rows or columns (e.g., “Year Ended December 31” spanning three columns)
  • Multi-level header hierarchies: nested spans like period → segment → line item
  • Cross-page tables: tables that continue across page breaks, merged into a single table with row-continuity tracking when tables_config.merge is set to true
This is particularly valuable for financial documents (10-Ks, 10-Qs, proxy statements) where span tables encode hierarchy visually rather than explicitly, causing most extraction tools to silently misalign values with the wrong columns.
This endpoint operates on saved extractions (created via /extract with storage enabled, which is the default).

Async Mode

Set async: true to return immediately with a job ID for polling. See Polling for Results for details.
{
  "extraction_id": "abc123-def456",
  "async": true
}

Request

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| extraction_id | string (uuid) | Yes | ID of the saved extraction to process |
| tables_config | object | No | Configuration options for table processing |
| async | boolean | No | If true, returns immediately with a tables_id for polling. Default: false. |

Tables Config (tables_config)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| merge | boolean | false | Merge tables that continue across pages into a single table |
| table_format | string | "html" | Output format for table content. Currently only "html" is supported. |
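
The request fields above can be assembled programmatically. The following is a minimal sketch, not part of the Pulse SDK: build_tables_request and its parameter names are hypothetical, and only the JSON keys (extraction_id, tables_config, merge, async) come from the tables above.

```python
import json
from typing import Any, Dict, Optional


def build_tables_request(
    extraction_id: str,
    merge: Optional[bool] = None,
    run_async: bool = False,
) -> Dict[str, Any]:
    """Assemble a /tables request body (hypothetical helper, not an SDK call)."""
    if not extraction_id:
        raise ValueError("extraction_id is required")
    body: Dict[str, Any] = {"extraction_id": extraction_id}
    if merge is not None:
        # table_format is omitted: "html" is the documented default and only value.
        body["tables_config"] = {"merge": merge}
    if run_async:
        # Omitting the key is equivalent to the documented default of false.
        body["async"] = True
    return body


payload = build_tables_request("abc123-def456", merge=True, run_async=True)
print(json.dumps(payload, indent=2))
```

Leaving optional keys out when they match the documented defaults keeps the payload small and makes the intent of each request explicit.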

Response

Synchronous Response (200)

| Field | Type | Description |
| --- | --- | --- |
| tables_id | string (uuid) | Unique identifier for this tables result |
| tables_output | object | Contains the extracted tables |
| tables_output.tables | array | List of extracted table objects |

Each table object:

| Field | Type | Description |
| --- | --- | --- |
| citations | array of strings | Source references for the table (e.g., page numbers) |
| table_content | string | The table content in HTML format |
| from_chart | boolean | Whether the table was derived from a chart/figure rather than a native table |
{
  "tables_id": "uuid-123",
  "tables_output": {
    "tables": [
      {
        "citations": ["page 15", "page 16"],
        "table_content": "<table>...</table>",
        "from_chart": false
      }
    ]
  }
}
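
Because table_content is returned as HTML, downstream code often needs a rectangular grid in which spanned cells are repeated in every slot they cover. The sketch below uses only the Python standard library. It assumes the HTML expresses spans with standard rowspan and colspan attributes and contains no nested tables, which this page does not confirm. The class and function names are illustrative and not part of the Pulse SDK.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


def _span(value: Optional[str]) -> int:
    """Parse a rowspan/colspan attribute, falling back to 1 for bad values."""
    return max(1, int(value)) if value and value.isdigit() else 1


class _CellCollector(HTMLParser):
    """Collects (text, rowspan, colspan) for each <td>/<th>, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Tuple[str, int, int]]] = []
        self._spans: Tuple[int, int] = (1, 1)
        self._text: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._spans = (_span(a.get("rowspan")), _span(a.get("colspan")))
            self._text = []
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            if not self.rows:  # cell appeared without an enclosing <tr>
                self.rows.append([])
            text = " ".join("".join(self._text).split())
            self.rows[-1].append((text, *self._spans))
            self._in_cell = False


def expand_spans(table_html: str) -> List[List[str]]:
    """Return a rectangular grid with spanned cells repeated in each slot."""
    parser = _CellCollector()
    parser.feed(table_html)
    parser.close()
    filled: Dict[Tuple[int, int], str] = {}
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in filled:  # skip slots claimed by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    filled[(r + dr, c + dc)] = text
            c += colspan
    if not filled:
        return []
    n_rows = max(r for r, _ in filled) + 1
    n_cols = max(c for _, c in filled) + 1
    return [[filled.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Made-up sample in the shape of a financial span table.
sample = (
    "<table>"
    "<tr><th rowspan='2'>Segment</th><th colspan='2'>Year Ended December 31</th></tr>"
    "<tr><th>2023</th><th>2022</th></tr>"
    "<tr><td>Cloud</td><td>120</td><td>95</td></tr>"
    "</table>"
)
for row in expand_spans(sample):
    print(row)
```

Repeating the spanned header text in each covered column keeps every value aligned with its full header path, which is the property that matters for the financial tables described in the overview.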

Async Response (200)

| Field | Type | Description |
| --- | --- | --- |
| tables_id | string (uuid) | Job ID for polling |
| status | string | "pending" |
| message | string | Human-readable status message |
{
  "tables_id": "uuid-123",
  "status": "pending",
  "message": "Table processing started. Poll GET /job/{tables_id} for results."
}

Example Usage

Basic Table Extraction

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# Step 1: Extract the document (the with block closes the file afterwards)
with open("10k-filing.pdf", "rb") as f:
    extract_result = client.extract(file=f)

# Step 2: Extract tables
tables_result = client.tables(
    extraction_id=extract_result.extraction_id
)

for table in tables_result.tables_output.tables:
    print(f"Citations: {table.citations}")
    print(f"From chart: {table.from_chart}")
    print(table.table_content)

With Cross-Page Table Merging

tables_result = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={
        "merge": True,
        "table_format": "html"
    }
)

Async Processing

# Start async table extraction
job = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={"merge": True},
    async_=True
)

# Poll for results
result = client.get_job(job.tables_id)  # Repeat until status is "completed"
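
The "repeat until completed" step can be wrapped in a bounded polling loop. This is a sketch under stated assumptions: poll_until_completed is a hypothetical helper, the job result is assumed to expose its status as a status attribute or key, and the interval and attempt limit are arbitrary. Failure statuses are not documented on this page, so the loop stops only on "completed" or when attempts run out.

```python
import time
from typing import Any, Callable


def poll_until_completed(
    fetch: Callable[[], Any],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until its result reports status "completed"."""
    for attempt in range(max_attempts):
        result = fetch()
        # Assumption: status is available as a dict key or an attribute.
        if isinstance(result, dict):
            status = result.get("status")
        else:
            status = getattr(result, "status", None)
        if status == "completed":
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not completed after {max_attempts} attempts")


# Stand-in for `lambda: client.get_job(job.tables_id)` so the sketch runs offline.
_responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "completed"}])
final = poll_until_completed(lambda: next(_responses), sleep=lambda s: None)
print(final["status"])
```

With the SDK calls from the example above, the call would look like poll_until_completed(lambda: client.get_job(job.tables_id)). Bounding the attempts avoids an infinite loop if a job never reaches "completed".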

Error Responses

| Status | Error | Description |
| --- | --- | --- |
| 400 | Invalid request | Missing required fields or invalid configuration |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Extraction not found | The extraction_id doesn’t exist or you don’t have access |
| 429 | Rate limit exceeded | Too many requests |
| 500 | Processing error | Table processing failed |
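
A client can use these status codes to decide whether to retry. The sketch below is illustrative only. Treating 429 as the sole retryable status, the exponential backoff schedule, and the names next_action and backoff_delay are all assumptions, not documented Pulse behavior.

```python
from typing import Tuple, Union

# Error names copied from the table above.
ERRORS = {
    400: "Invalid request",
    401: "Unauthorized",
    404: "Extraction not found",
    429: "Rate limit exceeded",
    500: "Processing error",
}

# Assumption: only rate limiting is treated as transient in this sketch.
RETRYABLE = {429}


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Exponential backoff: base_s, 2*base_s, 4*base_s, ... capped at cap_s."""
    return min(cap_s, base_s * (2 ** attempt))


def next_action(status: int, attempt: int, max_retries: int = 3) -> Tuple[str, Union[float, str]]:
    """Return ("retry", delay_seconds) or ("fail", error_name) for a response."""
    if status in RETRYABLE and attempt < max_retries:
        return ("retry", backoff_delay(attempt))
    return ("fail", ERRORS.get(status, f"Unexpected status {status}"))


print(next_action(429, 0))
print(next_action(404, 0))
```

Errors such as 400, 401, and 404 describe problems with the request itself, so repeating the same request unchanged is unlikely to help.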

When to Use Tables vs. Basic Extraction

Basic extraction via /extract already returns tables in the markdown output. Use the /tables endpoint when you need:
  • Span-aware table parsing — correct handling of merged cells, multi-level headers, and column/row spans
  • Cross-page table merging — tables that continue across page breaks reconstructed into a single table
  • Financial document accuracy — SEC filings, annual reports, and other documents where misaligned columns mean wrong data
  • Dedicated table output — clean HTML tables with citation tracking, separated from the rest of the document content