Overview
Pipeline Step 2 (terminal): Tables requires a prior extraction, and no further pipeline steps can be chained after it.
Extract structured tables from a saved extraction using Pulse’s semantic and table-structure algorithms. The /tables endpoint detects and reconstructs tables from your document, handling:
- Span tables — cells that merge across rows or columns (e.g., “Year Ended December 31” spanning three columns)
- Multi-level header hierarchies — nested spans like period → segment → line item
- Cross-page tables — tables that continue across page breaks, automatically merged with row-continuity tracking
This is particularly valuable for financial documents (10-Ks, 10-Qs, proxy statements) where span tables encode hierarchy visually rather than explicitly, causing most extraction tools to silently misalign values with the wrong columns.
This endpoint operates on saved extractions (created via /extract with storage enabled, which is the default).
To extract tables from many extractions at once, use Batch Tables.
Async Mode
Set async: true to return immediately with a job ID for polling. See Polling for Results for details.
{
  "extraction_id": "abc123-def456",
  "async": true
}
Request
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| extraction_id | string (uuid) | Yes | ID of the saved extraction to process |
| tables_config | object | No | Configuration options for table processing |
| async | boolean | No | If true, returns immediately with a tables_id for polling. Default: false. |
Tables Config (tables_config)
| Field | Type | Default | Description |
|---|---|---|---|
| merge | boolean | false | Merge tables that continue across pages into a single table |
| table_format | string | "html" | Output format for table content. Currently only "html" is supported. |
| charts_to_tables | boolean | false | Convert figures and charts into tables using LLM processing. Resulting tables have from_chart: true in the response. |
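The request fields above can be assembled into a JSON body with a small helper. The following is a minimal sketch: `build_tables_request` and its `run_async` parameter are hypothetical names, not part of the Pulse SDK. Only the field names, defaults, and the "html"-only restriction come from the tables above.

```python
import json
from typing import Any, Dict, Optional


def build_tables_request(
    extraction_id: str,
    merge: Optional[bool] = None,
    charts_to_tables: Optional[bool] = None,
    table_format: Optional[str] = None,
    run_async: bool = False,
) -> Dict[str, Any]:
    """Build a /tables request body (hypothetical helper, not an SDK function)."""
    if not extraction_id:
        raise ValueError("extraction_id is required")
    if table_format is not None and table_format != "html":
        # The reference states that "html" is currently the only supported format.
        raise ValueError('table_format must be "html"')

    body: Dict[str, Any] = {"extraction_id": extraction_id}

    # Send only the options that were set explicitly, so the documented
    # server-side defaults apply to everything else.
    config = {
        key: value
        for key, value in (
            ("merge", merge),
            ("table_format", table_format),
            ("charts_to_tables", charts_to_tables),
        )
        if value is not None
    }
    if config:
        body["tables_config"] = config
    if run_async:
        body["async"] = True
    return body


payload = build_tables_request("abc123-def456", merge=True, run_async=True)
print(json.dumps(payload, indent=2))
```

Leaving unset options out of `tables_config` keeps the payload small and avoids hard-coding defaults that may change on the server side.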
Response
Synchronous Response (200)
| Field | Type | Description |
|---|---|---|
| tables_id | string (uuid) | Unique identifier for this tables result |
| tables_output | object | Contains the extracted tables |
| tables_output.tables | array | List of extracted table objects |
Each table object:
| Field | Type | Description |
|---|---|---|
| citations | array of strings | Bounding box table IDs for the table (e.g., ["tbl-1"] or ["tbl-1", "tbl-2"] for merged tables) |
| table_content | string | The table content in HTML format |
| from_chart | boolean | Whether the table was derived from a chart/figure rather than a native table |
{
  "tables_id": "uuid-123",
  "tables_output": {
    "tables": [
      {
        "citations": ["tbl-1", "tbl-2"],
        "table_content": "<table data-bb-table-id=\"tbl-1\" data-merged-from=\"tbl-1,tbl-2\">...</table>",
        "from_chart": false
      }
    ]
  }
}
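Because table_content is an HTML string, merged cells need to be expanded before the table can be treated as a plain grid. The sketch below uses only the Python standard library and assumes the markup expresses spans with the standard rowspan and colspan attributes; the actual markup returned by /tables may differ. The sample HTML and its values are invented for illustration, and `SpanTableParser` and `expand_spans` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan)


class SpanTableParser(HTMLParser):
    """Collect the <td>/<th> cells of one HTML table, keeping span attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.table_attrs: Dict[str, Optional[str]] = {}
        self.rows: List[List[Cell]] = []
        self._cell: Optional[List[str]] = None
        self._spans: Tuple[int, int] = (1, 1)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "table" and not self.table_attrs:
            self.table_attrs = attr_map
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []
            self._spans = (
                int(attr_map.get("rowspan") or 1),
                int(attr_map.get("colspan") or 1),
            )

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append((text, *self._spans))
            self._cell = None


def expand_spans(rows: List[List[Cell]]) -> List[List[str]]:
    """Copy each spanning cell into every grid slot it covers."""
    grid: Dict[Tuple[int, int], str] = {}
    for r, cells in enumerate(rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # slot already filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Invented sample in the shape of table_content; values are illustrative only.
sample_html = (
    '<table data-bb-table-id="tbl-1" data-merged-from="tbl-1,tbl-2">'
    '<tr><th rowspan="2">Segment</th><th colspan="2">Year Ended December 31</th></tr>'
    "<tr><th>2023</th><th>2022</th></tr>"
    "<tr><td>Cloud</td><td>120</td><td>95</td></tr>"
    "</table>"
)

parser = SpanTableParser()
parser.feed(sample_html)
grid = expand_spans(parser.rows)
for row in grid:
    print(row)
```

Repeating the spanning header text in each covered column keeps every value aligned with its full header path, which is the property that span tables otherwise encode only visually.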
Async Response (200)
| Field | Type | Description |
|---|---|---|
| tables_id | string (uuid) | Job ID for polling |
| status | string | "pending" |
| message | string | Human-readable status message |
{
  "tables_id": "uuid-123",
  "status": "pending",
  "message": "Table processing started. Poll GET /job/{tables_id} for results."
}
Example Usage
from pulse import Pulse
client = Pulse(api_key="YOUR_API_KEY")
# Step 1: Extract the document (the context manager closes the file afterwards)
with open("10k-filing.pdf", "rb") as f:
    extract_result = client.extract(file=f)
# Step 2: Extract tables
tables_result = client.tables(
    extraction_id=extract_result.extraction_id
)
for table in tables_result.tables_output.tables:
    print(f"Citations: {table.citations}")
    print(f"From chart: {table.from_chart}")
    print(table.table_content)
With Cross-Page Table Merging
tables_result = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={
        "merge": True,
        "table_format": "html"
    }
)
With Chart-to-Table Conversion
Convert figures and charts into structured tables using LLM processing. Chart-derived tables are marked with from_chart: true in the response.
tables_result = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={
        "merge": True,
        "charts_to_tables": True
    }
)
for table in tables_result.tables_output.tables:
    if table.from_chart:
        print("Chart-derived table:")
        print(table.table_content)
Async Processing
# Start async table extraction
job = client.tables(
    extraction_id=extract_result.extraction_id,
    tables_config={"merge": True},
    async_=True
)
# Poll for results
result = client.jobs.get_job(job.tables_id)  # Repeat until status is "completed"
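The "repeat until completed" step can be wrapped in a small loop with a timeout. This is a sketch under stated assumptions: `poll_until_complete` is a hypothetical helper, it assumes the job object exposes a `status` attribute, and the set of terminal failure statuses is left to the caller because this page only names "pending" and "completed".

```python
import time
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def poll_until_complete(
    fetch_job: Callable[[], Any],
    interval: float = 2.0,
    timeout: float = 300.0,
    failure_statuses: Iterable[str] = (),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_job() until the returned job reports status "completed"."""
    failures = set(failure_statuses)
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        status = getattr(job, "status", None)  # assumption: attribute access
        if status == "completed":
            return job
        if status in failures:
            raise RuntimeError(f"tables job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)


# Stand-in for the API so the sketch runs without network access.
fake_replies = iter(
    [
        SimpleNamespace(status="pending"),
        SimpleNamespace(status="pending"),
        SimpleNamespace(status="completed"),
    ]
)
result = poll_until_complete(lambda: next(fake_replies), interval=0.0)
print(result.status)
```

With the SDK calls shown above, the fetch function would be `lambda: client.jobs.get_job(job.tables_id)`. Pass any failure statuses documented in Polling for Results through `failure_statuses` so a failed job stops the loop early instead of waiting for the timeout.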
Error Responses
| Status | Error | Description |
|---|---|---|
| 400 | Invalid request | Missing required fields or invalid configuration |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Extraction not found | The extraction_id doesn’t exist or you don’t have access |
| 429 | Rate limit exceeded | Too many requests |
| 500 | Processing error | Table processing failed |
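One way to act on these status codes is to retry only the ones that might succeed on a later attempt. The sketch below treats 429 and 500 as retryable, which is an assumption rather than documented behavior, and fails immediately on 400, 401, and 404. `call_with_retry`, `TablesRequestError`, and the `send` callable (which returns a status code and a parsed body) are hypothetical names, and the error body in the usage example is a placeholder.

```python
import time
from typing import Any, Callable, Tuple

# Assumption: rate limits (429) and processing errors (500) may be transient;
# 400, 401 and 404 point at the request itself and are not retried.
RETRYABLE_STATUSES = {429, 500}


class TablesRequestError(Exception):
    """Raised when a /tables call ends with a non-200 status."""

    def __init__(self, status_code: int, body: Any) -> None:
        super().__init__(f"/tables request failed with HTTP {status_code}")
        self.status_code = status_code
        self.body = body


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run send() with exponential backoff on retryable status codes."""
    for attempt in range(max_attempts):
        status_code, body = send()
        if status_code == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if status_code not in RETRYABLE_STATUSES or last_attempt:
            raise TablesRequestError(status_code, body)
        sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise ValueError("max_attempts must be at least 1")


# Stand-in transport: one rate-limit reply, then a success.
replies = iter([(429, {"error": "placeholder"}), (200, {"tables_id": "uuid-123"})])
delays = []
body = call_with_retry(lambda: next(replies), sleep=delays.append)
print(body, delays)
```

Injecting `sleep` keeps the helper testable without real waiting; in production the default `time.sleep` applies.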
Basic extraction via /extract already returns tables in the markdown output. Use the /tables endpoint when you need:
- Span-aware table parsing — correct handling of merged cells, multi-level headers, and column/row spans
- Cross-page table merging — tables that continue across page breaks reconstructed into a single table
- Financial document accuracy — SEC filings, annual reports, and other documents where misaligned columns mean wrong data
- Dedicated table output — clean HTML tables with citation tracking, separated from the rest of the document content