Overview

When you extract content with layout information, the Pulse API returns bounding box coordinates for text, tables, and images. This spatial data lets you locate each element on the page and extract content by region.

Bounding Box Format

Bounding boxes are returned as normalized coordinates (0-1 range) in an 8-point format: eight values that describe the four corners of a quadrilateral.
[x1, y1, x2, y2, x3, y3, x4, y4]
Where:
  • (x1, y1) = Top-left corner
  • (x2, y2) = Top-right corner
  • (x3, y3) = Bottom-right corner
  • (x4, y4) = Bottom-left corner
Because coordinates are normalized to the 0-1 range, they are resolution-independent. To convert to pixels, multiply each x value by the page width and each y value by the page height.
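The flat array is often easier to work with as four corner points. The following is a minimal sketch of that grouping, plus an upright enclosing rectangle for boxes that are slightly rotated. The helper names bbox_to_points and axis_aligned_bounds are hypothetical and not part of the Pulse SDK.

```python
def bbox_to_points(bbox):
    """Group a flat 8-value bounding box into four (x, y) corner points.

    The order follows the format above: top-left, top-right,
    bottom-right, bottom-left.
    """
    if len(bbox) != 8:
        raise ValueError("expected 8 values: x1, y1, ..., x4, y4")
    return [(bbox[i], bbox[i + 1]) for i in range(0, 8, 2)]


def axis_aligned_bounds(bbox):
    """Return (left, top, right, bottom) of the smallest upright rectangle
    that encloses the quadrilateral, still in normalized units."""
    points = bbox_to_points(bbox)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly rotated box, so the corners do not share x or y values.
bbox = [0.0267, 0.0872, 0.0689, 0.0789, 0.0743, 0.0908, 0.0321, 0.0996]
top_left, top_right, bottom_right, bottom_left = bbox_to_points(bbox)
bounds = axis_aligned_bounds(bbox)
```

The upright rectangle is convenient for cropping or overlap checks, at the cost of including a little area outside a rotated box.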

Response Structure

The bounding_boxes object in the extraction response contains:
{
  "bounding_boxes": {
    "Footer": [],
    "Header": [],
    "Images": [],
    "Tables": [],
    "Text": [],
    "Title": [],
    "Page Number": [],
    "markdown_with_ids": "..."
  }
}
Not all fields will be present in every response. The API only includes arrays for elements that were detected in the document.
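Since any of these arrays may be missing, code that walks the object should not index keys directly. A minimal sketch, assuming the response has already been parsed into a plain Python dict (the helper iter_elements and the sample payload are illustrative only):

```python
import json

# Element categories listed in the response structure above.
ELEMENT_KEYS = ["Footer", "Header", "Images", "Tables", "Text", "Title", "Page Number"]


def iter_elements(bounding_boxes):
    """Yield (category, element) pairs, skipping categories that are
    absent, null, or empty."""
    for key in ELEMENT_KEYS:
        for element in bounding_boxes.get(key) or []:
            yield key, element


# Illustrative payload containing only two of the possible arrays.
sample = json.loads(
    '{"bounding_boxes": {'
    '"Text": [{"id": "txt-2", "page_number": 1}], '
    '"Title": [{"id": "txt-1", "page_number": 1}]}}'
)
found = [(category, el["id"]) for category, el in iter_elements(sample["bounding_boxes"])]
```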

Markdown Fields

| Field | Location | Description |
|---|---|---|
| markdown | Top-level response | Clean markdown content without any ID attributes |
| markdown_with_ids | Inside bounding_boxes | Markdown with data-bb-* ID attributes that link text to bounding box elements |
Use bounding_boxes.markdown_with_ids when you need to correlate text positions with bounding boxes. Use the top-level markdown for clean content display or export.
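One way to correlate the two is to pull the IDs out of markdown_with_ids and look them up in the element arrays. The sketch below assumes the IDs appear as double-quoted HTML-style attributes such as data-bb-text-id="txt-1"; the exact markup is an assumption, and the sample string is made up for illustration.

```python
import re

# Assumption: IDs are written as double-quoted HTML-style attributes.
BB_ID_PATTERN = re.compile(r'data-bb-(text|image)-id="([^"]+)"')


def ids_in_markdown(markdown_with_ids):
    """Return (kind, id) pairs in the order they appear in the markdown."""
    return BB_ID_PATTERN.findall(markdown_with_ids)


# Hypothetical snippet; the real markup may differ.
sample_markdown = (
    '<h1 data-bb-text-id="txt-1">Doctor Prescription</h1>\n'
    '<p data-bb-text-id="txt-2">NCRI</p>'
)

# Elements indexed by id, as they would come from the Text and Title arrays.
elements_by_id = {
    "txt-1": {"id": "txt-1", "original_content": "Doctor Prescription"},
    "txt-2": {"id": "txt-2", "original_content": "NCRI"},
}

linked = [
    (kind, bb_id, elements_by_id.get(bb_id))
    for kind, bb_id in ids_in_markdown(sample_markdown)
]
```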

Example Response

Here’s an example of the bounding_boxes object. The Images entry comes from a workbook with an embedded chart, extracted with figure_processing.show_images: true:
{
  "Images": [
    {
      "id": "excel_image_1_1",
      "visual_type": "chart",
      "page_number": 1,
      "bounding_box": [],
      "image_url": "https://api.runpulse.com/results/13e3e75f-a89a-4d33-a391-e1a17127ab38/images/excel_image_1_1.png",
      "sheet_name": "Charts",
      "excel_range": "D2",
      "chart_type": "BarChart",
      "chart_title": "Revenue",
      "source_ranges": ["'Charts'!$A$2:$A$5", "'Charts'!$B$2:$B$5"],
      "description": "Bar chart showing revenue by quarter."
    }
  ],
  "Tables": [],
  "Text": [
    {
      "id": "txt-2",
      "content": "0a-NCRI",
      "original_content": "NCRI",
      "bounding_box": [0.0267, 0.0872, 0.0689, 0.0789, 0.0743, 0.0908, 0.0321, 0.0996],
      "page_number": 1,
      "average_word_confidence": 0.973
    }
  ],
  "Title": [
    {
      "id": "txt-1",
      "content": "0a-Doctor Prescription",
      "original_content": "Doctor Prescription",
      "bounding_box": [0.2196, 0.1225, 0.4578, 0.1348, 0.4557, 0.1537, 0.2174, 0.1417],
      "page_number": 1,
      "average_word_confidence": 0.995
    }
  ]
}

Field Descriptions

Text Array

Each text element contains:
  • id: Unique identifier (e.g., txt-1) that links to markdown_with_ids via data-bb-text-id
  • content: The extracted text with prefix (e.g., 0a-NCRI)
  • original_content: The clean extracted text without prefix
  • bounding_box: 8-point coordinate array (may be null for some document types)
  • page_number: Page where the text appears
  • average_word_confidence: OCR confidence score (0-1)
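The confidence score can be used to flag text for manual review. A minimal sketch, assuming elements are plain dicts; the 0.9 threshold is an arbitrary illustration, not a value recommended by the API.

```python
def low_confidence(elements, threshold=0.9):
    """Return elements whose OCR confidence is below the threshold.

    Elements without a score are skipped rather than treated as low.
    """
    flagged = []
    for el in elements:
        score = el.get("average_word_confidence")
        if score is not None and score < threshold:
            flagged.append(el)
    return flagged


# Illustrative elements; the second score is made up for the example.
text_elements = [
    {"id": "txt-2", "original_content": "NCRI", "average_word_confidence": 0.973},
    {"id": "txt-9", "original_content": "smudged", "average_word_confidence": 0.62},
    {"id": "txt-10", "original_content": "no score"},
]
needs_review = [el["id"] for el in low_confidence(text_elements)]
```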

Title Array

Each title element contains:
  • id: Unique identifier linking to markdown
  • content: The title text with prefix
  • original_content: The clean title text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the title appears
  • average_word_confidence: OCR confidence score (0-1)

Header Array

Each header element contains:
  • id: Unique identifier linking to markdown
  • content: The header text with prefix
  • original_content: The clean header text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the header appears
  • average_word_confidence: OCR confidence score (0-1)

Footer Array

Each footer element contains:
  • id: Unique identifier linking to markdown
  • content: The footer text with prefix
  • original_content: The clean footer text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the footer appears
  • average_word_confidence: OCR confidence score (0-1)

Images Array

Each image element represents a detected chart or embedded image. For PDFs and image inputs, entries are populated when figure detection runs. For spreadsheets, entries are populated for embedded charts and images directly read from the workbook.
| Field | When populated | Description |
|---|---|---|
| id | always | Stable visual identifier (e.g. excel_image_1_1, fig-3). Joins to the data-bb-image-id attribute in markdown_with_ids. |
| visual_type | always | "chart" (data visualization) or "image" (non-chart embedded/detected visual). |
| page_number | always | 1-indexed page or sheet number. |
| bounding_box | PDFs/images | 8-point coordinate polygon. Empty array for spreadsheets; use excel_range instead. |
| image_url | when figure_processing.show_images: true | Pulse-hosted URL for the visual image bytes. Fetch via results.getImage or any HTTP client with your API key. |
| description | when figure_processing.description: true | LLM-generated 1–2 sentence caption. |
| content | usually for spreadsheets | Short caption (e.g. Chart: Revenue). |
| sheet_name | spreadsheets | Workbook sheet the visual lives on. |
| sheet_index | spreadsheets | Parsed sheet index after hidden-sheet filtering. |
| workbook_sheet_index | spreadsheets | Original workbook sheet index. |
| excel_range | spreadsheets | Anchor cell or covered cell range (e.g. D2:K18). |
| chart_type | spreadsheet charts | Chart class name (e.g. BarChart, LineChart, PieChart). |
| chart_title | spreadsheet charts | Detected chart title text. |
| source_ranges | spreadsheet charts | Cell ranges feeding the chart (e.g. ["Charts!$B$1:$B$3"]). |
| classification | optional | {confidence, model, error} when classification ran. |
| render_error | optional | Non-fatal rendering error for spreadsheet visuals. When set, the entry is still returned but image_url may be omitted. |
| description_error | optional | Non-fatal description-generation error. |
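Because the location of a visual is reported differently for PDFs/images and for spreadsheets, client code usually needs a small branch. A minimal sketch, assuming each entry is a plain dict; the helper visual_location and its return shape are hypothetical.

```python
def visual_location(image):
    """Describe where a visual sits, using whichever fields are populated.

    PDFs and images carry an 8-value bounding_box; spreadsheet visuals
    carry an empty bounding_box and an excel_range instead.
    """
    bbox = image.get("bounding_box") or []
    if len(bbox) == 8:
        return {"kind": "polygon", "page": image.get("page_number"), "bbox": bbox}
    if image.get("excel_range"):
        return {
            "kind": "cells",
            "sheet": image.get("sheet_name"),
            "range": image["excel_range"],
        }
    return {"kind": "unknown"}


# Spreadsheet chart entry, trimmed from the example response.
chart = {
    "id": "excel_image_1_1",
    "visual_type": "chart",
    "page_number": 1,
    "bounding_box": [],
    "sheet_name": "Charts",
    "excel_range": "D2",
}
location = visual_location(chart)
```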

Fetching Visual Image Bytes

When you set figure_processing.show_images: true on /extract, every chart/image entry comes back with an image_url pointing at GET /results/{jobId}/images/{filename}. Fetch it with your API key to get the raw PNG/JPEG bytes:
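import re
from pulse import Pulse
from pulse.types import ExtractRequestFigureProcessing

client = Pulse(api_key="YOUR_API_KEY")

# Open the workbook in a context manager so the file handle is closed.
with open("financials.xlsx", "rb") as source:
    response = client.extract(
        file=source,
        figure_processing=ExtractRequestFigureProcessing(show_images=True),
    )

for img in response.bounding_boxes.images or []:
    # image_url may be omitted, e.g. when render_error is set.
    if not img.image_url:
        continue
    # image_url has the form .../results/{jobId}/images/{filename}
    m = re.search(r"/results/([^/]+)/images/([^/?#]+)", img.image_url)
    if m is None:
        continue
    job_id, filename = m.groups()
    # Collect the returned chunks and write them out as one file.
    chunks = list(client.results.get_image(job_id=job_id, filename=filename))
    with open(filename, "wb") as f:
        f.write(b"".join(chunks))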
See Get Result Image for the full auth contract. Visual image fetches always require same-org x-api-key authentication; there is no anonymous access.
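If you prefer a plain HTTP client over the SDK, the same fetch can be sketched with Python's standard library. This assumes the API key is sent in an x-api-key request header, as the authentication note suggests; the helper names are hypothetical, and the Get Result Image reference remains the authority on the exact contract.

```python
import urllib.request


def build_image_request(image_url, api_key):
    """Build (but do not send) an authenticated GET request for a visual."""
    return urllib.request.Request(
        image_url,
        headers={"x-api-key": api_key},  # assumed header name
        method="GET",
    )


def download_image(image_url, api_key, path):
    """Fetch the image bytes and write them to path."""
    request = build_image_request(image_url, api_key)
    with urllib.request.urlopen(request) as resp, open(path, "wb") as f:
        f.write(resp.read())


# Building the request does not touch the network.
request = build_image_request(
    "https://api.runpulse.com/results/13e3e75f-a89a-4d33-a391-e1a17127ab38/images/excel_image_1_1.png",
    "YOUR_API_KEY",
)
```

Call download_image(img_url, api_key, "chart.png") to perform the actual transfer.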

Tables Array

Each table element contains:
  • id: Unique identifier (e.g., tbl-1)
  • bounding_box: 8-point coordinate array
  • page_number: Page where the table appears
  • content: Table content (in HTML format)
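Since table content arrives as HTML, it can be turned into rows of cell text with the standard library alone. A minimal sketch, assuming a simple table built from tr, th, and td tags with no nested tables; the sample HTML is illustrative and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Parse table HTML into a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


sample_html = (
    "<table><tr><th>Quarter</th><th>Revenue</th></tr>"
    "<tr><td>Q1</td><td>100</td></tr></table>"
)
rows = table_rows(sample_html)
```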

Page Number Array

Each page number element contains:
  • id: Unique identifier
  • content: The page number text
  • original_content: The clean page number text
  • bounding_box: 8-point coordinate array
  • page_number: Page where it appears
  • average_word_confidence: OCR confidence score (0-1)

The content field of each entry in the Tables array is returned as HTML, which preserves the table structure and makes it easy to parse or display.
The id field links each bounding box element to its location in markdown_with_ids: text elements are matched via data-bb-text-id attributes, and images via data-bb-image-id.

Footnote References

When you enable extensions.footnote_references in your extract request, the response includes an extensions.footnoteReferences array that uses bounding box IDs to link footnote markers to their in-text references. Each entry contains:
  • symbol: the footnote marker (e.g. *, †, ‡, 1)
  • footnoteTextId: the id of the footnote explanation, typically found in the Footer array
  • referenceTextIds: an array of id values from the Text, Title, or Header arrays identifying body paragraphs that contain the marker
{
  "extensions": {
    "footnoteReferences": [
      {
        "symbol": "*",
        "footnoteTextId": "txt-42",
        "referenceTextIds": ["txt-5", "txt-12"]
      }
    ]
  }
}
Use footnoteTextId to look up the footnote’s position and content in bounding_boxes.Footer (or bounding_boxes.Text), and each entry in referenceTextIds to locate the citing paragraphs in bounding_boxes.Text, bounding_boxes.Title, or bounding_boxes.Header. This allows you to spatially highlight both the footnote and every place in the document that references it.
Footnote references are only available for PDF documents. See the Extract endpoint for usage examples.
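The lookup described above can be sketched as a join on id. This assumes the response has been parsed into a dict with bounding_boxes and extensions as top-level keys; the helper names and the trimmed sample data are illustrative only.

```python
def index_by_id(bounding_boxes, categories=("Text", "Title", "Header", "Footer")):
    """Map element id -> element across the given categories."""
    index = {}
    for category in categories:
        for el in bounding_boxes.get(category) or []:
            index[el["id"]] = el
    return index


def resolve_footnotes(response):
    """Attach the footnote element and the citing elements to each reference.

    IDs that cannot be found are left out rather than raising.
    """
    index = index_by_id(response.get("bounding_boxes", {}))
    refs = response.get("extensions", {}).get("footnoteReferences", [])
    resolved = []
    for ref in refs:
        resolved.append({
            "symbol": ref["symbol"],
            "footnote": index.get(ref["footnoteTextId"]),
            "references": [index[i] for i in ref["referenceTextIds"] if i in index],
        })
    return resolved


# Illustrative response, trimmed to the fields used here.
response = {
    "bounding_boxes": {
        "Text": [{"id": "txt-5", "page_number": 1}, {"id": "txt-12", "page_number": 2}],
        "Footer": [{"id": "txt-42", "page_number": 1}],
    },
    "extensions": {
        "footnoteReferences": [
            {"symbol": "*", "footnoteTextId": "txt-42", "referenceTextIds": ["txt-5", "txt-12"]}
        ]
    },
}
resolved = resolve_footnotes(response)
```

Each resolved entry then carries the bounding_box and page_number needed to highlight the footnote and its citing paragraphs.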

Converting Coordinates

To convert normalized coordinates to pixel coordinates:
def normalize_to_pixels(bbox, page_width, page_height):
    """Convert normalized bounding box to pixel coordinates."""
    return [
        bbox[0] * page_width,   # x1
        bbox[1] * page_height,  # y1
        bbox[2] * page_width,   # x2
        bbox[3] * page_height,  # y2
        bbox[4] * page_width,   # x3
        bbox[5] * page_height,  # y3
        bbox[6] * page_width,   # x4
        bbox[7] * page_height   # y4
    ]

# Example: Convert for a standard letter-size page at 72 DPI
page_width = 612  # 8.5 inches * 72 DPI
page_height = 792  # 11 inches * 72 DPI

normalized_bbox = [0.1, 0.1, 0.3, 0.1, 0.3, 0.15, 0.1, 0.15]
pixel_bbox = normalize_to_pixels(normalized_bbox, page_width, page_height)

Next Steps

  • Extract Endpoint: Enable bounding box extraction
  • Structured Output: Combine with structured data