
Overview

When extracting content with layout information, Pulse API returns bounding box coordinates for text, tables, and images. This spatial data enables precise document understanding and region-based extraction.

Bounding Box Format

Bounding boxes are returned as normalized coordinates (0-1 range) in an 8-point format:
[x1, y1, x2, y2, x3, y3, x4, y4]
Where:
  • (x1, y1) = Top-left corner
  • (x2, y2) = Top-right corner
  • (x3, y3) = Bottom-right corner
  • (x4, y4) = Bottom-left corner
Because coordinates are normalized to the 0-1 range, they are resolution-independent. To convert to pixels, multiply each x value by the page width and each y value by the page height.
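The four corners do not have to form an upright rectangle: skewed or rotated text produces a slanted quadrilateral. When a downstream tool expects a plain left/top/right/bottom rectangle, one possible approach is to take the enclosing axis-aligned box. The helper name `to_axis_aligned` is hypothetical and not part of the Pulse API; this is a minimal sketch.

```python
def to_axis_aligned(bbox):
    """Return (left, top, right, bottom) enclosing an 8-point box.

    Works on normalized or pixel coordinates alike, since it only
    compares values within the same array.
    """
    xs = bbox[0::2]  # x1, x2, x3, x4
    ys = bbox[1::2]  # y1, y2, y3, y4
    return (min(xs), min(ys), max(xs), max(ys))


# A slightly rotated quadrilateral in normalized coordinates
quad = [0.0267, 0.0872, 0.0689, 0.0789, 0.0743, 0.0908, 0.0321, 0.0996]
left, top, right, bottom = to_axis_aligned(quad)
```

The enclosing rectangle is always at least as large as the original quadrilateral, so it may overlap neighboring elements when the text is strongly rotated.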

Response Structure

The bounding_boxes object in the extraction response contains:
{
  "bounding_boxes": {
    "Footer": [],
    "Header": [],
    "Images": [],
    "Tables": [],
    "Text": [],
    "Title": [],
    "Page Number": [],
    "markdown_with_ids": "..."
  }
}
Not all fields will be present in every response. The API only includes arrays for elements that were detected in the document.
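Because any category array may be absent, client code should avoid indexing keys directly. The following sketch walks whatever categories are present and skips the `markdown_with_ids` string. The function name `iter_elements` and the sample payload are illustrative assumptions, not part of the API.

```python
def iter_elements(bounding_boxes):
    """Yield (category, element) pairs for every list-valued key."""
    for category, value in bounding_boxes.items():
        # markdown_with_ids is a string, not an element array, so skip it
        if isinstance(value, list):
            for element in value:
                yield category, element


# Hypothetical payload: only two categories were detected
sample = {
    "Title": [{"id": "txt-1", "page_number": 1}],
    "Text": [{"id": "txt-2", "page_number": 1}],
    "markdown_with_ids": "...",
}
pairs = list(iter_elements(sample))

# For a single category, .get with a default avoids a KeyError
tables = sample.get("Tables", [])
```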

Markdown Fields

| Field | Location | Description |
| --- | --- | --- |
| markdown | Top-level response | Clean markdown content without any ID attributes |
| markdown_with_ids | Inside bounding_boxes | Markdown with data-bb-* ID attributes that link text to bounding box elements |
Use bounding_boxes.markdown_with_ids when you need to correlate text positions with bounding boxes. Use the top-level markdown for clean content display or export.
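As a rough illustration, the IDs referenced in `markdown_with_ids` can be collected with a regular expression. Only the attribute name `data-bb-text-id` comes from this page. The surrounding markup in the sample string (a `span` tag with a double-quoted attribute value) is an assumption made for the sketch, so check a real response before relying on the pattern.

```python
import re

# Assumes attribute values are double-quoted, e.g. data-bb-text-id="txt-1"
ID_ATTR = re.compile(r'data-bb-text-id="([^"]+)"')


def referenced_ids(markdown_with_ids):
    """Return every data-bb-text-id value in document order."""
    return ID_ATTR.findall(markdown_with_ids)


# Hypothetical markup; the real structure may differ
sample = (
    '<span data-bb-text-id="txt-1">Doctor Prescription</span>\n'
    '<span data-bb-text-id="txt-2">NCRI</span>'
)
ids = referenced_ids(sample)
```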

Example Response

Here’s a real example of the bounding_boxes object:
{
  "Images": [
    {
      "id": "img-1",
      "bounding_box": [0.0101, 0.0174, 0.0924, 0.0174, 0.0924, 0.1072, 0.0101, 0.1072],
      "confidence": "N/A",
      "page_number": 1
    }
  ],
  "Tables": [],
  "Text": [
    {
      "id": "txt-2",
      "content": "0a-NCRI",
      "original_content": "NCRI",
      "bounding_box": [0.0267, 0.0872, 0.0689, 0.0789, 0.0743, 0.0908, 0.0321, 0.0996],
      "page_number": 1,
      "average_word_confidence": 0.973
    }
  ],
  "Title": [
    {
      "id": "txt-1",
      "content": "0a-Doctor Prescription",
      "original_content": "Doctor Prescription",
      "bounding_box": [0.2196, 0.1225, 0.4578, 0.1348, 0.4557, 0.1537, 0.2174, 0.1417],
      "page_number": 1,
      "average_word_confidence": 0.995
    }
  ]
}

Field Descriptions

Text Array

Each text element contains:
  • id: Unique identifier (e.g., txt-1) that links to markdown_with_ids via data-bb-text-id
  • content: The extracted text with prefix (e.g., 0a-NCRI)
  • original_content: The clean extracted text without prefix
  • bounding_box: 8-point coordinate array (may be null for some document types)
  • page_number: Page where the text appears
  • average_word_confidence: OCR confidence score (0-1)
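These fields are enough to flag text that may need manual review. The sketch below selects elements whose `average_word_confidence` falls under a threshold. The function name `low_confidence` and the 0.98 cutoff are arbitrary choices for illustration, and treating a missing score as 0.0 is an assumption.

```python
def low_confidence(elements, threshold=0.98):
    """Return elements whose OCR confidence is below the threshold.

    Elements without a score are treated as 0.0 so they are flagged too.
    """
    return [
        e for e in elements
        if e.get("average_word_confidence", 0.0) < threshold
    ]


# Values taken from the example response
elements = [
    {"id": "txt-2", "original_content": "NCRI",
     "average_word_confidence": 0.973},
    {"id": "txt-1", "original_content": "Doctor Prescription",
     "average_word_confidence": 0.995},
]
flagged = low_confidence(elements)
```

Since `bounding_box` may be null, any code that goes on to draw or crop the flagged regions should check for that before using the coordinates.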

Title Array

Each title element contains:
  • id: Unique identifier linking to markdown
  • content: The title text with prefix
  • original_content: The clean title text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the title appears
  • average_word_confidence: OCR confidence score (0-1)

Header Array

Each header element contains:
  • id: Unique identifier linking to markdown
  • content: The header text with prefix
  • original_content: The clean header text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the header appears
  • average_word_confidence: OCR confidence score (0-1)

Footer Array

Each footer element contains:
  • id: Unique identifier linking to markdown
  • content: The footer text with prefix
  • original_content: The clean footer text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the footer appears
  • average_word_confidence: OCR confidence score (0-1)

Images Array

Each image element contains:
  • id: Unique identifier (e.g., img-1)
  • bounding_box: 8-point coordinate array
  • page_number: Page where the image appears
  • confidence: Detection confidence, if available (the example response shows the string "N/A" when it is not)

Tables Array

Each table element contains:
  • id: Unique identifier (e.g., tbl-1)
  • bounding_box: 8-point coordinate array
  • page_number: Page where the table appears
  • content: Table content (in HTML format)
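Since table `content` is HTML, it can be turned into rows of cell text with Python's standard-library `html.parser`. The sketch below is one possible approach, not an official client. The `TableRows` class and the sample HTML are hypothetical, and merged cells (`rowspan`/`colspan`) and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Parse an HTML table string into a list of rows of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical table content
sample_html = (
    "<table><tr><th>Drug</th><th>Dose</th></tr>"
    "<tr><td>Aspirin</td><td>100 mg</td></tr></table>"
)
rows = table_rows(sample_html)
```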

Page Number Array

Each page number element contains:
  • id: Unique identifier
  • content: The page number text
  • original_content: The clean page number text
  • bounding_box: 8-point coordinate array
  • page_number: Page where it appears
  • average_word_confidence: OCR confidence score (0-1)
Tables are extracted and returned in HTML format, preserving the structure and making it easy to parse or display.
The id field allows you to link bounding box elements to specific locations in the markdown_with_ids field via data-bb-text-id attributes.
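To resolve an ID found in the markdown back to its element, a lookup table keyed by `id` is usually enough. This sketch assumes IDs are unique across all categories, which the example response suggests but this page does not state. The function name `index_by_id` is hypothetical.

```python
def index_by_id(bounding_boxes):
    """Map each element id to a (category, element) pair."""
    index = {}
    for category, value in bounding_boxes.items():
        if not isinstance(value, list):
            continue  # skip markdown_with_ids
        for element in value:
            element_id = element.get("id")
            if element_id is not None:
                # Assumption: ids do not repeat across categories
                index[element_id] = (category, element)
    return index


# Trimmed version of the example response
sample = {
    "Images": [{"id": "img-1", "page_number": 1}],
    "Tables": [],
    "Text": [{"id": "txt-2", "original_content": "NCRI", "page_number": 1}],
    "Title": [{"id": "txt-1", "original_content": "Doctor Prescription",
               "page_number": 1}],
}
index = index_by_id(sample)
category, element = index["txt-1"]
```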

Converting Coordinates

To convert normalized coordinates to pixel coordinates:
def normalize_to_pixels(bbox, page_width, page_height):
    """Convert normalized bounding box to pixel coordinates."""
    return [
        bbox[0] * page_width,   # x1
        bbox[1] * page_height,  # y1
        bbox[2] * page_width,   # x2
        bbox[3] * page_height,  # y2
        bbox[4] * page_width,   # x3
        bbox[5] * page_height,  # y3
        bbox[6] * page_width,   # x4
        bbox[7] * page_height   # y4
    ]

# Example: Convert for a standard letter-size page at 72 DPI
page_width = 612  # 8.5 inches * 72 DPI
page_height = 792  # 11 inches * 72 DPI

normalized_bbox = [0.1, 0.1, 0.3, 0.1, 0.3, 0.15, 0.1, 0.15]
pixel_bbox = normalize_to_pixels(normalized_bbox, page_width, page_height)
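
For region-based extraction, one simple strategy is to keep the elements whose box center falls inside a normalized rectangle on a given page. This is a sketch under that assumption, not a Pulse API feature. The names `center` and `elements_in_region` are hypothetical, and other hit tests (such as overlap ratio) may suit some layouts better.

```python
def center(bbox):
    """Return the centroid of the four corners of an 8-point box."""
    return (sum(bbox[0::2]) / 4, sum(bbox[1::2]) / 4)


def elements_in_region(elements, region, page_number):
    """Select elements on a page whose center lies inside region.

    region is (left, top, right, bottom) in normalized 0-1 coordinates.
    """
    left, top, right, bottom = region
    hits = []
    for e in elements:
        bbox = e.get("bounding_box")
        # bounding_box may be null for some document types
        if bbox is None or e.get("page_number") != page_number:
            continue
        cx, cy = center(bbox)
        if left <= cx <= right and top <= cy <= bottom:
            hits.append(e)
    return hits


# Boxes taken from the example response
elements = [
    {"id": "txt-2", "page_number": 1, "bounding_box":
        [0.0267, 0.0872, 0.0689, 0.0789, 0.0743, 0.0908, 0.0321, 0.0996]},
    {"id": "txt-1", "page_number": 1, "bounding_box":
        [0.2196, 0.1225, 0.4578, 0.1348, 0.4557, 0.1537, 0.2174, 0.1417]},
]
# Top-left corner of page 1
top_left = elements_in_region(elements, (0.0, 0.0, 0.15, 0.15), page_number=1)
```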
