Overview

When you extract content with layout information, the Pulse API returns bounding box coordinates for text, tables, and images. This spatial data lets you locate each element on the page and extract content by region.

Bounding Box Format

Bounding boxes are returned in an 8-point format: eight normalized values (0-1 range) giving the four corners of the box, clockwise from the top-left:
[x1, y1, x2, y2, x3, y3, x4, y4]
Where:
  • (x1, y1) = Top-left corner
  • (x2, y2) = Top-right corner
  • (x3, y3) = Bottom-right corner
  • (x4, y4) = Bottom-left corner
Because coordinates are normalized to the 0-1 range, they are resolution-independent. To convert to pixels, multiply each x value by the page width and each y value by the page height.
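As a minimal sketch of working with this format, the helpers below split the eight values into corner pairs and compute the smallest upright rectangle that encloses them. The function names `bbox_corners` and `axis_aligned_bounds` are illustrative and not part of the Pulse API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bbox_corners(bbox: List[float]) -> List[Point]:
    """Split an 8-value bounding box into four (x, y) corners.

    The order follows the format described above:
    top-left, top-right, bottom-right, bottom-left.
    """
    if len(bbox) != 8:
        raise ValueError(f"expected 8 values, got {len(bbox)}")
    return [(bbox[i], bbox[i + 1]) for i in range(0, 8, 2)]


def axis_aligned_bounds(bbox: List[float]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the upright rectangle
    enclosing all four corners, still in normalized units."""
    xs = bbox[0::2]  # x1, x2, x3, x4
    ys = bbox[1::2]  # y1, y2, y3, y4
    return min(xs), min(ys), max(xs), max(ys)


box = [0.1, 0.1, 0.3, 0.1, 0.3, 0.15, 0.1, 0.15]
corners = bbox_corners(box)
bounds = axis_aligned_bounds(box)
```

The enclosing rectangle is useful when a box is slightly rotated, since the four corners then no longer share exact x or y values.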

Response Structure

The extraction response includes the following arrays:
{
  "Footer": [],
  "Header": [],
  "Images": [],
  "Tables": [],
  "Text": [],
  "Title": [],
  "Page Number": []
}
Not every field is present in every response: the API includes arrays only for element types detected in the document. When reading a response, treat a missing key the same way as an empty array.
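Since any of these arrays may be absent, client code can avoid indexing keys directly. The sketch below walks a parsed response and yields every element with its type. The helper `iter_elements` and the hand-written `sample` dictionary are assumptions for illustration, not part of the API.

```python
from typing import Any, Dict, Iterator, Tuple

# Keys taken from the response structure shown above.
ELEMENT_TYPES = (
    "Title", "Header", "Text", "Tables", "Images", "Footer", "Page Number",
)


def iter_elements(response: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (element_type, element) pairs, skipping missing or empty arrays."""
    for element_type in ELEMENT_TYPES:
        # .get() covers a missing key; "or []" covers an explicit null.
        for element in response.get(element_type) or []:
            yield element_type, element


# Hypothetical, hand-written sample (not real API output).
sample = {"Text": [{"content": "hello", "page_number": 1}], "Tables": []}
found = list(iter_elements(sample))
```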

Example Response

Here’s a real example of bounding box data:
{
  "Images": [
    {
      "bounding_box": [0.01014957264957265, 0.01744186046511628, 0.09241452991452992, 0.01744186046511628, 0.09241452991452992, 0.10719476744186046, 0.01014957264957265, 0.10719476744186046],
      "confidence": "N/A",
      "page_number": 1
    },
    {
      "bounding_box": [0.2013888888888889, 0.005813953488372093, 0.32425213675213677, 0.005087209302325582, 0.3247863247863248, 0.08829941860465117, 0.20192307692307693, 0.08866279069767442],
      "confidence": "N/A",
      "page_number": 1
    }
  ],
  "Tables": [],
  "Text": [
    {
      "average_word_confidence": 0.973,
      "bounding_box": [0.026709401709401708, 0.0872093023255814, 0.06891025641025642, 0.07885174418604651, 0.07425213675213675, 0.09084302325581395, 0.03205128205128205, 0.09956395348837209],
      "content": "0a-NCRI",
      "page_number": 1
    },
    {
      "average_word_confidence": 0.994,
      "bounding_box": [0.2548076923076923, 0.1587936046511628, 0.3680555555555556, 0.16388081395348839, 0.36698717948717946, 0.17732558139534885, 0.2537393162393162, 0.17223837209302326],
      "content": "0a-Dalia Kundu",
      "page_number": 1
    }
  ],
  "Title": [
    {
      "average_word_confidence": 0.995,
      "bounding_box": [0.21955128205128205, 0.1224563953488372, 0.4577991452991453, 0.1348110465116279, 0.45566239316239315, 0.1537063953488372, 0.21741452991452992, 0.14171511627906977],
      "content": "0a-Doctor Prescription",
      "page_number": 1
    }
  ]
}
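One common use of this data is to put elements into a rough reading order. The sketch below sorts by page, then by the topmost y value, then by the leftmost x value. This is a simple heuristic that assumes a single-column layout; the helper name `reading_order_key` is illustrative, and the coordinates are rounded from the example response above.

```python
from typing import Any, Dict, Tuple


def reading_order_key(element: Dict[str, Any]) -> Tuple[int, float, float]:
    """Sort key: page first, then top edge, then left edge."""
    bbox = element["bounding_box"]
    top = min(bbox[1::2])   # smallest y among the four corners
    left = min(bbox[0::2])  # smallest x among the four corners
    return (element["page_number"], top, left)


# Values rounded from the example response for brevity.
elements = [
    {"content": "0a-Dalia Kundu", "page_number": 1,
     "bounding_box": [0.255, 0.159, 0.368, 0.164, 0.367, 0.177, 0.254, 0.172]},
    {"content": "0a-Doctor Prescription", "page_number": 1,
     "bounding_box": [0.220, 0.122, 0.458, 0.135, 0.456, 0.154, 0.217, 0.142]},
    {"content": "0a-NCRI", "page_number": 1,
     "bounding_box": [0.027, 0.087, 0.069, 0.079, 0.074, 0.091, 0.032, 0.100]},
]

ordered = [e["content"] for e in sorted(elements, key=reading_order_key)]
```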

Field Descriptions

Text Array

Each text element contains:
  • content: The extracted text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the text appears
  • average_word_confidence: OCR confidence score (0-1)

Title Array

Each title element contains:
  • content: The title text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the title appears
  • average_word_confidence: OCR confidence score (0-1)

Header Array

Each header element contains:
  • content: The header text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the header appears
  • average_word_confidence: OCR confidence score (0-1)

Footer Array

Each footer element contains:
  • content: The footer text
  • bounding_box: 8-point coordinate array
  • page_number: Page where the footer appears
  • average_word_confidence: OCR confidence score (0-1)

Images Array

Each image element contains:
  • bounding_box: 8-point coordinate array
  • page_number: Page where the image appears
  • confidence: Detection confidence; may be the string "N/A" when no score is available
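Because text-like elements carry a numeric average_word_confidence while image elements in the example response carry the string "N/A", a small accessor can normalize the two. This is a sketch only: `element_confidence` and `is_confident` are hypothetical helpers, and the 0.9 threshold is an arbitrary value chosen for illustration.

```python
from typing import Any, Dict, Optional


def element_confidence(element: Dict[str, Any]) -> Optional[float]:
    """Return a numeric confidence score, or None if none is available."""
    for key in ("average_word_confidence", "confidence"):
        value = element.get(key)
        # Exclude bool explicitly, since bool is a subclass of int in Python.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None  # key missing, or a non-numeric value such as "N/A"


def is_confident(element: Dict[str, Any], threshold: float = 0.9) -> bool:
    """True only when a numeric score exists and meets the threshold."""
    score = element_confidence(element)
    return score is not None and score >= threshold
```

Elements without a numeric score are reported as not confident here; whether to keep or drop them is an application-level choice.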

Tables Array

Each table element contains:
  • bounding_box: 8-point coordinate array
  • page_number: Page where the table appears
  • content: Table content (in HTML format)

Page Number Array

Each page number element contains:
  • content: The page number text
  • bounding_box: 8-point coordinate array
  • page_number: Page where it appears
  • average_word_confidence: OCR confidence score (0-1)

Note: Table content is returned in HTML format, which preserves the table structure and makes it easy to parse or display.
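To illustrate parsing table content, the sketch below turns an HTML table into a list of rows using only Python's standard library. It assumes the content uses ordinary table, tr, th, and td tags; the exact markup Pulse returns is not shown here, so `sample_html` is a hypothetical stand-in, and `TableRowParser` and `table_html_to_rows` are illustrative names.

```python
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            if not self.rows:  # cell appeared without an enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())


def table_html_to_rows(html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical table markup, not real API output.
sample_html = (
    "<table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>A</td><td>5</td></tr></table>"
)
rows = table_html_to_rows(sample_html)
```

This sketch ignores merged cells (rowspan and colspan) and nested tables; a fuller HTML library may be a better fit for complex tables.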

Converting Coordinates

To convert normalized coordinates to pixel coordinates:
def normalize_to_pixels(bbox, page_width, page_height):
    """Convert normalized bounding box to pixel coordinates."""
    return [
        bbox[0] * page_width,   # x1
        bbox[1] * page_height,  # y1
        bbox[2] * page_width,   # x2
        bbox[3] * page_height,  # y2
        bbox[4] * page_width,   # x3
        bbox[5] * page_height,  # y3
        bbox[6] * page_width,   # x4
        bbox[7] * page_height   # y4
    ]

# Example: Convert for a standard letter-size page at 72 DPI
page_width = 612  # 8.5 inches * 72 DPI
page_height = 792  # 11 inches * 72 DPI

normalized_bbox = [0.1, 0.1, 0.3, 0.1, 0.3, 0.15, 0.1, 0.15]
pixel_bbox = normalize_to_pixels(normalized_bbox, page_width, page_height)
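
# A possible follow-on step (a sketch, not part of the Pulse API): many image
# tools expect an upright integer rectangle rather than four float corners.
# The hypothetical helper below rounds outward and clamps to the page so the
# whole region stays inside the crop.

```python
import math
from typing import List, Tuple


def pixel_crop_box(
    bbox: List[float], page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Return an integer (left, top, right, bottom) box enclosing the region."""
    xs = [x * page_width for x in bbox[0::2]]
    ys = [y * page_height for y in bbox[1::2]]
    left = max(0, math.floor(min(xs)))             # round outward, clamp to page
    top = max(0, math.floor(min(ys)))
    right = min(page_width, math.ceil(max(xs)))
    bottom = min(page_height, math.ceil(max(ys)))
    return left, top, right, bottom


crop = pixel_crop_box([0.1, 0.1, 0.3, 0.1, 0.3, 0.15, 0.1, 0.15], 612, 792)
# crop == (61, 79, 184, 119)
```

The (left, top, right, bottom) ordering matches the box tuple accepted by Pillow's Image.crop, should you render the page to an image at the same dimensions.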

Next Steps