The primary endpoint for the Pulse API. Parses uploaded documents or remote file URLs and returns rich markdown content with optional structured data extraction based on user-provided schemas and extraction options.

| Field | Type | Description |
|---|---|---|
| file | binary | Document file to upload directly (multipart/form-data). |
| file_url | string | Public or pre-signed URL that Pulse will download and extract. |

| Field | Type | Default | Description |
|---|---|---|---|
| structured_output | object | - | Recommended method for schema-guided extraction. Contains schema (JSON schema) and optional schema_prompt (natural language instructions). |
| pages | string | - | Page range filter (1-indexed). Supports segments like 1-2 or mixed ranges like 1-2,5. Page 1 is the first page. |
| chunking | string | - | Comma-separated list of chunking strategies (e.g., semantic,header,page,recursive). |
| chunk_size | integer | - | Maximum characters per chunk when chunking is enabled. |
| extract_figure | boolean | false | Enable figure extraction in results. |
| figure_description | boolean | false | Generate descriptive captions for extracted figures. |
| return_html | boolean | false | Include HTML representation alongside markdown in the response. |
| storage | object | - | Options for persisting extraction artifacts. See Storage Options. |
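
The options above can be assembled into form fields before sending a request. The following is a minimal sketch, not the official client: the helper name `build_option_fields` is hypothetical, and it assumes that booleans are sent as lowercase `true` strings and that object-valued fields such as structured_output are JSON-encoded in the multipart body. Neither encoding is confirmed by this reference.

```python
import json


def build_option_fields(
    pages=None,
    chunking=None,
    chunk_size=None,
    extract_figure=False,
    figure_description=False,
    return_html=False,
    structured_output=None,
):
    """Return a dict of string form fields for the extraction options."""
    fields = {}
    if pages is not None:
        fields["pages"] = pages  # for example "1-2,5"
    if chunking:
        # chunking is documented as a comma-separated list of strategies.
        fields["chunking"] = ",".join(chunking)
    if chunk_size is not None:
        # The reference lists the range x >= 1 for this field.
        if chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
        fields["chunk_size"] = str(chunk_size)
    # Assumption: booleans travel as the string "true"; false is the
    # documented default, so unset flags are simply omitted.
    for name, value in (
        ("extract_figure", extract_figure),
        ("figure_description", figure_description),
        ("return_html", return_html),
    ):
        if value:
            fields[name] = "true"
    if structured_output is not None:
        # Assumption: object fields are JSON-encoded in the form body.
        fields["structured_output"] = json.dumps(structured_output)
    return fields


fields = build_option_fields(
    pages="1-2,5",
    chunking=["semantic", "page"],
    chunk_size=500,
    return_html=True,
    structured_output={"schema": {"type": "object"}},
)
```

The resulting dict can be passed as the form data of whichever HTTP client is in use, alongside either file or file_url.
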

| Field | Type | Default | Description |
|---|---|---|---|
| storage.enabled | boolean | true | Whether to persist extraction artifacts. Set to false for temporary extractions. |
| storage.folder_name | string | - | Target folder name to save the extraction to. Creates the folder if it doesn't exist. |
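
As an illustration of the two storage fields, the sketch below builds the storage object for a temporary extraction and for one saved to a named folder. The helper `storage_options` is a hypothetical name, and how the object is serialized in the request is not specified here.

```python
def storage_options(enabled=True, folder_name=None):
    """Build the storage object; folder_name is included only when given."""
    options = {"enabled": enabled}
    if folder_name is not None:
        options["folder_name"] = folder_name
    return options


temporary = storage_options(enabled=False)
# {'enabled': False}
saved = storage_options(folder_name="invoices")
# {'enabled': True, 'folder_name': 'invoices'}
```
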

| Field | Replacement |
|---|---|
| schema | Use structured_output.schema instead |
| schema_prompt | Use structured_output.schema_prompt instead |
| experimental_schema | Use structured_output.schema instead |
| custom_prompt | No replacement |
| thinking | No replacement |
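
The replacement table can be applied mechanically to existing request parameters. This is a sketch under assumptions: `migrate_deprecated_fields` is a hypothetical helper, and the precedence it uses (an existing structured_output value wins, then schema over experimental_schema) is a design choice of the sketch, not documented behavior.

```python
def migrate_deprecated_fields(params):
    """Return (new_params, dropped) with deprecated fields remapped.

    The input dict is left unmodified.
    """
    migrated = dict(params)
    structured = dict(migrated.pop("structured_output", None) or {})

    # schema and experimental_schema both map to structured_output.schema.
    for legacy in ("schema", "experimental_schema"):
        value = migrated.pop(legacy, None)
        if value is not None:
            structured.setdefault("schema", value)

    prompt = migrated.pop("schema_prompt", None)
    if prompt is not None:
        structured.setdefault("schema_prompt", prompt)

    # custom_prompt and thinking have no replacement, so they are removed
    # and reported back to the caller.
    dropped = [key for key in ("custom_prompt", "thinking") if key in migrated]
    for key in dropped:
        del migrated[key]

    if structured:
        migrated["structured_output"] = structured
    return migrated, dropped


legacy = {
    "file_url": "https://example.com/a.pdf",
    "schema": {"type": "object"},
    "schema_prompt": "Extract totals",
    "thinking": True,
}
new_params, dropped = migrate_deprecated_fields(legacy)
```

Returning the dropped names lets a caller log which options no longer have any effect.
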

| Field | Type | Description |
|---|---|---|
| markdown | string | Clean markdown content extracted from the document. |
| page_count | integer | Total number of pages processed. |
| job_id | string | Unique identifier for the extraction job. |
| plan-info | object | Billing information including pages used and plan tier. |
| bounding_boxes | object | Detailed bounding box data for document elements. See Bounding Boxes for details. |
| extraction_url | string | URL to view the extraction in the Pulse Platform (when storage is enabled). |
| html | string | HTML representation of the document (when return_html is true). |
| structured_output | object | Extracted data matching your schema (when structured_output is provided). |
| chunks | array | Document chunks (when chunking is enabled). |
| figures | array | Extracted figures (when extract_figure is true). |

| Field | Type | Description |
|---|---|---|
| is_url | boolean | Always true for large document responses. Use this to detect URL-based responses. |
| url | string | Pre-signed S3 URL containing the complete extraction results. Expires after 24 hours. |
| plan-info | object | Billing information including pages used and plan tier. |
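
A client can use is_url to decide whether the body already holds the results or only points to them. The sketch below is one way to handle both cases. The function names are hypothetical, and it assumes the pre-signed URL serves the complete results as a JSON document, which this reference does not state explicitly.

```python
import json
from urllib.request import urlopen


def download(url):
    """Fetch the raw bytes behind a URL."""
    with urlopen(url) as resp:
        return resp.read()


def resolve_result(response, fetch=download):
    """Return the full extraction result for either response shape.

    response is the parsed JSON body of the extract call. fetch can be
    swapped out, for example in tests, to avoid network access.
    """
    if not response.get("is_url"):
        # Regular response: the results are already inline.
        return response
    # Large document response: the results live behind the pre-signed URL.
    # Assumption: the URL returns a JSON document.
    return json.loads(fetch(response["url"]))
```

Because the URL expires after 24 hours, it is safer to download and persist the results promptly than to store the URL itself.
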

storage.enabled is true and retrieve results from your extraction library.
API key for authentication
Input schema for multipart/form-data requests (file upload or file_url).
Document to upload directly. Required unless file_url is specified.
Public or pre-signed URL that Pulse will download and extract.
Recommended method for schema-guided extraction. Contains the schema and optional prompt in a single object.
Page range filter (1-indexed, where page 1 is the first page). Supports segments such as 1-2 or mixed ranges like 1-2,5.
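
To make the range syntax concrete, the sketch below expands a pages value into explicit page numbers on the client side. `parse_pages` is a hypothetical helper for validation or preview. The server's own parsing rules, such as how it treats whitespace or overlapping segments, are not documented here and may differ.

```python
def parse_pages(spec):
    """Expand a spec such as "1-2,5" into sorted 1-indexed page numbers."""
    pages = set()
    for segment in spec.split(","):
        segment = segment.strip()
        if not segment:
            raise ValueError(f"empty segment in page spec: {spec!r}")
        start_text, sep, end_text = segment.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page segment: {segment!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_pages("1-2,5"))  # [1, 2, 5]
```
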
Comma-separated list of chunking strategies to apply (for example semantic,header,page,recursive).
Override for maximum characters per chunk when chunking is enabled. Must satisfy x >= 1.
Toggle to enable figure extraction in results.
Toggle to generate descriptive captions for extracted figures.
Whether to include HTML representation alongside markdown in the response.
Options for persisting extraction artifacts. When enabled (default), artifacts are saved to storage and a database record is created.
⚠️ DEPRECATED - Use structured_output.schema instead. JSON schema describing structured data to extract.
⚠️ DEPRECATED - Use structured_output.schema_prompt instead. Natural language prompt for schema-guided extraction.
⚠️ DEPRECATED - Use structured_output.schema instead. Experimental schema definition.
⚠️ DEPRECATED - No replacement. Custom instructions that augment the default extraction behaviour.
⚠️ DEPRECATED - No replacement. Enables expanded rationale output for debugging.
Synchronous extraction result
High-level structure returned by the synchronous extract API.
Primary markdown content extracted from the document.
Optional HTML representation when return_html is true.
Identifier assigned to the extraction job.
Non-fatal warnings generated during extraction.
Additional metadata supplied by the backend.
Extracted structured data, if a schema was provided.