Deprecated: Use /extract with async: true instead.
Starts an asynchronous extraction job. The request mirrors the synchronous options but returns immediately with a job identifier that clients can poll for completion status.
It behaves like the /extract endpoint but returns immediately with a job identifier.
To migrate, replace /extract_async with /extract and add async: true.
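The migration from /extract_async to /extract with async: true can be sketched as follows. This is a minimal illustration, not part of the Pulse API: the function name `migrate_request` and the dictionary used to describe a request are assumptions, and how the boolean is serialized on the wire is not covered here.

```python
def migrate_request(path: str, fields: dict) -> tuple[str, dict]:
    """Rewrite a deprecated /extract_async call as /extract with async: true.

    `path` and `fields` are a hypothetical in-memory description of a request;
    they are not a Pulse SDK type.
    """
    trimmed = path.rstrip("/")
    suffix = "/extract_async"
    if trimmed.endswith(suffix):
        base = trimmed[: -len(suffix)]
        # Copy the fields so the caller's dict is untouched, then request async mode.
        return base + "/extract", {**fields, "async": True}
    # Any other path is returned unchanged (with a copied field dict).
    return path, dict(fields)


new_path, new_fields = migrate_request(
    "/extract_async", {"file_url": "https://example.com/report.pdf"}
)
print(new_path, new_fields)
```

The remaining request fields are carried over unchanged, since the asynchronous request mirrors the synchronous options.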
| Field | Type | Description |
|---|---|---|
| file | binary | Document file to upload directly (multipart/form-data). |
| file_url | string | Public or pre-signed URL that Pulse will download and extract. |

| Field | Type | Default | Description |
|---|---|---|---|
| pages | string | - | Page range filter (1-indexed). Supports segments like 1-2 or mixed ranges like 1-2,5. Page 1 is the first page. |
| figure_processing | object | - | Settings that control how figures in the document are processed. These affect the markdown output directly and do not produce additional output fields. See Figure Processing. |
| extensions | object | - | Settings that enable additional processing or alternate output formats. Each enabled extension produces a corresponding result under response.extensions.*. See Extensions. |
| storage | object | - | Options for persisting extraction artifacts. See Storage Options. |
| async | boolean | false | If true, returns immediately with a job_id for polling via GET /job/{jobId}. |
| structured_output | object | - | ⚠️ Deprecated. Use the /schema endpoint after extraction instead. Still works for backward compatibility. |
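To make the pages syntax concrete, the sketch below validates a value against the pattern this reference lists for the pages field and expands it into individual 1-indexed page numbers. The helper name `expand_pages` is hypothetical, and rejecting page 0 or reversed ranges on the client side is an assumption; the service may handle those cases differently.

```python
import re

# Pattern as listed in the request schema for the pages field.
PAGES_PATTERN = re.compile(r"^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$")


def expand_pages(pages: str) -> list[int]:
    """Validate a pages string and expand it to sorted, 1-indexed page numbers."""
    if not PAGES_PATTERN.fullmatch(pages):
        raise ValueError(f"invalid pages value: {pages!r}")
    selected: set[int] = set()
    for segment in pages.split(","):
        start, _, end = segment.partition("-")
        first = int(start)
        last = int(end) if end else first
        # Assumption: page 0 and reversed ranges such as 5-2 are treated as errors.
        if first < 1 or last < first:
            raise ValueError(f"invalid page segment: {segment!r}")
        selected.update(range(first, last + 1))
    return sorted(selected)


print(expand_pages("1-2,5"))
```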
The figure_processing settings control how figures (images, charts, diagrams) in the document are processed. These settings affect the markdown output directly, for example by adding descriptive captions to figures or converting charts into markdown tables. They do not create additional output fields in the response.

| Field | Type | Default | Description |
|---|---|---|---|
| figure_processing.description | boolean | false | Generate descriptive captions for extracted figures. |
| figure_processing.show_images | boolean | false | Embed base64-encoded images inline in figure tags. Increases response size. |
The extensions settings enable additional processing passes or alternate output formats. Each enabled extension produces a corresponding output field under response.extensions.*. For example, enabling extensions.chunking produces response.extensions.chunking, and enabling extensions.alt_outputs.return_html produces response.extensions.alt_outputs.html.

| Field | Type | Default | Description |
|---|---|---|---|
| extensions.merge_tables | boolean | false | Merge tables that span multiple pages into a single table. |
| extensions.footnote_references | boolean | false | Link footnote markers to their corresponding footnote text. |
| extensions.chunking | object | - | Chunking configuration. See below. |
| extensions.chunking.chunk_types | string[] | - | List of chunking strategies: semantic, header, page, recursive. |
| extensions.chunking.chunk_size | integer | - | Maximum characters per chunk. |
| extensions.alt_outputs | object | - | Alternate output formats. See below. |
| extensions.alt_outputs.wlbb | boolean | false | Enable word-level bounding boxes (PDF only). Results in response.extensions.alt_outputs.wlbb. |
| extensions.alt_outputs.return_html | boolean | false | Include HTML representation. response.markdown is still present; HTML is at response.extensions.alt_outputs.html. |
| extensions.alt_outputs.return_xml | boolean | false | Include XML representation (work in progress). |
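The relationship between enabled extensions and their result locations can be sketched as below. The helper names `build_extensions` and `expected_result_paths` are hypothetical, and only the three result paths named in this reference are mapped. How the nested object is encoded inside a multipart/form-data request (for example, as a JSON string) is not stated here, so the sketch stops at building the Python structure.

```python
import json


def build_extensions(chunk_types=None, chunk_size=None, return_html=False, wlbb=False):
    """Assemble an extensions object using the field names from the table above."""
    extensions = {}
    if chunk_types:
        chunking = {"chunk_types": list(chunk_types)}
        if chunk_size is not None:
            chunking["chunk_size"] = chunk_size
        extensions["chunking"] = chunking
    alt_outputs = {}
    if return_html:
        alt_outputs["return_html"] = True
    if wlbb:
        alt_outputs["wlbb"] = True
    if alt_outputs:
        extensions["alt_outputs"] = alt_outputs
    return extensions


def expected_result_paths(extensions):
    """List the documented response paths for the extensions that are enabled."""
    paths = []
    if "chunking" in extensions:
        paths.append("response.extensions.chunking")
    alt_outputs = extensions.get("alt_outputs", {})
    if alt_outputs.get("return_html"):
        paths.append("response.extensions.alt_outputs.html")
    if alt_outputs.get("wlbb"):
        paths.append("response.extensions.alt_outputs.wlbb")
    return paths


extensions = build_extensions(
    chunk_types=["semantic", "page"], chunk_size=1000, return_html=True
)
print(json.dumps(extensions, indent=2))
print(expected_result_paths(extensions))
```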
| Field | Type | Default | Description |
|---|---|---|---|
| storage.enabled | boolean | true | Whether to persist extraction artifacts. Set to false for temporary extractions. |
| storage.folder_name | string | - | Target folder name to save the extraction to. Creates the folder if it doesn’t exist. |
| storage.folder_id | string (uuid) | - | Target folder ID to save the extraction to. Takes precedence over folder_name. |

| Field | Replacement |
|---|---|
| extract_figure | No replacement |
| figure_description | Use figure_processing.description |
| show_images | Use figure_processing.show_images |
| chunking | Use extensions.chunking.chunk_types (array instead of comma-separated string) |
| chunk_size | Use extensions.chunking.chunk_size |
| return_html | Use extensions.alt_outputs.return_html |
| structured_output | Use /schema endpoint after extraction. Pass extraction_id + schema_config. Accepts schema, schema_prompt, and effort. |
| schema | Use /schema endpoint after extraction |
| schema_prompt | Use /schema endpoint with schema_config.schema_prompt |
| custom_prompt | No replacement |
| thinking | No replacement |
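The replacement table above can be applied mechanically. The following sketch converts a flat set of legacy fields into the nested figure_processing and extensions structure, and reports the fields that have no in-request replacement. The function name `migrate_legacy_fields` and the shape of its return value are assumptions made for illustration; it also assumes the legacy chunking value is the comma-separated string described in the table.

```python
NO_REPLACEMENT = {"extract_figure", "custom_prompt", "thinking"}
SCHEMA_FIELDS = {"structured_output", "schema", "schema_prompt"}


def migrate_legacy_fields(legacy: dict) -> tuple[dict, dict]:
    """Return (migrated request fields, skipped legacy fields with a reason)."""
    migrated: dict = {}
    skipped: dict = {}
    figure_processing: dict = {}
    chunking: dict = {}
    alt_outputs: dict = {}
    for name, value in legacy.items():
        if name == "figure_description":
            figure_processing["description"] = value
        elif name == "show_images":
            figure_processing["show_images"] = value
        elif name == "chunking":
            # Legacy value is a comma-separated string; the new field is an array.
            chunking["chunk_types"] = [p.strip() for p in value.split(",") if p.strip()]
        elif name == "chunk_size":
            chunking["chunk_size"] = value
        elif name == "return_html":
            alt_outputs["return_html"] = value
        elif name in NO_REPLACEMENT:
            skipped[name] = "no replacement"
        elif name in SCHEMA_FIELDS:
            skipped[name] = "use the /schema endpoint after extraction"
        else:
            # Fields that are not deprecated pass through unchanged.
            migrated[name] = value
    if figure_processing:
        migrated["figure_processing"] = {
            **migrated.get("figure_processing", {}), **figure_processing
        }
    extensions = {}
    if chunking:
        extensions["chunking"] = chunking
    if alt_outputs:
        extensions["alt_outputs"] = alt_outputs
    if extensions:
        migrated["extensions"] = {**migrated.get("extensions", {}), **extensions}
    return migrated, skipped


migrated, skipped = migrate_legacy_fields(
    {"file_url": "https://example.com/a.pdf", "chunking": "semantic, page", "thinking": True}
)
print(migrated)
print(skipped)
```

Fields reported in the second return value need a decision by the caller: either drop them or, for the schema-related ones, issue a separate /schema request after extraction.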
When a request uses deprecated fields, the response includes a warnings array directing you to the updated field names. See the latest documentation for details.

| Field | Type | Description |
|---|---|---|
| job_id | string | Unique identifier for the extraction job. Use this to poll for results with the Poll Job endpoint. |
| status | string | Initial job status. Typically pending when first submitted. |
| queuedAt | string | ISO 8601 timestamp indicating when the job was accepted. |
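A client might read the acknowledgement into a typed structure as sketched below. The `JobAck` class and `parse_ack` function are hypothetical names, and the sample payload values are made up for illustration. Note that the response mixes naming styles: job_id is snake_case while queuedAt is camelCase.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobAck:
    job_id: str
    status: str
    queued_at: datetime


def parse_ack(payload: dict) -> JobAck:
    """Convert the acknowledgement fields listed above into a JobAck."""
    raw = payload["queuedAt"]
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    return JobAck(
        job_id=payload["job_id"],
        status=payload["status"],
        queued_at=datetime.fromisoformat(raw),
    )


# Illustrative payload; the identifier and timestamp are invented.
ack = parse_ack(
    {"job_id": "job_123", "status": "pending", "queuedAt": "2024-01-01T12:00:00Z"}
)
print(ack)
```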
API key for authentication
Input schema for multipart/form-data requests (file upload or file_url).
Document to upload directly. Required unless file_url is specified.
Public or pre-signed URL that Pulse will download and extract.
Page range filter (1-indexed, where page 1 is the first page). Supports segments such as 1-2 or mixed ranges like 1-2,5.
Pattern: ^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$
Settings that control how figures in the document are processed. These affect the markdown output directly (e.g. figure descriptions, chart-to-table conversion, image embedding) and do not produce additional output fields in the response.
Settings that enable additional processing passes or alternate output formats. Each enabled extension produces a corresponding output field under response.extensions.*.
Options for persisting extraction artifacts. When enabled (default), artifacts are saved to storage and a database record is created.
If true, returns immediately with a job_id for polling via GET /job/{jobId}. Otherwise processes synchronously.
Deprecated -- Use extensions.chunking.chunk_types instead. Comma-separated list of chunking strategies.
Deprecated -- Use extensions.chunking.chunk_size instead.
Allowed range: x >= 1
Deprecated -- No replacement.
Deprecated -- Use figure_processing.description instead.
Deprecated -- Use figure_processing.show_images instead.
Deprecated -- Use extensions.alt_outputs.return_html instead.
Asynchronous extraction job accepted
Acknowledgement returned when a request is submitted for asynchronous processing. Poll GET /job/{job_id} to check status and retrieve results.
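The polling step can be sketched as a small loop. This is an illustration under stated assumptions rather than an official client: `poll_job` is a hypothetical helper, the caller supplies `fetch_job` (for example, a wrapper that issues GET /job/{job_id} with the API key and returns the decoded JSON), and the terminal status names in `TERMINAL_STATUSES` are placeholders because this reference only mentions pending.

```python
import time
from typing import Callable

# Assumption: terminal status names are not listed in this reference.
# Replace these placeholders with the values the job endpoint actually returns.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def poll_job(
    job_id: str,
    fetch_job: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job(job_id) until the job reaches a terminal status or times out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        # Wait before the next status check to avoid hammering the API.
        sleep(interval)
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to exercise with a fake in tests.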