Core Objects
| Object | Created by | Used for |
|---|---|---|
| `job_id` | Any async request | Polling job status and retrieving async results. |
| `extraction_id` | `/extract` with storage enabled | Reusing parsed content for Schema, Split, Tables, reruns, and saved results. |
| `split_id` | `/split` | Applying per-topic schemas or table extraction to split page groups. |
| `schema_id` | `/schema` | Tracking a structured output version and downloading filled Excel templates when used. |
| `tables_id` | `/tables` | Tracking table extraction output, especially async table jobs. |
| Config IDs | Saved presets | Reusing Extract, Split, Schema, and Tables settings without inlining JSON. |
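
As an illustration of how these IDs relate, an application might keep them together in one record per document. The sketch below is only one possible shape: the `DocumentRun` class, its method name, and the example ID values are hypothetical and are not part of the Pulse SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentRun:
    """Hypothetical record of the IDs produced while processing one document."""

    extraction_id: Optional[str] = None  # from /extract with storage enabled
    split_id: Optional[str] = None       # from /split
    schema_id: Optional[str] = None      # from /schema
    tables_id: Optional[str] = None      # from /tables
    job_id: Optional[str] = None         # most recent async request, if any

    def can_run_downstream(self) -> bool:
        # Schema, Split, and Tables reuse parsed content through extraction_id,
        # so downstream steps need it to be present.
        return self.extraction_id is not None


# Example with made-up ID values.
run = DocumentRun(extraction_id="ext_123")
run.split_id = "spl_456"
```

Keeping `extraction_id` as the gate for downstream steps mirrors the table: it is the object that Schema, Split, and Tables reuse.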
Storage Defaults
Storage is enabled by default for normal workflows because downstream steps need saved extraction artifacts. If storage is disabled, Pulse can still return the immediate Extract response, but later steps may not be able to reuse that extraction.
Async Lifecycle
For longer documents or heavier steps, set `async: true`.
- The request returns quickly with a `job_id`.
- Your app polls `GET /job/{jobId}` or waits for a webhook.
- When status is `completed`, the job result contains the same output the sync call would have returned.
- Large results may include a download URL instead of embedding the entire payload.
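
The polling step can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the SDK's implementation: `fetch_job` is a caller-supplied function (for example, a wrapper around `GET /job/{jobId}`), the `status` field name is assumed, and the `failed` and `canceled` states are hypothetical, since only `completed` is named here.

```python
import time
from typing import Any, Callable, Dict

# Assumption: only "completed" is documented above; these failure states
# are hypothetical placeholders for whatever the API actually reports.
TERMINAL_FAILURES = {"failed", "canceled"}


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reports `completed`, then return the job payload."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)  # e.g. wraps GET /job/{jobId}
        status = job.get("status")  # assumed field name
        if status == "completed":
            return job
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout_s}s")
        sleep(interval_s)
```

Passing `fetch_job` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise with a fake in tests. A fixed interval is used for simplicity; a backoff schedule may be preferable for long-running jobs.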
Platform Lifecycle
In the Platform, the same lifecycle appears as a visual pipeline:
- Upload or select a document.
- Configure Extract and optional downstream steps.
- Run the pipeline.
- Inspect outputs in Markdown, Tables, Split, and Schema views.
- Save step presets or a full pipeline preset.
- Use Show Code to reproduce the pipeline from the SDK.
Production Lifecycle
A mature production workflow usually looks like this:
Related
Chaining Steps
Learn the exact ID handoffs between Extract, Split, Schema, Tables, and Jobs.
Async Processing
Decide when to run jobs asynchronously and how to poll safely.