Document Processing Pipelines
A pipeline is a sequence of API calls that process a document from raw file to structured data. You define each step in the Pulse Playground, test it interactively, then deploy it at scale using the generated SDK code.
Supported Pipelines
There are four valid pipeline configurations:

| Pipeline | Steps | Use Case |
|---|---|---|
| Extract | /extract | Basic content extraction — markdown, tables, figures |
| Extract → Schema | /extract → /schema | Extract + apply a schema to get structured data |
| Extract → Split | /extract → /split | Extract + split document into topic-based page groups |
| Extract → Split → Schema | /extract → /split → /schema | Full pipeline — extract, split by topic, apply per-topic schemas |
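
The four configurations in the table can also be expressed as a small client-side check before any request is sent. This is an illustrative sketch only: `VALID_PIPELINES` and `is_valid_pipeline` are hypothetical helper names, not part of the Pulse SDK.

```python
# The four pipeline shapes from the table, as ordered tuples of step names.
VALID_PIPELINES = {
    ("extract",),
    ("extract", "schema"),
    ("extract", "split"),
    ("extract", "split", "schema"),
}


def is_valid_pipeline(steps):
    """Return True if the ordered step names match a supported pipeline."""
    return tuple(steps) in VALID_PIPELINES
```

For example, `["extract", "split", "schema"]` passes the check, while `["split", "schema"]` does not because it skips the extract step.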
How It Works
Step 1: Extract
Every pipeline starts with /extract. This processes your document and returns markdown content, bounding boxes, and optional figures.
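
A minimal sketch of an /extract request using only the Python standard library. The base URL, the `x-api-key` header name, and the `file_url` field are assumptions made for illustration; the code generated by the Playground is the reference for the real request shape.

```python
import json
import urllib.request

# Assumption: placeholder host. Substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_extract_payload(file_url, storage_enabled=True):
    """Build a JSON body for /extract.

    `file_url` is an assumed field name. `storage.enabled` is the setting
    named in this guide and defaults to True, matching the documented default.
    """
    return {"file_url": file_url, "storage": {"enabled": storage_enabled}}


def build_request(path, payload, api_key):
    """Create a POST request carrying a JSON payload (auth header name is assumed)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def post_json(path, payload, api_key):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload, api_key)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, `post_json("/extract", build_extract_payload(url), api_key)` would return the parsed response, from which you keep the `extraction_id` for later steps.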
Storage is enabled by default. The extraction_id returned in the response is used to reference the saved extraction in subsequent pipeline steps. If you explicitly disable storage (storage.enabled: false), the extraction won’t be available for split or schema steps.
Step 2 (Option A): Schema Extraction
For documents where you need structured data from the entire document, call /schema with the extraction_id:
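
A hedged sketch of the request body for this step. Only `extraction_id` is named in this guide; the `schema` and `prompt` field names, the example invoice schema, and the `ext_123` identifier are placeholders chosen for illustration.

```python
def build_schema_payload(extraction_id, schema, prompt=None):
    """Build a JSON body for /schema that targets a saved extraction.

    `schema` and `prompt` are assumed field names; `extraction_id` is the
    identifier returned by /extract.
    """
    payload = {"extraction_id": extraction_id, "schema": schema}
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


# Example JSON Schema describing the fields to pull from an invoice.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}

# "ext_123" is a placeholder, not a real extraction ID.
schema_payload = build_schema_payload(
    "ext_123", invoice_schema, prompt="Extract the invoice fields."
)
```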
Step 2 (Option B): Split Document
For multi-section documents (annual reports, contracts, medical records), call /split to identify which pages contain each topic:
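
A sketch of what a /split request body might look like for an annual report. The `topics` field and its item shape are assumptions, inferred from the fact that split configs hold topic names and descriptions; `ext_123` is a placeholder ID.

```python
def build_split_payload(extraction_id, topics):
    """Build a JSON body for /split from (name, description) topic pairs.

    The `topics` field and its item shape are assumed, not confirmed.
    """
    # Local sanity check in this helper, not a documented API rule.
    if not topics:
        raise ValueError("at least one topic is required")
    return {
        "extraction_id": extraction_id,
        "topics": [
            {"name": name, "description": description}
            for name, description in topics
        ],
    }


split_payload = build_split_payload(
    "ext_123",
    [
        ("financial_statements", "Balance sheet, income statement, cash flow"),
        ("risk_factors", "Discussion of business and market risks"),
    ],
)
```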
Step 3: Schema on Split Results
After splitting, call /schema with the split_id to apply different schemas to each topic’s pages:
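
One possible shape for the per-topic request, shown as a sketch. `split_id` is the identifier named above; the `schemas` mapping from topic name to JSON Schema is an assumed structure, and `split_456` is a placeholder ID.

```python
def build_split_schema_payload(split_id, schemas_by_topic):
    """Build a JSON body for /schema that targets split results.

    `schemas` (topic name -> JSON Schema) is an assumed request shape.
    """
    return {"split_id": split_id, "schemas": dict(schemas_by_topic)}


split_schema_payload = build_split_schema_payload(
    "split_456",
    {
        "financial_statements": {
            "type": "object",
            "properties": {"total_revenue": {"type": "number"}},
        },
        "risk_factors": {
            "type": "object",
            "properties": {
                "risks": {"type": "array", "items": {"type": "string"}}
            },
        },
    },
)
```

Keying the schemas by topic name keeps each schema tied to the page group it applies to, which mirrors the topic names defined in the split step.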
Saved Configurations
Each step’s configuration can be saved to a config library for reuse:
- Extraction configs: page ranges, figure settings, chunking options
- Split configs: topic definitions with names and descriptions
- Schema configs: JSON schemas with prompts and effort settings
From Playground to Production
The Pulse Playground lets you build and test pipelines interactively:
- Configure each step using the visual pipeline builder
- Run the pipeline on a test document to verify results
- Save the pipeline — each step’s config is saved to your library
- Export — click the Show Code button in the top-right corner of the extraction results panel

Deploying at Scale
Once you have the generated code, you can deploy it in production to process documents at scale. Set async: true on each step and poll for results:
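
A generic polling loop for asynchronous jobs, sketched under assumptions. How job status is fetched and the status values `"completed"` and `"failed"` are not specified in this guide, so `fetch_job` is left as a caller-supplied function and the status strings are placeholders.

```python
import time


def poll_until_done(fetch_job, interval_s=2.0, timeout_s=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job() until the job reports a terminal status.

    fetch_job is any zero-argument callable returning a dict with a
    "status" key. The values "completed" and "failed" are assumed.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job failed: {job}")
        # Give up once the deadline has passed instead of polling forever.
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting; in production the defaults are sufficient.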
