The Golden Path
ID Handoffs
| You have | You can call next | Why |
|---|---|---|
| job_id | GET /job/{jobId} | Check async status and retrieve the result. |
| extraction_id | /schema | Extract structured data from the whole document. |
| extraction_id | /tables | Extract table structure from the document. |
| extraction_id | /split | Assign pages to topics. |
| split_id | /schema | Apply different schemas to topic page groups. |
| split_id | /tables | Extract tables scoped to split topics. |
| schema_id | /schema/{schemaId}/excel | Download a filled Excel template when schema template mode was used. |
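
The handoff table can also be encoded as a small lookup, which is one way to validate a pipeline before sending any request. This is an illustrative sketch only: `NEXT_CALLS` and `next_calls` are hypothetical names, not part of the API or any SDK, and the entries simply mirror the table rows.

```python
# Hypothetical helper: mirrors the ID handoff table; not part of the API or SDK.
NEXT_CALLS = {
    "job_id": ["GET /job/{jobId}"],
    "extraction_id": ["/schema", "/tables", "/split"],
    "split_id": ["/schema", "/tables"],
    "schema_id": ["/schema/{schemaId}/excel"],
}


def next_calls(id_kind: str) -> list[str]:
    """Return the endpoints that accept the given kind of ID."""
    if id_kind not in NEXT_CALLS:
        raise ValueError(f"unknown ID kind: {id_kind!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(NEXT_CALLS[id_kind])
```

For example, `next_calls("split_id")` lists `/schema` and `/tables`, while an unrecognized kind raises `ValueError` instead of silently producing a bad request.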
Extract -> Schema
Extract -> Tables
Extract -> Split -> Schema
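
As a rough sketch of the longest chain, Extract -> Split -> Schema, the function below forwards each step's ID to the next step. Several things here are assumptions rather than confirmed API details: the `/extract` path, the idea that each response exposes `extraction_id` or `split_id` at the top level, and the `post` callable, which stands in for whatever HTTP client or SDK call you use. The sketch also assumes synchronous calls.

```python
from typing import Any, Callable

# Assumption: post(path, body) sends a JSON POST and returns the parsed JSON
# response as a dict. Plug in your own HTTP client or SDK call.
Post = Callable[[str, dict[str, Any]], dict[str, Any]]


def extract_split_schema(post: Post,
                         extract_body: dict[str, Any],
                         split_body: dict[str, Any],
                         schema_body: dict[str, Any]) -> dict[str, Any]:
    """Run Extract -> Split -> Schema, handing each step's ID to the next."""
    # "/extract" is an assumed path; the source only names the step "Extract".
    extraction = post("/extract", extract_body)

    # Hand the extraction_id to /split (response field name is assumed).
    split = post("/split", {**split_body,
                            "extraction_id": extraction["extraction_id"]})

    # Hand the split_id to /schema so schemas apply to topic page groups.
    return post("/schema", {**schema_body, "split_id": split["split_id"]})
```

Only the ID handoffs are hard-coded; every other request field is supplied by the caller, so the sketch makes no claim about the rest of each request body.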
Async Chaining
When you set async: true, wait for completion before passing the result to the next step.
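
Waiting for completion usually means polling GET /job/{jobId}. The helper below is a minimal sketch under stated assumptions: the `status` field and its values (`completed`, `failed`, `canceled`) are guesses at the response shape, and `get` is a hypothetical callable that performs the GET and returns parsed JSON.

```python
import time
from typing import Any, Callable


def wait_for_job(get: Callable[[str], dict[str, Any]], job_id: str, *,
                 interval: float = 2.0, timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll GET /job/{jobId} until the job finishes, then return the job body."""
    deadline = time.monotonic() + timeout
    while True:
        job = get(f"/job/{job_id}")
        status = job.get("status")  # assumed field name
        if status == "completed":  # assumed status value
            return job
        if status in ("failed", "canceled"):  # assumed status values
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. Only start the next step once this function has returned.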
Common Mistakes
Passing job_id where extraction_id is required
A job_id is for polling. After the job completes, read the completed result and pass its extraction_id into /schema, /split, or /tables.
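
One way to avoid this mistake is to pull the extraction_id out of the completed job body explicitly and fail loudly otherwise. This is a sketch, not the documented response format: the `status` value and the nesting of `extraction_id` under `result` are assumptions, and `extraction_id_from_job` is a hypothetical helper name.

```python
from typing import Any


def extraction_id_from_job(job: dict[str, Any]) -> str:
    """Pull extraction_id out of a completed job body; never return the job_id."""
    if job.get("status") != "completed":  # assumed field name and value
        raise ValueError("job has not completed; keep polling with job_id")
    result = job.get("result") or {}  # assumed: the result is nested under "result"
    extraction_id = result.get("extraction_id")
    if not extraction_id:
        raise KeyError("completed job carries no extraction_id")
    return extraction_id
```

Pass the returned value, not the job_id, into /schema, /split, or /tables.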
Disabling storage before downstream steps
Downstream steps need saved extraction artifacts. Keep storage enabled when you plan to chain.
Using Split when a page range is enough
If the target pages are always known, pass pages or a table page_range. Use Split when the location of a topic changes from document to document.
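
The choice between a fixed page range and Split can be sketched as a small request-body builder. Treat every detail as an assumption: `schema_scope` is a hypothetical helper, the `pages` value format (shown as a string such as "2-4") is a guess, and whether /schema accepts a split_id on its own is not confirmed here.

```python
from typing import Any, Optional


def schema_scope(extraction_id: str, *, pages: Optional[str] = None,
                 split_id: Optional[str] = None) -> dict[str, Any]:
    """Build the ID part of a /schema request body, scoped one way only."""
    if pages is not None and split_id is not None:
        raise ValueError("pass either pages or split_id, not both")
    if split_id is not None:
        # Topic location varies per document: rely on a prior /split call.
        return {"split_id": split_id}
    body: dict[str, Any] = {"extraction_id": extraction_id}
    if pages is not None:
        # Pages are known up front: no /split call is needed.
        body["pages"] = pages  # value format is an assumption, e.g. "2-4"
    return body
```

When the pages are fixed, the first branch is never taken and the pipeline saves one call.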
Using Schema for a table-first workflow
Schema works well for named fields. Use Tables when the output you need is the table itself, with row and column relationships preserved.
Related
Pipeline Overview
See supported API pipeline shapes.
Moving from Platform to Production
Generate chained SDK calls from the Playground.