extraction_id in Schema, Tables, Split, MCP tools, or saved pipelines.
This page explains when to use each setting. For the full endpoint schema, see the
Extract API.
Quick Map
| Need | Read | API field | Try it |
|---|---|---|---|
| Link footnote markers to their explanation text | Footnote References | extensions.footnote_references | Attention Is All You Need |
| Parse workbooks without phantom rows or hidden-sheet surprises | Spreadsheet Processing | spreadsheet.* | Complex Table Document |
| Validate exact word positions for review, redaction, or QA | Word-Level Bounding Boxes | extensions.alt_outputs.wlbb | Bank Statement |
| Prepare text for RAG, retrieval, or agent memory | Chunking | extensions.chunking.* | Attention Is All You Need |
Parameter Guides
Footnote References
Link footnote markers to the text they qualify.
Spreadsheet Processing
Control hidden rows, hidden sheets, raw values, and phantom ranges.
Word-Level Bounding Boxes
Return exact word coordinates for review, QA, and overlays.
Chunking
Prepare extraction output for RAG, search, agents, and review queues.
Good Defaults
For self-serve exploration, start in the Platform and inspect the output tabs before saving a reusable extraction configuration. For production, prefer:
- async: true for large files, multi-step workflows, and agent tools.
- storage.enabled: true when later Schema, Tables, or Split steps need to reuse the extraction.
- Footnotes and page chunks when citations or audit trails matter.
- Word-level boxes only for workflows that actually render or validate word coordinates.
- Spreadsheet trimming for exported workbooks with inflated used ranges.
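The production defaults above can be sketched as a request payload. This is a minimal illustration, not the definitive request shape: the dotted field paths (async, storage.enabled, extensions.footnote_references, extensions.alt_outputs.wlbb) come from this page, but the boolean values for the two extensions fields and the helper names set_path and production_defaults are assumptions made for the example. The sub-fields under extensions.chunking.* and spreadsheet.* are not listed here, so the sketch leaves them out; check the Extract API for the actual schema.

```python
import json
from typing import Any


def set_path(payload: dict[str, Any], dotted: str, value: Any) -> None:
    """Set a nested key from a dotted path such as 'storage.enabled'."""
    *parents, leaf = dotted.split(".")
    node = payload
    for key in parents:
        # Create intermediate objects on demand.
        node = node.setdefault(key, {})
    node[leaf] = value


def production_defaults(
    needs_citations: bool = False,
    renders_word_boxes: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: build an options payload from the defaults listed above."""
    payload: dict[str, Any] = {}
    set_path(payload, "async", True)            # large files, multi-step workflows, agent tools
    set_path(payload, "storage.enabled", True)  # lets later Schema, Tables, or Split steps reuse the extraction
    if needs_citations:
        # Assumption: the field accepts a boolean flag.
        set_path(payload, "extensions.footnote_references", True)
    if renders_word_boxes:
        # Assumption: the field accepts a boolean flag. Enable it only when
        # the workflow actually renders or validates word coordinates.
        set_path(payload, "extensions.alt_outputs.wlbb", True)
    # Chunking and spreadsheet options are omitted on purpose: their sub-fields
    # (extensions.chunking.*, spreadsheet.*) are defined in the Extract API.
    return payload


if __name__ == "__main__":
    print(json.dumps(production_defaults(needs_citations=True), indent=2))
```

Keeping word-level boxes behind an explicit flag mirrors the guidance above: the option stays off unless a workflow needs the coordinates.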
Extract API
Full request and response fields.
Build A Platform Pipeline
Configure, run, and reuse processing settings in the Platform.