Processing parameters are set on Extract. Configure them before the extraction runs, then reuse the resulting extraction_id in Schema, Tables, Split, MCP tools, or saved pipelines.
This page explains when to use each setting. For the full endpoint schema, see the Extract API.

Quick Map

| Need | Read | API field | Try it |
| --- | --- | --- | --- |
| Link footnote markers to their explanation text | Footnote References | extensions.footnote_references | Attention Is All You Need |
| Parse workbooks without phantom rows or hidden-sheet surprises | Spreadsheet Processing | spreadsheet.* | Complex Table Document |
| Validate exact word positions for review, redaction, or QA | Word-Level Bounding Boxes | extensions.alt_outputs.wlbb | Bank Statement |
| Prepare text for RAG, retrieval, or agent memory | Chunking | extensions.chunking.* | Attention Is All You Need |

Do not enable every parameter by default. Add the settings your downstream workflow actually uses, especially for high-volume or regulated pipelines where result size, latency, and auditability matter.
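
As a minimal sketch of that opt-in approach, the Python below expands only the dotted field paths a workflow needs into a nested options object. The paths come from the Quick Map. The `build_options` helper and the boolean `True` values are assumptions for illustration, so check the Extract API for the real value types and request shape.

```python
from typing import Any


def build_options(enabled_paths: list[str]) -> dict[str, Any]:
    """Expand dotted field paths into a nested dict, setting each leaf to True."""
    options: dict[str, Any] = {}
    for path in enabled_paths:
        node = options
        *parents, leaf = path.split(".")
        for key in parents:
            # Create intermediate objects on demand, reusing existing ones.
            node = node.setdefault(key, {})
        # Assumption: the field is a boolean toggle; the real type may differ.
        node[leaf] = True
    return options


# Enable only what a review workflow with citations actually uses.
options = build_options([
    "extensions.footnote_references",
    "extensions.alt_outputs.wlbb",
])
print(options)
```

Anything not listed stays out of the request, which keeps result size and latency tied to what the downstream step consumes.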

Parameter Guides

Footnote References

Link footnote markers to the text they qualify.

Spreadsheet Processing

Control hidden rows, hidden sheets, raw values, and phantom ranges.

Word-Level Bounding Boxes

Return exact word coordinates for review, QA, and overlays.

Chunking

Prepare extraction output for RAG, search, agents, and review queues.

Good Defaults

For self-serve exploration, start in the Platform and inspect the output tabs before saving a reusable extraction configuration. For production, prefer:
  • async: true for large files, multi-step workflows, and agent tools.
  • storage.enabled: true when later Schema, Tables, or Split steps need to reuse the extraction.
  • Footnotes and page chunks when citations or audit trails matter.
  • Word-level boxes only for workflows that actually render or validate word coordinates.
  • Spreadsheet trimming for exported workbooks with inflated used ranges.
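
The defaults above could be encoded as a small decision helper. This is a sketch under stated assumptions: the `Workflow` dataclass and `production_options` function are hypothetical, the nesting of `async`, `storage.enabled`, and the `extensions` fields is inferred from the field paths on this page, and the boolean values are guesses at the value types. Page chunking and spreadsheet trimming are left out because their specific sub-fields under `extensions.chunking.*` and `spreadsheet.*` are not named here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Workflow:
    """Hypothetical description of what a downstream workflow needs."""
    large_file: bool = False
    multi_step: bool = False          # later Schema, Tables, or Split steps
    needs_citations: bool = False     # citations or audit trails matter
    renders_word_boxes: bool = False  # renders or validates word coordinates


def production_options(wf: Workflow) -> dict[str, Any]:
    """Return only the settings this workflow uses (assumed field layout)."""
    options: dict[str, Any] = {}
    if wf.large_file or wf.multi_step:
        options["async"] = True
    if wf.multi_step:
        # Stored extractions can be reused by later steps.
        options["storage"] = {"enabled": True}

    extensions: dict[str, Any] = {}
    if wf.needs_citations:
        extensions["footnote_references"] = True
    if wf.renders_word_boxes:
        extensions["alt_outputs"] = {"wlbb": True}
    if extensions:
        options["extensions"] = extensions
    return options


print(production_options(Workflow(multi_step=True, needs_citations=True)))
```

A workflow with no flags set produces an empty options object, which matches the advice not to enable every parameter by default.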

Extract API

Full request and response fields.

Build A Platform Pipeline

Configure, run, and reuse processing settings in the Platform.