Overview
Previously, you could pass a structured_output object directly in your /extract request to get structured data in a single call. This approach has been replaced by a dedicated /schema endpoint that runs after extraction.
Why the change?
| Benefit | Description |
|---|---|
| Re-runnability | Re-apply different schemas to the same extraction without re-processing the document |
| Split-mode schemas | Apply different schemas to different sections of a document using /split → /schema |
| Separation of concerns | Extraction and schema application are independent steps — easier to debug and optimize |
| Async support | The /schema endpoint supports its own async: true flag for long-running schema jobs |
| Cost savings | Only pay for extraction once, then iterate on schemas without additional extraction charges |
Before vs. After
| | Before (Deprecated) | After (Recommended) |
|---|---|---|
| Steps | 1 call to /extract | 2 calls: /extract → /schema |
| Schema param | structured_output on /extract | schema_config on /schema |
| Response field | response.structured_output | response.schema_output |
| Re-run schema | Must re-extract entire document | Call /schema again with same extraction_id |
| Split-mode | Not supported | Supported via /split → /schema |
Migration Steps
1. Strip the structured_output, schema, and schema_prompt parameters from your extraction request. Keep all other parameters (pages, figure_processing, extensions, etc.) as-is.
2. The /extract response includes an extraction_id (when storage is enabled, which is the default). Store this ID; you’ll need it for the schema step.
3. Send a POST /schema request with the extraction_id and your schema in schema_config.input_schema. The schema format is the same JSON Schema you were using before.
Code Examples
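The three steps can be sketched in Python with only the standard library. This is not the official SDK (its method names are not covered here): BASE_URL, the Bearer authorization header, and the http_transport helper are placeholders to replace with your real host and credentials. The endpoint paths and the extraction_id, schema_config, input_schema, and schema_output.values fields come from this guide. The HTTP call is passed in as a function so the flow can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Placeholder host (assumption): replace with your actual API base URL.
BASE_URL = "https://api.example.com"

# A transport POSTs a JSON body to a path and returns the decoded JSON reply.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Parameters that no longer belong on /extract.
REMOVED_EXTRACT_PARAMS = ("structured_output", "schema", "schema_prompt")


def http_transport(api_key: str) -> Transport:
    """Real transport using urllib; the auth header format is an assumption."""
    def post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        request = urllib.request.Request(
            BASE_URL + path,
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {api_key}",  # assumption
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return post


def extract_then_schema(post: Transport, extract_params: Dict[str, Any],
                        input_schema: Dict[str, Any]) -> Any:
    # Step 1: call /extract with the schema parameters stripped out.
    extract_body = {key: value for key, value in extract_params.items()
                    if key not in REMOVED_EXTRACT_PARAMS}
    extraction = post("/extract", extract_body)

    # Step 2: keep the extraction_id (returned when storage is enabled).
    extraction_id = extraction.get("extraction_id")
    if extraction_id is None:
        raise RuntimeError("No extraction_id in /extract response; is storage disabled?")

    # Step 3: apply the schema in a separate /schema call.
    result = post("/schema", {
        "extraction_id": extraction_id,
        "schema_config": {"input_schema": input_schema},
    })
    # Structured values are nested under schema_output.values.
    return result["schema_output"]["values"]


# Usage against a real host:
# values = extract_then_schema(http_transport("YOUR_API_KEY"),
#                              your_extract_params, your_json_schema)
```

Because the extraction_id is kept, you can send another /schema request with a different input_schema later without re-processing the document.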
With Async Processing
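If you were using structured_output with async extraction, the updated flow is: submit /extract with async: true, poll the returned job_id until the extraction finishes, take the extraction_id from the result, and then call /schema, which accepts its own async: true flag.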
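A sketch of that async flow in Python. How you fetch a job by its job_id (the get_job callable), the status values "completed" and "failed", and the "result" key holding the finished output are all assumptions, not documented behavior; only async: true, job_id, extraction_id, and schema_config come from this guide.

```python
import time
from typing import Any, Callable, Dict

# POSTs a JSON body to a path and returns the decoded reply.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]
# Hypothetical: returns the current state of a job given its job_id.
GetJob = Callable[[str], Dict[str, Any]]


def wait_for_job(get_job: GetJob, job_id: str, interval: float = 2.0,
                 timeout: float = 600.0) -> Dict[str, Any]:
    """Poll a job until it finishes; status names and "result" are assumptions."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job["result"]
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed: {job.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} s")
        time.sleep(interval)


def async_extract_then_schema(post: Post, get_job: GetJob,
                              extract_params: Dict[str, Any],
                              input_schema: Dict[str, Any],
                              interval: float = 2.0) -> Any:
    # 1. Submit the extraction asynchronously (extract_params carries no schema keys).
    job = post("/extract", {**extract_params, "async": True})
    extraction = wait_for_job(get_job, job["job_id"], interval)

    # 2. Submit the schema step, also asynchronously, reusing the extraction_id.
    job = post("/schema", {
        "extraction_id": extraction["extraction_id"],
        "schema_config": {"input_schema": input_schema},
        "async": True,
    })
    schema_result = wait_for_job(get_job, job["job_id"], interval)
    return schema_result["schema_output"]["values"]
```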
Advanced: Split-Mode Schema (New Capability)
With the new flow, you can now split a document into topics and apply different schemas to each section. This was not possible with the old structured_output approach.
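A rough Python sketch of the split-mode idea: pick a schema for each topic returned by /split and build one /schema request per topic. The topic names, the example schemas, and especially the "section" field used to point /schema at one part of the document are hypothetical; the Schema Endpoint reference describes the actual split-mode request shape.

```python
from typing import Any, Dict, List, Optional

# Example schemas for two hypothetical topics produced by /split.
INVOICE_SCHEMA = {"type": "object", "properties": {"total": {"type": "number"}}}
CONTRACT_SCHEMA = {
    "type": "object",
    "properties": {"parties": {"type": "array", "items": {"type": "string"}}},
}


def build_split_schema_requests(extraction_id: str, topics: List[str],
                                schemas_by_topic: Dict[str, Dict[str, Any]],
                                default_schema: Optional[Dict[str, Any]] = None
                                ) -> List[Dict[str, Any]]:
    """Return one /schema request body per split topic that has a schema.

    The "section" key is a placeholder, not a documented field.
    """
    bodies = []
    for topic in topics:
        schema = schemas_by_topic.get(topic, default_schema)
        if schema is None:
            continue  # no schema for this topic, so skip it
        bodies.append({
            "extraction_id": extraction_id,
            "section": topic,  # hypothetical field name
            "schema_config": {"input_schema": schema},
        })
    return bodies
```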
Response Field Mapping
If you’re parsing the response, here’s how the fields map:
| Old field (on /extract response) | New field (on /schema response) | Notes |
|---|---|---|
| response.structured_output | response.schema_output.values | Values are now nested under schema_output.values |
| response.structured_output.citations | response.schema_output.citations | Same structure, new path |
| response.input_schema | N/A | Echo removed; you know what schema you sent |
| response.schema_error | Standard HTTP error response | Errors are returned as 4xx/5xx responses |
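While old and new code paths coexist, a small reader that accepts either response shape can ease the switch. This Python sketch follows the mapping table literally; the function name read_structured_values is illustrative, and on the new path errors arrive as HTTP 4xx/5xx responses rather than in the body, so only the legacy schema_error is checked here.

```python
from typing import Any, Dict, Optional, Tuple


def read_structured_values(response: Dict[str, Any]) -> Tuple[Any, Optional[Any]]:
    """Return (values, citations) from a new /schema or legacy /extract response."""
    # New shape: everything lives under schema_output.
    if "schema_output" in response:
        output = response["schema_output"]
        return output.get("values"), output.get("citations")

    # Legacy shape: errors were reported in the body as schema_error.
    if response.get("schema_error"):
        raise ValueError(f"schema_error: {response['schema_error']}")

    # Legacy shape: values were structured_output itself, with citations inside it.
    if "structured_output" in response:
        legacy = response["structured_output"]
        citations = legacy.get("citations") if isinstance(legacy, dict) else None
        return legacy, citations

    raise KeyError("response has neither schema_output nor structured_output")
```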
FAQ
Will structured_output on /extract stop working immediately?
No. The parameter will continue to work for backward compatibility but will be removed in a future version. We recommend migrating as soon as possible.
Is the schema format the same?
Yes. The JSON Schema format is identical. The only change is where you send it: instead of structured_output.schema on /extract, you send schema_config.input_schema on /schema. The schema_prompt field is also in the same location within the config object.
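For existing request bodies, this mapping can be applied mechanically. The hypothetical Python helper below, split_legacy_request, removes the legacy keys from an /extract body and moves structured_output.schema to schema_config.input_schema, carrying schema_prompt over unchanged. It assumes schema and schema_prompt are nested inside structured_output, as described in this answer.

```python
import copy
from typing import Any, Dict, Optional, Tuple

# Keys that must be removed from the /extract body.
LEGACY_KEYS = ("structured_output", "schema", "schema_prompt")


def split_legacy_request(old_body: Dict[str, Any]
                         ) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Split a legacy /extract body into (new /extract body, schema_config or None)."""
    extract_body = {key: copy.deepcopy(value) for key, value in old_body.items()
                    if key not in LEGACY_KEYS}
    legacy = old_body.get("structured_output")
    if not legacy:
        return extract_body, None  # no schema was requested, so no /schema call needed
    schema_config = {"input_schema": copy.deepcopy(legacy["schema"])}
    if "schema_prompt" in legacy:
        # schema_prompt keeps its place inside the config object.
        schema_config["schema_prompt"] = legacy["schema_prompt"]
    return extract_body, schema_config
```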
Does the two-step flow cost more?
No. Extraction is billed once based on page count. The /schema endpoint runs on the already-extracted content and does not incur additional extraction charges.
Can I use async for the schema step too?
Yes! Set async: true on the /schema request to get a job_id and poll for results, just like extraction.
What if I don't need structured data — just markdown?
Then you’re already using the recommended flow. Just call /extract without any schema parameters and use the markdown from the response.
Related
Schema Endpoint
Full reference for the /schema endpoint (single + split mode)
Schema Design Guide
Best practices for writing effective JSON schemas
Pipeline Overview
How extract → split → schema work together
Split Endpoint
Split documents into topics for targeted schema extraction
