Extract File
Extracts content from a file at the given URL and processes it with specified chunking options.
Large Document Processing: For documents over 50 pages, we strongly recommend using the /extract_async endpoint instead. The async endpoint avoids request timeouts and is better suited to long-running processing.
Large Document Response Structure
For documents exceeding 70 pages, the API returns a URL that points to the complete response instead of returning the result directly.
Key Points:
- Documents over 70 pages return a URL containing the complete response
- URLs automatically expire after 24 hours
- No changes required to your API request structure
- For documents of 70 pages or fewer, the API continues to return results directly
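The two page-count thresholds mentioned above (50 pages for the async recommendation, 70 pages for URL-based responses) can be captured in a small client-side helper. This is only a sketch: the function name `plan_request` and the keys it returns are hypothetical and not part of the API, and it assumes the page count is known before the request is sent.

```python
# Thresholds taken from the documentation above.
ASYNC_RECOMMENDED_OVER = 50  # pages; above this, /extract_async is recommended
URL_RESPONSE_OVER = 70       # pages; above this, the result is delivered via a URL


def plan_request(page_count: int) -> dict:
    """Hypothetical helper: summarise how a document of this size is handled."""
    if page_count < 1:
        raise ValueError("page_count must be a positive integer")
    return {
        "use_async_endpoint": page_count > ASYNC_RECOMMENDED_OVER,
        "expect_url_response": page_count > URL_RESPONSE_OVER,
    }
```

A 60-page document, for example, falls above the async recommendation but below the URL-response threshold.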
Implementation Tips:
- Check whether the response contains "is_url": true
- If true, fetch the complete document data from the provided URL
- Store URLs securely as they contain your processed results
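The tips above could be handled on the client roughly as follows. This is a minimal sketch under stated assumptions: only the "is_url" flag comes from this page. The key that holds the URL (assumed here to be "url"), the assumption that the linked document is JSON, and the names `resolve_result` and `fetch_json` are all illustrative, so check an actual response before relying on them.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen


def fetch_json(url: str) -> Dict[str, Any]:
    """Download a URL and decode its body as JSON (assumed format)."""
    with urlopen(url, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def resolve_result(
    payload: Dict[str, Any],
    url_key: str = "url",  # ASSUMPTION: the real field name is not given on this page
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
) -> Dict[str, Any]:
    """Return the full result, following the URL when "is_url" is true."""
    if payload.get("is_url") is True:
        # Large-document case: the payload only points at the complete result.
        return fetch(payload[url_key])
    # Small-document case: the payload already is the result.
    return payload
```

Because the URLs expire after 24 hours, it is safer to fetch and persist the resolved result promptly than to store the URL for later use.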
Authorizations
API key for authentication
Headers
API key for authorization
application/json
Body
"PRESIGNED_URL"
true
true
"<chunking method>"
JSON schema for processing
{}
When true, enables enhanced reasoning, which improves extraction accuracy at the cost of slightly longer processing time
false
Custom instructions, notes, or specifications for domain-specific extraction needs
""
Precise page selection for targeted extraction. Supports single pages (1,3,5), page ranges (1-5,7-9), or combinations (1,3-5,7,9-11)
""
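The page-selection syntax described above (single pages, ranges, and combinations) can be validated on the client before a request is sent. The sketch below is not the API's own parser: `parse_page_selection` is a hypothetical helper, and it assumes pages are numbered from 1 and that an empty string means no page restriction.

```python
from typing import List


def parse_page_selection(spec: str) -> List[int]:
    """Expand a selection such as "1,3-5,7" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Malformed input such as "5-3" or "a-b" raises ValueError, which makes it possible to reject a bad selection locally instead of waiting for the API to respond.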
Response
Full document content in markdown format
Document broken down into smaller, processable chunks
Extracted structured data based on provided schema