Large Document Processing: For documents over 50 pages, we strongly recommend using the /extract_async endpoint instead. The async endpoint avoids request timeouts and is better suited to long-running processing jobs.
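The recommendation above amounts to a simple page-count check on the client side. The sketch below assumes you know the page count before submitting. Only `/extract_async` comes from this page; `/extract` is a hypothetical placeholder for the synchronous endpoint, which this section does not name, and `choose_endpoint` is an illustrative helper, not part of the API.

```python
# Threshold from the recommendation above: prefer async for documents over 50 pages.
ASYNC_RECOMMENDED_ABOVE_PAGES = 50

# "/extract_async" is named in the docs; "/extract" is a hypothetical
# placeholder for the synchronous endpoint, which this section does not name.
SYNC_ENDPOINT = "/extract"
ASYNC_ENDPOINT = "/extract_async"


def choose_endpoint(page_count: int) -> str:
    """Return the endpoint path to use for a document with `page_count` pages."""
    if page_count < 1:
        raise ValueError("page_count must be positive")
    if page_count > ASYNC_RECOMMENDED_ABOVE_PAGES:
        return ASYNC_ENDPOINT
    return SYNC_ENDPOINT


print(choose_endpoint(12))   # /extract
print(choose_endpoint(120))  # /extract_async
```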
Large Document Response Structure
For documents exceeding 70 pages, the API returns a response containing a URL instead of the full result.
Key Points:
- Documents over 70 pages return a URL containing the complete response
- URLs automatically expire after 24 hours
- No changes required to your API request structure
- For documents of 70 pages or fewer, the API continues to return results directly
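The two rules in the list above (URL responses above 70 pages, 24-hour URL lifetime) can be encoded as small client-side checks. This is a sketch under one assumption: the 24 hours are counted from when you received the response, since this page does not say exactly when the clock starts. Treat the expiry check as conservative and fetch well before it trips. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

URL_RESPONSE_ABOVE_PAGES = 70         # documents over 70 pages return a URL
URL_LIFETIME = timedelta(hours=24)    # URLs expire after 24 hours


def expects_url_response(page_count: int) -> bool:
    """True if a document of this size is expected to come back as a URL."""
    return page_count > URL_RESPONSE_ABOVE_PAGES


def url_is_expired(received_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once 24 hours have passed since the URL was received.

    Assumes the lifetime starts when the response was received;
    `received_at` must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return now - received_at >= URL_LIFETIME
```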
Implementation Tips:
- Check whether the response contains "is_url": true
- If true, fetch the complete document data from the provided URL
- Store URLs securely, as they contain your processed results
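The tips above describe a two-step read: inspect `is_url`, then download the full result if it is true. A minimal sketch follows. The `is_url` flag comes from this page; the name of the field that holds the link is not given here, so `"url"` is a hypothetical key you should replace with the real one. The download function is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def _fetch_json(url: str) -> Dict[str, Any]:
    """Download and decode a JSON document from `url`."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def resolve_response(
    response: Dict[str, Any],
    fetch: Callable[[str], Dict[str, Any]] = _fetch_json,
) -> Dict[str, Any]:
    """Return the complete result, following the URL when "is_url" is true."""
    if response.get("is_url") is True:
        # "url" is a hypothetical key name for the field holding the link.
        return fetch(response["url"])
    # Small documents: the response already is the result.
    return response
```

Passing a fake `fetch` (as a test would) keeps the branching logic separate from the HTTP call.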
Authorizations
API key for authentication
Body
application/json
URL of the file to process
JSON schema for structured data extraction
Example:
{
  "invoice_number": "string",
  "total": "float"
}
Page range to process (e.g., "1-5", "1,3,5")
Example:
"1-10"
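To catch malformed page ranges before sending a request, you can expand the string locally. This hypothetical helper assumes 1-based page numbers and also accepts mixed forms such as "1-3,7"; the examples above only show a plain range or a plain list, so whether the API accepts mixed forms is an assumption.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a page-range string such as "1-5" or "1,3,5" into page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in {spec!r}")
        if "-" in part:
            # A range like "1-5" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page {part!r}")
            pages.append(page)
    return pages
```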
Custom chunk size in characters
Example:
5000
Extract figures and images
Generate AI descriptions for figures
Return HTML instead of markdown
Custom prompt for schema extraction
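The body parameters above can be assembled into a single JSON POST. This page lists their descriptions but not their field names, so every key in the sketch below (`file_url`, `schema`, `page_range`, and so on), the base URL, and the `Authorization: Bearer` header are hypothetical placeholders to be replaced with the names from the full API reference. The function only builds the request; it does not send it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_extract_request(
    api_key: str,
    file_url: str,
    schema: Optional[Dict[str, str]] = None,
    page_range: Optional[str] = None,
    chunk_size: Optional[int] = None,
    extract_figures: bool = False,
    describe_figures: bool = False,
    return_html: bool = False,
    schema_prompt: Optional[str] = None,
    base_url: str = "https://api.example.com",  # hypothetical host
    endpoint: str = "/extract_async",
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for document extraction."""
    # Every key below is a hypothetical name for a body field described above.
    body: Dict[str, Any] = {
        "file_url": file_url,                  # URL of the file to process
        "extract_figures": extract_figures,    # extract figures and images
        "describe_figures": describe_figures,  # AI descriptions for figures
        "return_html": return_html,            # HTML instead of markdown
    }
    if schema is not None:
        body["schema"] = schema                # structured-extraction schema
    if page_range is not None:
        body["page_range"] = page_range        # e.g. "1-10"
    if chunk_size is not None:
        body["chunk_size"] = chunk_size        # characters per chunk
    if schema_prompt is not None:
        body["schema_prompt"] = schema_prompt  # custom prompt for the schema
    return urllib.request.Request(
        base_url + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )
```

Optional fields are omitted rather than sent as `null`, so the server's own defaults apply to anything you leave unset.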