Base URL
All API requests should be made to:

Authentication
All endpoints require authentication via an API key in the request header:
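As a minimal sketch of attaching the key (the `x-api-key` header name and the `DOCS_API_KEY` environment variable are assumptions for illustration — check your dashboard for the exact names):

```python
import os

# Hypothetical header name -- substitute the one shown in your dashboard.
API_KEY_HEADER = "x-api-key"

def auth_headers() -> dict:
    """Build request headers with the API key read from the environment."""
    api_key = os.environ["DOCS_API_KEY"]  # assumed variable name
    return {API_KEY_HEADER: api_key, "Content-Type": "application/json"}
```

Reading the key from the environment at call time keeps it out of client code, per the security notes below.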
Available Endpoints

Extract
POST /extract
Synchronously extract content from documents. Best for files under 50 pages.

Extract Async
POST /extract_async
Asynchronously process large documents. Returns a job ID for polling.

Convert
POST /convert
Upload files to S3 and get a URL for processing.

Get Job Status
GET /job/{job_id}
Check status and retrieve results of async jobs.

Cancel Job
POST /cancel/{job_id}
Cancel a pending or processing async job.

Configure Webhooks
POST /webhook
Get a portal link to configure webhook endpoints.
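The async flow (submit to /extract_async, then poll GET /job/{job_id}) can be sketched as below. The terminal status names are assumptions — only "pending" and "processing" are implied by this reference — and the HTTP call is injected so the loop stays library-agnostic:

```python
import time

def poll_job(job_id, fetch_status, interval=2.0, timeout=300.0):
    """Poll GET /job/{job_id} until the job leaves the pending/processing states.

    fetch_status(job_id) should perform the HTTP call and return the parsed
    JSON body; the exact status strings are assumptions about the job payload.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job.get("status") not in ("pending", "processing"):
            return job  # e.g. completed or failed
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

For long-running jobs, configuring a webhook via POST /webhook avoids polling entirely.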
Request Format

Extract and Extract Async Endpoints

The /extract and /extract_async endpoints accept requests with the application/json content type, or file uploads via multipart/form-data:
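As a sketch of a JSON request body — the `document_url` field name is an assumption, since the exact body shape is not shown here, while `pages` and `return_html` are among the parameters documented under Common Parameters below:

```python
import json

# Hypothetical request body for POST /extract; "document_url" is an assumed
# field name. pages and return_html are documented Common Parameters.
payload = {
    "document_url": "https://example.com/report.pdf",
    "pages": "1-10",
    "return_html": False,
}
body = json.dumps(payload)
```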
Convert Endpoint
The /convert endpoint accepts files via multipart/form-data:
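A sketch of preparing such an upload, folding in the file-type validation recommended under Best Practices; the form field name "file" and the allowed extensions are assumptions, and with the `requests` library the resulting tuple would go into the `files=` argument:

```python
import mimetypes
import os

# Assumed allow-list; adjust to the formats your account supports.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".png", ".jpg"}

def prepare_upload(path):
    """Validate the file type and describe the multipart "file" field.

    Returns (filename, content_type). With requests, the upload would be:
    files={"file": (filename, open(path, "rb"), content_type)}
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext}")
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return os.path.basename(path), content_type
```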
Response Format
Successful Response
Large Document Response
For documents over 70 pages, content is delivered via an S3 URL:

Error Response
Common Parameters
Schema Parameter
Define structured data extraction. Supported field types:

- string - Text values
- integer - Whole numbers
- float - Decimal numbers
- date - Date values
- boolean - True/false
- array - Lists
- object - Nested structures
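As a sketch of a schema value built from these types — the exact schema syntax is an assumption (a flat mapping of field names to type strings, with a nested description for an array of objects):

```python
import json

# Hypothetical invoice schema: field names mapped to the types listed above.
invoice_schema = {
    "invoice_number": "string",
    "issue_date": "date",
    "total": "float",
    "paid": "boolean",
    "line_items": {
        "type": "array",
        "items": {"description": "string", "quantity": "integer"},
    },
}
schema_json = json.dumps(invoice_schema)
```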
Pages Parameter
Specify page ranges to process:

- Single page: "5"
- Range: "1-10"
- Multiple: "1,3,5-7,10"
- All pages: omit the parameter
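Client-side, a pages string in this format can be expanded for local bookkeeping; this helper is purely illustrative and not part of the API:

```python
def expand_pages(spec):
    """Expand a pages string like "1,3,5-7,10" into a sorted list of ints."""
    pages = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```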
Output Options
Control extraction output:

- return_html: Return HTML instead of markdown (default: false)
- extract_figure: Extract images and figures (default: false)
- figure_description: Generate AI descriptions for figures
- chunk_size: Custom chunk size in characters
Status Codes
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid API key |
| 403 | Forbidden - Access denied |
| 404 | Not Found - Resource doesn't exist |
| 413 | Payload Too Large - File exceeds limit |
| 429 | Too Many Requests - Rate limited |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
Best Practices
Use Appropriate Endpoints
- Use /extract for documents under 50 pages
- Use /extract_async for large documents
- Use /convert when you need to reuse file URLs
Handle Errors Gracefully
- Implement retry logic with exponential backoff
- Check error codes and handle specifically
- Log errors for debugging
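The retry advice above can be sketched as exponential backoff with jitter. The retryable codes (429, 500, 503) come from the status table; the operation is injected so the policy stays transport-agnostic:

```python
import random
import time

# Retryable codes per the status table: rate limiting and server errors.
RETRYABLE = {429, 500, 503}

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Run call() with exponential backoff on retryable status codes.

    call() should return (status_code, body). Non-retryable codes and the
    final attempt's result are returned immediately.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Double the delay each attempt, plus jitter to avoid thundering herds.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        time.sleep(delay)
```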
Optimize Performance
- Process only necessary pages
- Use schemas for structured extraction
- Cache results when possible
Security
- Never expose API keys in client code
- Use environment variables
- Rotate keys regularly
- Validate file types before upload