Base URL
All API requests should be made to:
Authentication
All endpoints require authentication via an API key in the request header:
Available Endpoints
Extract
POST /extract
Synchronously extract content from documents. Best for files under 50 pages.
Extract Async
POST /extract_async
Asynchronously process large documents. Returns a job ID for polling.
Convert
POST /convert
Upload files to S3 and get a URL for processing.
Get Job Status
GET /job/{job_id}
Check the status and retrieve the results of async jobs.
Cancel Job
POST /cancel/{job_id}
Cancel a pending or processing async job.
Configure Webhooks
POST /webhook
Get a portal link to configure webhook endpoints.
Request Format
Extract and Extract Async Endpoints
The /extract and /extract_async endpoints accept the application/json content type:
They also accept multipart/form-data:
Convert Endpoint
The /convert endpoint accepts files via multipart/form-data:
Response Format
Successful Response
Large Document Response
For documents over 70 pages, content is delivered via an S3 URL:
Error Response
Common Parameters
Schema Parameter
Define structured data extraction using the following field types:
- string: Text values
- integer: Whole numbers
- float: Decimal numbers
- date: Date values
- boolean: True/false
- array: Lists
- object: Nested structures
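As a rough illustration, a schema could be modelled as a mapping from field names to the type names listed above and checked before it is sent. The exact schema syntax the API expects is not shown here, so the flat-dictionary shape, the field names, and the `validate_schema` helper are assumptions made for this sketch only.

```python
# Type names taken from the list above.
SUPPORTED_TYPES = {"string", "integer", "float", "date", "boolean", "array", "object"}


def validate_schema(schema: dict) -> list:
    """Return the field names whose declared type is not a supported type name."""
    return [name for name, type_name in schema.items() if type_name not in SUPPORTED_TYPES]


# Hypothetical invoice schema; the field names are invented for this example
# and the flat name-to-type layout is an assumption, not the documented format.
invoice_schema = {
    "invoice_number": "string",
    "total": "float",
    "issued_on": "date",
    "paid": "boolean",
}
```

Calling `validate_schema(invoice_schema)` returns an empty list when every declared type is one of the supported names.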
Pages Parameter
Specify page ranges to process:
- Single page: "5"
- Range: "1-10"
- Multiple: "1,3,5-7,10"
- All pages: Omit parameter
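The page syntax above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `parse_pages` is a hypothetical helper, and it assumes page numbers are 1-based and that ranges are inclusive, neither of which is stated explicitly above.

```python
def parse_pages(spec: str) -> list:
    """Expand a page spec such as "1,3,5-7,10" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "5-7"; assumed to include both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: pages are numbered from 1.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```

A malformed spec such as `"3-"` or `"a"` raises `ValueError`, so the problem surfaces locally instead of as a 400 response.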
Output Options
Control extraction output:
- return_html: Return HTML instead of markdown (default: false)
- extract_figure: Extract images and figures (default: false)
- figure_description: Generate AI descriptions for figures
- chunk_size: Custom chunk size in characters
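One way to assemble these options in client code is sketched below. The parameter names come from the list above; the `build_output_options` helper, the assumption that `figure_description` is a boolean flag, and the idea of omitting unset options are illustrative choices, and where the resulting dictionary belongs in the request body is not specified here.

```python
from typing import Optional


def build_output_options(
    return_html: bool = False,
    extract_figure: bool = False,
    figure_description: Optional[bool] = None,
    chunk_size: Optional[int] = None,
) -> dict:
    """Collect output options, leaving out those without a documented default."""
    options = {"return_html": return_html, "extract_figure": extract_figure}
    if figure_description is not None:
        # Assumed to be a boolean flag; the source does not state its type.
        options["figure_description"] = figure_description
    if chunk_size is not None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive number of characters")
        options["chunk_size"] = chunk_size
    return options
```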
Status Codes
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid API key |
| 403 | Forbidden - Access denied |
| 404 | Not Found - Resource doesn’t exist |
| 413 | Payload Too Large - File exceeds limit |
| 429 | Too Many Requests - Rate limited |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
Best Practices
Use Appropriate Endpoints
- Use /extract for documents under 50 pages
- Use /extract_async for large documents
- Use /convert when you need to reuse file URLs
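The endpoint guidance above can be expressed as a small routing function. This is only a sketch: `choose_endpoint` is a hypothetical helper, and sending a document of exactly 50 pages to `/extract_async` is an interpretation of "under 50 pages" rather than a documented rule.

```python
SYNC_PAGE_LIMIT = 50  # "/extract" is recommended for documents under 50 pages


def choose_endpoint(page_count: int) -> str:
    """Pick the synchronous or asynchronous endpoint from the page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return "/extract" if page_count < SYNC_PAGE_LIMIT else "/extract_async"
```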
Handle Errors Gracefully
- Implement retry logic with exponential backoff
- Check error codes and handle specifically
- Log errors for debugging
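Retry with exponential backoff might look like the sketch below. It is written against a generic `send` callable so it does not depend on any particular HTTP library. Treating 429, 500, and 503 as retryable is an assumption based on the status code table, and `call_with_backoff` and its parameters are hypothetical names.

```python
import random
import time

# Assumption: these codes from the status table indicate transient failures.
RETRYABLE_STATUS = {429, 500, 503}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning a (status_code, body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait on each attempt and add jitter so that many
            # clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Every attempt failed; return the last response so the caller can log it.
    return status, body
```

Non-retryable codes such as 400 or 401 are returned immediately, so the caller can handle them specifically instead of repeating a request that cannot succeed.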
Optimize Performance
- Process only necessary pages
- Use schemas for structured extraction
- Cache results when possible
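Caching could be as simple as keying results on the file contents plus the request parameters, as in the sketch below. The in-memory dictionary, `cache_key`, and `extract_cached` are hypothetical; `extract` stands for whatever function actually calls the API, and a real deployment would likely want a persistent store with expiry.

```python
import hashlib
import json

_cache = {}  # in-memory only; illustrative, not persistent


def cache_key(file_bytes: bytes, params: dict) -> str:
    """Derive a stable key from the file contents and the request parameters."""
    digest = hashlib.sha256(file_bytes)
    digest.update(b"\0")  # separator between file bytes and parameters
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def extract_cached(file_bytes: bytes, params: dict, extract):
    """Return a cached result if present, otherwise call extract and store it."""
    key = cache_key(file_bytes, params)
    if key not in _cache:
        _cache[key] = extract(file_bytes, params)
    return _cache[key]
```

Sorting the parameter keys means `{"pages": "1-3", "return_html": True}` and the same dictionary built in a different order map to the same cache entry.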
Security
- Never expose API keys in client code
- Use environment variables
- Rotate keys regularly
- Validate file types before upload
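Two of these points lend themselves to a short sketch: reading the key from the environment and checking file types before upload. The environment variable name `EXTRACT_API_KEY` and the extension list are placeholders chosen for this example; the file types the API actually accepts are not listed here.

```python
import os
from pathlib import Path

# Hypothetical allow-list; replace with the file types your integration expects.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".png", ".jpg"}


def load_api_key(env_var: str = "EXTRACT_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set the {env_var} environment variable")
    return key


def is_allowed_file(filename: str) -> bool:
    """Check the file extension against the allow-list, ignoring case."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

An extension check is only a first filter; it does not confirm that the file contents match the extension.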
