Overview
All API endpoints are authenticated using API keys, which are provided to you once billing is set up.
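As a minimal sketch of an authenticated call (the base URL and the x-api-key header name are assumptions, not confirmed by these docs), every request carries your API key in a header:

```python
import requests

API_BASE = "https://api.example.com"   # placeholder base URL (assumption)
API_KEY = "your-api-key"               # provided once billing is set up

# The "x-api-key" header name is an assumption; check your onboarding details.
headers = {"x-api-key": API_KEY}

# Any endpoint call attaches the same headers, e.g. a minimal /extract request:
resp = requests.post(f"{API_BASE}/extract", headers=headers, json={})
print(resp.status_code)
```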
File Upload
/extract Endpoint
Accepts a presigned URL from the user; typically used for documents with high page counts.
/convert Endpoint
Provides a presigned URL from Pulse’s secure S3 bucket for extraction of smaller files.
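As a hedged sketch of both upload paths (the request and response field names, such as file_url and presigned_url, are assumptions), submitting a large document by presigned URL and uploading a smaller file via /convert might look like:

```python
import requests

API_BASE = "https://api.example.com"          # placeholder base URL (assumption)
headers = {"x-api-key": "your-api-key"}       # header name is an assumption

# /extract: pass a presigned URL you host yourself; suited to high-page-count documents.
extract_resp = requests.post(
    f"{API_BASE}/extract",
    headers=headers,
    json={"file_url": "https://your-bucket.s3.amazonaws.com/big-report.pdf"},  # hypothetical field name and URL
)
print(extract_resp.json())

# /convert: request a presigned URL into Pulse's secure S3 bucket, then upload a smaller file to it.
convert_resp = requests.post(f"{API_BASE}/convert", headers=headers)
upload_url = convert_resp.json()["presigned_url"]   # response field name is an assumption
with open("small-invoice.pdf", "rb") as f:
    requests.put(upload_url, data=f)
```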
Document Processing
We offer both structured Markdown and JSON responses. JSON outputs can be customized using your own schema via the /extract endpoint parameters.
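For example, a custom schema might be passed alongside the extraction request. This is a sketch only; the exact parameter name (schema here) and the accepted schema format are assumptions:

```python
import requests

headers = {"x-api-key": "your-api-key"}              # header name is an assumption
payload = {
    "file_url": "https://example.com/contract.pdf",  # hypothetical document URL
    # "schema" as the parameter name is an assumption; the docs only say the
    # /extract endpoint accepts your own schema for customized JSON output.
    "schema": {
        "parties": "list of contracting parties",
        "effective_date": "contract start date",
        "total_value": "total contract value as a number",
    },
}
resp = requests.post("https://api.example.com/extract", headers=headers, json=payload)
print(resp.json())
```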
Header Chunking
Splits content based on document headers and section breaks.
Page Chunking
Divides content by individual pages for page-specific processing.
Semantic Chunking
Segments content based on semantic meaning and context.
Recursive Chunking
Iteratively breaks down content into smaller, meaningful segments.
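The chunking strategies above can be selected per request. As a minimal sketch, assuming the parameter is named chunking and accepts one of the four mode names (both assumptions):

```python
import requests

headers = {"x-api-key": "your-api-key"}                  # header name is an assumption

# "chunking" as the parameter name and the mode strings are assumptions; the docs
# list header, page, semantic, and recursive chunking as the available strategies.
payload = {
    "file_url": "https://example.com/whitepaper.pdf",    # hypothetical document URL
    "chunking": "semantic",                              # or "header", "page", "recursive"
}
resp = requests.post("https://api.example.com/extract", headers=headers, json=payload)
chunks = resp.json().get("chunks", [])                   # response field name is an assumption
for chunk in chunks:
    print(chunk)
```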
Table Extraction
Extract tables in matrix format using the return_tables parameter. Returns tables as 2D arrays for easy processing.
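The return_tables parameter is named in the docs; everything else in this sketch (base URL, header, response field names) is an assumption:

```python
import requests

headers = {"x-api-key": "your-api-key"}                 # header name is an assumption
payload = {
    "file_url": "https://example.com/financials.pdf",   # hypothetical document URL
    "return_tables": True,                               # documented parameter
}
resp = requests.post("https://api.example.com/extract", headers=headers, json=payload)

# Assuming the tables come back under a "tables" key as 2D arrays (rows of cells).
for table in resp.json().get("tables", []):
    for row in table:
        print(" | ".join(str(cell) for cell in row))
    print("---")
```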
An intelligent chunking strategy is coming soon! Our model will automatically select the optimal chunking mode for your specific use case.
Processing Pipeline
Upload
Submit your document via presigned URL or direct upload
Process
Our backend models extract content based on your parameters
Response
Receive your processed output directly or via presigned URL for larger documents
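Tying the three steps together, a sketch of the full round trip might look like the following (the field names markdown and result_url are assumptions):

```python
import requests

headers = {"x-api-key": "your-api-key"}     # header name is an assumption

# 1. Upload: submit the document via a presigned URL (or direct upload).
resp = requests.post(
    "https://api.example.com/extract",
    headers=headers,
    json={"file_url": "https://example.com/annual-report.pdf"},  # hypothetical URL
)

# 2. Process: the backend models extract content according to your parameters.
result = resp.json()

# 3. Response: smaller outputs may arrive inline; larger ones via a presigned URL.
#    Both field names here ("markdown", "result_url") are assumptions.
if "result_url" in result:
    output = requests.get(result["result_url"]).text
else:
    output = result.get("markdown", "")
print(output[:500])
```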
Asynchronous Processing
For processing large documents or handling multiple requests, we provide asynchronous endpoints:
/extract_async Endpoint
Initiates asynchronous document processing and returns a job ID for tracking.
/poll Endpoint
Check the status of an ongoing processing job using the job ID.
/cancel Endpoint
Cancel an in-progress processing job if needed.
Async processing is recommended for documents larger than 50 pages or when processing multiple files simultaneously.
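A hedged sketch of the async flow, assuming the job ID is returned as job_id, /poll accepts it as a query parameter, and the status values shown below (all assumptions):

```python
import time
import requests

API_BASE = "https://api.example.com"        # placeholder base URL (assumption)
headers = {"x-api-key": "your-api-key"}     # header name is an assumption

# Kick off asynchronous processing and capture the job ID for tracking.
start = requests.post(
    f"{API_BASE}/extract_async",
    headers=headers,
    json={"file_url": "https://example.com/500-page-filing.pdf"},  # hypothetical URL
)
job_id = start.json()["job_id"]             # response field name is an assumption

# Poll until the job finishes.
while True:
    status = requests.get(f"{API_BASE}/poll", headers=headers, params={"job_id": job_id}).json()
    if status.get("status") in ("completed", "failed", "cancelled"):
        break
    time.sleep(5)

# If needed, an in-progress job could be cancelled instead:
# requests.post(f"{API_BASE}/cancel", headers=headers, json={"job_id": job_id})
```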
Batch Processing
/batch_extract_async Endpoint
Process multiple documents in parallel with customized parameters for each file.
Batch Processing Features
Multiple Input Sources
Accept both file URLs and file paths in a single batch request
Custom Schemas
Apply different extraction schemas for each document
Parallel Processing
Process multiple documents simultaneously for improved throughput
Unified Tracking
Monitor all jobs with a single batch ID
Batch processing automatically handles rate limiting and queuing to ensure optimal performance.
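As a sketch of a batch request (the files field, its per-file keys, and the batch_id field are all assumptions; the docs only state that each file can carry its own parameters and that all jobs share a single batch ID):

```python
import requests

headers = {"x-api-key": "your-api-key"}     # header name is an assumption

# Mix input sources and per-document schemas in one request; all field names
# below ("files", "file_url", "file_path", "schema", "batch_id") are assumptions.
payload = {
    "files": [
        {
            "file_url": "https://example.com/invoice-001.pdf",
            "schema": {"vendor": "string", "total": "number"},
        },
        {
            "file_path": "uploads/contract-2024.pdf",
            "schema": {"parties": "list", "effective_date": "date"},
        },
    ]
}
resp = requests.post("https://api.example.com/batch_extract_async", headers=headers, json=payload)
batch_id = resp.json()["batch_id"]          # single ID used to monitor all jobs
print("Tracking batch:", batch_id)
```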