Overview
Pulse API supports a wide range of document formats, enabling extraction from virtually any business document. All file types are processed with the same high accuracy and can be used with structured schema extraction.
Supported Formats
PDF Documents
Extension: .pdf
- Text-based PDFs (searchable)
- Image-based PDFs (scanned)
- Mixed content PDFs
- Password-protected PDFs (provide password)
- Multi-page documents
Images
Extensions: .jpg, .jpeg, .png
- High-resolution scans
- Photographs of documents
- Screenshots
- Multi-page TIFFs
- Handwritten content (with limitations)
Office Documents
Extensions: .docx, .pptx, .xlsx
- Microsoft Word documents
- PowerPoint presentations
- Excel spreadsheets
- Embedded images and charts
- Complex formatting preserved
Web Documents
Extensions: .html, .htm
- Static HTML pages
- Saved web pages
- HTML emails
- Inline styles preserved
- Embedded images extracted
Processing Large Files
For very large files, we recommend:
- Using async processing (/extract_async)
- Processing specific page ranges
- Contacting support for optimization strategies
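The first two recommendations can be sketched as a small client-side helper. This is an illustrative sketch only: the 20 MB cutoff, the synchronous endpoint name `/extract`, and the helper names are assumptions that do not come from the Pulse documentation. Only `/extract_async` is named above.

```python
# Hypothetical cutoff; the documentation does not state a size limit.
ASYNC_THRESHOLD_BYTES = 20 * 1024 * 1024


def choose_endpoint(size_bytes: int, threshold: int = ASYNC_THRESHOLD_BYTES) -> str:
    """Pick an endpoint path based on file size."""
    # "/extract" is an assumed name for the synchronous endpoint.
    return "/extract_async" if size_bytes > threshold else "/extract"


def page_ranges(total_pages: int, chunk_size: int) -> list:
    """Split a document into inclusive, 1-based (start, end) page ranges."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]
```

For example, `page_ranges(25, 10)` splits a 25-page document into three ranges that could be submitted one at a time. How a page range is passed to the API is not covered here.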
Format-Specific Features
PDF Processing
PDFs receive the most comprehensive processing:
- Preserve document structure
- Extract form fields
- Handle multi-column layouts
- Process rotated pages
- Extract embedded images
Image Processing
Images are processed using advanced OCR. For best results:
- Resolution: Minimum 150 DPI, recommended 300 DPI
- Format: PNG for text, JPG for photos
- Size: Keep under 10MB per image
- Quality: Avoid blurry or skewed images
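These guidelines can be checked before upload. The sketch below assumes you already know the scan resolution and file size. The function name and the reading of "10MB" as 10 × 1024 × 1024 bytes are assumptions, and this is not part of the Pulse API.

```python
MIN_DPI = 150
RECOMMENDED_DPI = 300
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_image(dpi: int, size_bytes: int) -> list:
    """Return human-readable warnings; an empty list means no issues found."""
    problems = []
    if dpi < MIN_DPI:
        problems.append(f"resolution {dpi} DPI is below the {MIN_DPI} DPI minimum")
    elif dpi < RECOMMENDED_DPI:
        problems.append(f"resolution {dpi} DPI is below the recommended {RECOMMENDED_DPI} DPI")
    if size_bytes >= MAX_IMAGE_BYTES:
        problems.append("image is not under 10MB")
    return problems
```

Blur and skew are not covered by this check, since detecting them requires inspecting the pixel data.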
Office Document Processing
Office files maintain their structure:
- Preserve table structures
- Extract embedded objects
- Handle multiple sheets/slides
- Maintain formatting context
File Upload Methods
Direct Upload
Upload files directly in your API request.
Pre-upload to S3
Use this method for larger files or reusable uploads.
URL-Based Processing
Process files from public URLs.
Unsupported Formats
The following formats are not currently supported:
- Video files (.mp4, .avi, .mov)
- Audio files (.mp3, .wav)
- CAD files (.dwg, .dxf)
- Legacy Office formats (.doc, .xls, .ppt)
- Compressed archives (.zip, .rar)
- Executable files (.exe, .app)
Need support for a specific format? Contact us at hello@trypulse.ai
Format Detection
Pulse API automatically detects file types based on:
- File extension
- MIME type
- File content analysis
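A rough client-side approximation of these three signals can catch a mislabeled file before it is uploaded. The sketch below uses Python's standard `mimetypes` module and a few well-known leading byte signatures. It is not how Pulse detects formats internally, and the function names are hypothetical.

```python
import mimetypes
from pathlib import Path

# Well-known leading byte signatures for some supported formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # container used by .docx/.pptx/.xlsx
]
OOXML_EXTENSIONS = {".docx", ".pptx", ".xlsx"}


def sniff_content(head: bytes):
    """Guess a MIME type from the first bytes of a file, or return None."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def extension_matches_content(filename: str, head: bytes) -> bool:
    """Check that the extension agrees with what the leading bytes suggest."""
    by_content = sniff_content(head)
    if by_content is None:
        return False
    if by_content == "application/zip":
        # A ZIP signature alone cannot distinguish the Office formats.
        return Path(filename).suffix.lower() in OOXML_EXTENSIONS
    by_name, _ = mimetypes.guess_type(filename)
    return by_name == by_content
```

Passing the first few bytes of a file (for example `open(path, "rb").read(16)`) is enough for these signatures. HTML has no fixed signature and is not covered by this sketch.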
Best Practices by Document Type
Financial Documents
Invoices, Statements, Reports
- Use PDF format when possible
- Ensure text is selectable (not scanned)
- Include all pages for context
- Use schema extraction for structured data
Legal Documents
Contracts, Agreements, Forms
- Maintain original formatting with PDF
- Process complete documents for context
- Use high-resolution scans for signatures
- Process complex layouts with care
Technical Documents
Manuals, Specifications, Diagrams
- Use extract_figure=True for diagrams
- Process in smaller chunks if very large
- Maintain original format for tables
- Consider HTML output for preservation
Medical Records
Clinical Notes, Lab Reports, Prescriptions
- Use high-resolution scans (300+ DPI)
- Process handwritten content carefully
- Verify extracted data accuracy
Troubleshooting
Common Issues
FILE_001: Invalid file type
Solution: Check that your file extension matches one of the supported formats
FILE_002: File too large
Solution: Reduce file size or use async processing
FILE_003: File corrupted
Solution: Verify file integrity and re-save if needed
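In client code, the three error codes above can be mapped to their suggested fixes. This is a minimal sketch: the codes and solutions come from the list above, but how the code is delivered in an API response is not documented here, so the helper takes the code as a plain string. The function name and fallback message are hypothetical.

```python
REMEDIATION = {
    "FILE_001": "Check that the file extension matches a supported format.",
    "FILE_002": "Reduce the file size or use async processing.",
    "FILE_003": "Verify file integrity and re-save the file if needed.",
}


def remediation_for(error_code: str) -> str:
    """Return the suggested fix for a file error code."""
    # The fallback text is an assumption, not an official message.
    return REMEDIATION.get(error_code, "Unrecognized error code; contact support.")
```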