
Asynchronous Processing

For large documents or production workflows, use async processing to avoid timeouts and handle long-running operations gracefully.

How It Works

  1. Submit - Send your request with async: true
  2. Receive job ID - Get an immediate response with a job_id
  3. Poll - Check job status via GET /job/{jobId}
  4. Get results - Retrieve completed results from the poll response
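
Putting the four steps together, here is a minimal sketch of the full lifecycle using the client calls documented in the sections below (the fixed 2-second poll interval is an arbitrary choice, not an API requirement):

import time

from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# 1. Submit with the async flag (async_ in Python)
job = client.extract(file_url="https://example.com/report.pdf", async_=True)

# 2. The immediate response carries the job ID
job_id = job.job_id

# 3. Poll until the job reaches a terminal status
while True:
    status = client.jobs.get_job(job_id=job_id)
    if status.status in ("completed", "failed", "canceled"):
        break
    time.sleep(2)

# 4. Results are on the poll response for completed jobs
if status.status == "completed":
    print(status.result)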

Endpoints with Async Support

| Endpoint | Async Flag | Async Response |
|---|---|---|
| POST /extract | async: true | 202 with job_id |
| POST /split | async: true | 202 with job_id |
| POST /schema | async: true | 202 with job_id |

Note: POST /extract_async is deprecated. Use POST /extract with async: true instead.

Using the Async Flag

Add async: true to any supported endpoint’s request body:
from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

# Async extraction
job = client.extract(
    file_url="https://example.com/large-report.pdf",
    async_=True  # Note: async_ in Python (async is reserved)
)
print(f"Job ID: {job.job_id}")
print(f"Status: {job.status}")  # "pending"

# Async schema extraction
job = client.schema(
    extraction_id="abc123",
    structured_output={"schema": {...}},
    async_=True
)

# Async split
job = client.split(
    extraction_id="abc123",
    topics=[{"name": "financials", "description": "..."}],
    async_=True
)
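
If you call the REST API directly instead of through the SDK, the flag is a plain async field in the JSON body. A sketch with requests; the base URL and Authorization header format below are assumptions, so check the API reference for the exact values:

import requests

# Hypothetical base URL and auth header; confirm against the API reference
response = requests.post(
    "https://api.example.com/extract",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "file_url": "https://example.com/large-report.pdf",
        "async": True,  # the raw flag is `async`; only the Python SDK renames it to async_
    },
)
assert response.status_code == 202
print(response.json())  # {"job_id": "...", "status": "pending"}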

Async Response Format

When async: true, you receive a 202 Accepted response:
{
  "job_id": "abc123-def456-ghi789",
  "status": "pending"
}
| Field | Type | Description |
|---|---|---|
| job_id | string | Unique identifier for the async job |
| status | string | Initial status: pending or processing |

Polling for Results

Use GET /job/{jobId} to check status and retrieve results:
import time

job_id = job.job_id

while True:
    status = client.jobs.get_job(job_id=job_id)
    print(f"Status: {status.status}")
    
    if status.status == "completed":
        print("Done!")
        print(f"Result: {status.result}")
        break
    elif status.status == "failed":
        print(f"Failed: {status.error}")
        break
    elif status.status == "canceled":
        print("Job was canceled")
        break
    
    time.sleep(2)  # Poll every 2 seconds
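
For production, a fixed 2-second interval can hammer the API on long jobs. One common refinement, sketched below, is exponential backoff with a cap and an overall timeout (the specific intervals are arbitrary choices, not API requirements):

import time

def wait_for_job(client, job_id, timeout=600):
    """Poll a job with exponential backoff until it reaches a terminal status."""
    deadline = time.monotonic() + timeout
    delay = 1.0
    while time.monotonic() < deadline:
        status = client.jobs.get_job(job_id=job_id)
        if status.status in ("completed", "failed", "canceled"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, 30)  # back off, but never wait more than 30s
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")

Call it as status = wait_for_job(client, job.job_id) and branch on status.status as in the loop above.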

Poll Response

{
  "job_id": "abc123-def456-ghi789",
  "status": "completed",
  "created_at": "2026-02-04T10:30:00Z",
  "completed_at": "2026-02-04T10:30:45Z",
  "result": {
    "markdown": "# Document Content...",
    "page_count": 50,
    "structured_output": {...}
  }
}
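
For a completed job, the extraction output lives under result. Assuming the SDK surfaces it as a plain dict matching the JSON above (an assumption; it may be a typed object in your SDK version), you could read it like this:

status = client.jobs.get_job(job_id=job_id)
if status.status == "completed":
    result = status.result
    print(result["page_count"])      # e.g. 50
    print(result["markdown"][:200])  # first 200 chars of extracted markdown
    structured = result.get("structured_output")  # present for schema extractions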

Job Status Values

| Status | Description |
|---|---|
| pending | Job is queued, waiting to start |
| processing | Job is currently running |
| completed | Job finished successfully; results available |
| failed | Job encountered an error |
| canceled | Job was canceled by the user |

Canceling Jobs

Cancel a running job with DELETE /job/{jobId}:
client.jobs.cancel_job(job_id=job_id)
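
A short sketch of canceling an in-flight job and confirming its terminal status. This assumes cancellation is not instantaneous, so the status may briefly still read processing:

job = client.extract(file_url="https://example.com/large-report.pdf", async_=True)

# Cancel while the job is still pending or processing
client.jobs.cancel_job(job_id=job.job_id)

# Confirm the terminal status; it may briefly read "processing"
# while the cancellation propagates
status = client.jobs.get_job(job_id=job.job_id)
print(status.status)  # "canceled"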

When to Use Async

  - Synchronous requests may time out on large documents; always use async for documents over 50 pages.
  - Schema extraction with many fields or deeply nested structures benefits from async processing.
  - Async provides better reliability and lets you handle failures gracefully with retries.
  - Submit multiple documents asynchronously and poll for results in parallel, as shown in the sketch after this list.
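
That fan-out pattern looks like this: submit every document first, then wait on the whole batch. The sketch reuses the hypothetical wait_for_job helper from Polling for Results above, with a thread pool so one slow job does not block the others:

from concurrent.futures import ThreadPoolExecutor

urls = [
    "https://example.com/report-1.pdf",
    "https://example.com/report-2.pdf",
    "https://example.com/report-3.pdf",
]

# Submit everything up front; each call returns immediately with a job ID
jobs = [client.extract(file_url=url, async_=True) for url in urls]

# Poll all jobs in parallel
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    results = list(pool.map(lambda j: wait_for_job(client, j.job_id), jobs))

for status in results:
    print(status.job_id, status.status)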

Sync vs Async Comparison

| Aspect | Sync (async: false) | Async (async: true) |
|---|---|---|
| Response | Full result | Job ID only |
| HTTP status | 200 | 202 |
| Timeout risk | Higher | Lower |
| Best for | Small docs, testing | Production, large docs |
| Polling needed | No | Yes |

Webhooks Alternative

Instead of polling, you can use webhooks to receive notifications when jobs complete:
# Configure webhook via Svix portal
webhook_link = client.webhooks.get_portal()
print(f"Configure webhooks at: {webhook_link.url}")

# Submit async job - webhook will notify on completion
job = client.extract(file_url="...", async_=True)
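
On the receiving end, verify the Svix signature before trusting a delivery. A minimal sketch with Flask and the svix package; the route path and the payload fields (job_id, status) are assumptions, and the signing secret comes from the Svix portal:

from flask import Flask, request
from svix.webhooks import Webhook, WebhookVerificationError

app = Flask(__name__)
wh = Webhook("whsec_...")  # signing secret from the Svix portal

@app.route("/webhooks/pulse", methods=["POST"])  # hypothetical route
def handle_job_event():
    try:
        # Verify the signature before trusting the body
        headers = {k.lower(): v for k, v in request.headers.items()}
        event = wh.verify(request.get_data(), headers)
    except WebhookVerificationError:
        return "invalid signature", 400

    # Payload fields are assumptions; log a delivery and inspect yours
    print(event.get("job_id"), event.get("status"))
    return "", 204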
See Svix Webhooks for setup instructions.