Batch Document Intake

Goal

Move from one-off document extraction to a repeatable intake queue for folders, customer uploads, or backfills.

Use This Workflow

Use batch extraction when you already have a set of file URLs or a storage prefix to process. Use webhooks when job completion should notify your backend automatically instead of being polled.

Request

curl -X POST https://api.runpulse.com/batch/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "file_urls": [
        "https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf",
        "https://platform.runpulse.com/api/examples/18ed11c2-dbce-4bf5-8385-102c55d13480/pdf"
      ]
    },
    "output": {
      "s3_prefix": "s3://customer-results/pulse/extractions/"
    },
    "extract_options": {
      "storage": {"enabled": true}
    },
    "workers": 4
  }'
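For folder or backfill intake, the `file_urls` array is usually generated rather than typed by hand. The following is a minimal POSIX shell sketch, assuming a plain-text file with one URL per line and URLs that need no JSON escaping (no quotes or backslashes). `build_batch_body` and `PULSE_API_KEY` are hypothetical names; the body mirrors the fields of the request above and is not an official client.

```shell
#!/bin/sh
# Hypothetical helper: build the /batch/extract request body from a file
# with one URL per line. Blank lines are skipped, and only the first
# whitespace-delimited field of each line is used. Assumes URLs contain no
# double quotes or backslashes, because no JSON escaping is performed.
build_batch_body() {
  urls_file=$1
  s3_prefix=$2
  workers=$3
  # Join non-empty lines as "url1","url2",...
  urls=$(awk 'NF { printf "%s\"%s\"", (n++ ? "," : ""), $1 }' "$urls_file")
  printf '{"input":{"file_urls":[%s]},"output":{"s3_prefix":"%s"},"extract_options":{"storage":{"enabled":true}},"workers":%s}\n' \
    "$urls" "$s3_prefix" "$workers"
}

# Example (PULSE_API_KEY is an assumed environment variable):
#   build_batch_body urls.txt s3://customer-results/pulse/extractions/ 4 |
#     curl -X POST https://api.runpulse.com/batch/extract \
#       -H "x-api-key: $PULSE_API_KEY" \
#       -H "Content-Type: application/json" \
#       --data-binary @-
```

Generating the body from a file keeps the list of submitted URLs reproducible, which makes it easier to match results back to your own intake rows later.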

Intake Record

Store one row per source document before you submit the batch:

Field	Purpose
`source_document_id`	Idempotency key from your app.
`source_url`	Reproduce or debug the input.
`batch_job_id`	Track the parent job.
`child_job_id`	Track individual file status.
`extraction_id`	Chain into Schema, Tables, Split, or retrieval.
`status`	Drive retry and review queues.
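A minimal sketch of such an intake record, assuming a local tab-separated file stands in for your application's database. `INTAKE_FILE`, `intake_add`, and `intake_set` are hypothetical names. `intake_add` uses `source_document_id` as the idempotency key, so re-running intake for the same folder does not duplicate rows; the initial `pending` status is an illustrative value, not one defined by the API.

```shell
#!/bin/sh
# Hypothetical local intake ledger: one tab-separated row per source document,
# with the six columns from the Intake Record table, header line first.
INTAKE_FILE=${INTAKE_FILE:-intake.tsv}

# Append a row for a source document unless its ID is already recorded.
intake_add() {
  source_id=$1
  source_url=$2
  [ -f "$INTAKE_FILE" ] ||
    printf 'source_document_id\tsource_url\tbatch_job_id\tchild_job_id\textraction_id\tstatus\n' > "$INTAKE_FILE"
  # awk exits 0 only if a data row already has this source_document_id.
  if awk -F '\t' -v id="$source_id" 'NR > 1 && $1 == id { found = 1 } END { exit !found }' "$INTAKE_FILE"; then
    return 0
  fi
  # Job IDs and extraction ID stay empty until the batch reports them.
  printf '%s\t%s\t\t\t\tpending\n' "$source_id" "$source_url" >> "$INTAKE_FILE"
}

# Set one column for a source document:
# 3=batch_job_id, 4=child_job_id, 5=extraction_id, 6=status.
intake_set() {
  source_id=$1
  col=$2
  value=$3
  tmp="$INTAKE_FILE.tmp"
  awk -F '\t' -v OFS='\t' -v id="$source_id" -v col="$col" -v val="$value" \
    'NR > 1 && $1 == id { $col = val } { print }' "$INTAKE_FILE" > "$tmp" &&
    mv "$tmp" "$INTAKE_FILE"
}
```

Writing the row before submission means every document has a record even if the batch call itself fails, so nothing is lost between intake and extraction.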

Checks

Set a worker count your downstream systems can absorb.
Treat batch completion as orchestration only; each child job can still fail independently, so check every child job's status.
Retry failed child documents by source ID rather than blindly resubmitting the whole batch.
Save extraction IDs before running downstream Schema or Tables steps.
Keep a webhook path for production and a polling path for local recovery.
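Retrying by source ID can be as simple as selecting the failed rows from the intake record. This sketch assumes a tab-separated ledger with a header line and the six columns of the Intake Record table in order, and that your app writes the hypothetical status value `failed` when a child job fails. `failed_sources` is an illustrative helper, not part of the Pulse API.

```shell
#!/bin/sh
# Hypothetical retry selector: print "source_document_id<TAB>source_url" for
# every data row whose status column (6) is "failed", skipping the header.
failed_sources() {
  awk -F '\t' 'NR > 1 && $6 == "failed" { print $1 "\t" $2 }' "$1"
}

# Example: collect only the failed URLs for a new, smaller batch request.
#   failed_sources intake.tsv | cut -f2 > retry_urls.txt
```

Selecting by status keeps completed documents (and their saved extraction IDs) untouched, so a retry never reprocesses work that already succeeded.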

Related

Batch Processing

Full endpoint reference.

Production Webhooks

Event-driven completion.

S3 Storage Pipeline

Process cloud storage prefixes.
