
Goal

Move from one-off document extraction to a repeatable intake queue for folders, customer uploads, or backfills.

Use This Workflow

Use batch extraction when you already have a list of file URLs or a storage prefix to process. Add webhooks when job completion should notify your backend automatically, so it does not have to poll for status.
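Without a webhook, your backend has to check job status itself. This is also the polling path the Checks list recommends keeping for local recovery. The sketch below is a minimal polling loop with exponential backoff. It assumes a hypothetical `get_status(job_id)` callable that wraps whatever status lookup you use, and hypothetical terminal status values; neither the status endpoint nor its response fields are documented on this page.

```python
import time
from typing import Callable

# Hypothetical terminal status values; substitute what your status lookup actually reports.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    get_status: Callable[[str], str],
    job_id: str,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it reaches a terminal state, doubling the wait each round."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        # Give up if the next wait would overshoot the overall timeout.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job {job_id} still '{status}' after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` keeps the loop testable and lets an async or scheduler-driven caller reuse the same backoff logic.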

Request

curl -X POST https://api.runpulse.com/batch/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "file_urls": [
        "https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf",
        "https://platform.runpulse.com/api/examples/18ed11c2-dbce-4bf5-8385-102c55d13480/pdf"
      ]
    },
    "output": {
      "s3_prefix": "s3://customer-results/pulse/extractions/"
    },
    "extract_options": {
      "storage": {"enabled": true}
    },
    "workers": 4
  }'
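If your intake queue runs in Python, the same request can be built with the standard library. This is a sketch that mirrors the curl example field for field. The helper name `build_batch_request` is hypothetical, and the function only constructs the request; sending it and interpreting the response are left to you because the response shape is not described here.

```python
import json
import urllib.request

API_URL = "https://api.runpulse.com/batch/extract"


def build_batch_request(
    api_key: str, file_urls: list[str], s3_prefix: str, workers: int = 4
) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "input": {"file_urls": file_urls},
        "output": {"s3_prefix": s3_prefix},
        "extract_options": {"storage": {"enabled": True}},
        "workers": workers,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To submit, pass the result to `urllib.request.urlopen(req)` and parse the body as JSON, recording whatever job identifier it returns as `batch_job_id` in your intake record.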

Intake Record

Store one row per source document before you submit the batch:
Field | Why
source_document_id | Idempotency key from your app.
source_url | Reproduce or debug the input.
batch_job_id | Track the parent job.
child_job_id | Track individual file status.
extraction_id | Chain into Schema, Tables, Split, or retrieval.
status | Drive retry and review queues.
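One way to hold these rows is a table keyed on `source_document_id`, so registering the same document twice is a no-op. This is a minimal SQLite sketch; the table name `intake`, the helper functions, and the `pending`/`submitted` status values are hypothetical choices, not part of the API.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS intake (
    source_document_id TEXT PRIMARY KEY,  -- idempotency key from your app
    source_url         TEXT NOT NULL,     -- reproduce or debug the input
    batch_job_id       TEXT,              -- parent job, set after submission
    child_job_id       TEXT,              -- per-file job, set when known
    extraction_id      TEXT,              -- input to downstream steps
    status             TEXT NOT NULL DEFAULT 'pending'  -- hypothetical status vocabulary
)
"""


def register_documents(conn: sqlite3.Connection, docs: dict[str, str]) -> int:
    """Insert one pending row per source document; re-registering an ID is ignored."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO intake (source_document_id, source_url) VALUES (?, ?)",
        docs.items(),
    )
    conn.commit()
    return conn.total_changes - before  # number of newly registered documents


def mark_submitted(conn: sqlite3.Connection, batch_job_id: str, source_ids: list[str]) -> None:
    """Record the parent batch job for documents that were just submitted."""
    conn.executemany(
        "UPDATE intake SET batch_job_id = ?, status = 'submitted' WHERE source_document_id = ?",
        [(batch_job_id, sid) for sid in source_ids],
    )
    conn.commit()
```

Registering rows before submission means a crash between "submit" and "record" leaves a pending row you can find and reconcile, rather than an untracked job.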

Checks

  • Set a worker count your downstream systems can absorb.
  • Treat batch completion as an orchestration signal only; each child job can still fail independently.
  • Retry failed child documents by source ID, not by blind resubmission.
  • Save extraction IDs before running downstream Schema or Tables steps.
  • Keep a webhook path for production and a polling path for local recovery.
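The second, third, and fourth checks can be combined into one triage step once a batch finishes: inspect every child, keep extraction IDs for the ones that succeeded, and collect source IDs for the rest. This is a sketch under assumptions. `ChildResult`, `triage`, and the `"completed"` status value are hypothetical; map them onto whatever per-child data your batch results actually expose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildResult:
    source_document_id: str
    child_job_id: str
    status: str  # hypothetical values, e.g. "completed" or "failed"
    extraction_id: Optional[str] = None


def triage(children: list[ChildResult]) -> tuple[dict[str, str], list[str]]:
    """Split a finished batch into saved extraction IDs and source IDs to retry.

    A finished batch only means orchestration is done, so every child is checked.
    A child counts as done only if it completed AND produced an extraction ID.
    """
    extraction_ids: dict[str, str] = {}
    retry: list[str] = []
    for child in children:
        if child.status == "completed" and child.extraction_id:
            extraction_ids[child.source_document_id] = child.extraction_id
        else:
            retry.append(child.source_document_id)
    return extraction_ids, retry
```

Persist `extraction_ids` before starting Schema or Tables steps, then resubmit only the `retry` IDs by looking up their `source_url` in the intake record, instead of resubmitting the whole batch.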

Related Guides

  • Batch Processing: full endpoint reference.
  • Production Webhooks: event-driven completion.
  • S3 Storage Pipeline: process cloud storage prefixes.