Goal
Move from one-off document extraction to a repeatable intake queue for folders, customer uploads, or backfills.
Use This Workflow
Use batch when you already have a set of URLs or a storage prefix. Use webhooks when completion should wake your backend automatically.
Request
Intake Record
Store one row per source document before you submit the batch:

| Field | Why |
|---|---|
| source_document_id | Idempotency key from your app. |
| source_url | Reproduce or debug the input. |
| batch_job_id | Track the parent job. |
| child_job_id | Track individual file status. |
| extraction_id | Chain into Schema, Tables, Split, or retrieval. |
| status | Drive retry and review queues. |
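
One way to keep such records is a small local table keyed on the source document ID. The following is a minimal sketch using Python's built-in `sqlite3` module. The table name `intake_records`, the column types, the `pending` default status, and the helper names `open_store` and `register_document` are all assumptions made for illustration; none of them come from the API itself.

```python
import sqlite3

# Hypothetical schema mirroring the fields listed above.
# Column types and the 'pending' default are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS intake_records (
    source_document_id TEXT PRIMARY KEY,
    source_url         TEXT NOT NULL,
    batch_job_id       TEXT,
    child_job_id       TEXT,
    extraction_id      TEXT,
    status             TEXT NOT NULL DEFAULT 'pending'
)
"""


def open_store(path=":memory:"):
    """Open (or create) the intake store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def register_document(conn, source_document_id, source_url):
    """Insert a pending row before submission.

    Returns True if the row is new, False if this source ID was already
    registered. The primary key makes the call idempotent.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO intake_records (source_document_id, source_url) "
        "VALUES (?, ?)",
        (source_document_id, source_url),
    )
    conn.commit()
    return cur.rowcount == 1
```

Calling `register_document` twice with the same source ID leaves a single row, so a re-run of the intake step does not create duplicate submissions. The job and extraction ID columns stay empty until the batch reports them.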
Checks
- Set a worker count your downstream systems can absorb.
- Treat batch completion as orchestration; each child job can still fail independently.
- Retry failed child documents by source ID, not by blind resubmission.
- Save extraction IDs before running downstream Schema or Tables steps.
- Keep a webhook path for production and a polling path for local recovery.
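
The checks about independent child failures, retrying by source ID, and saving extraction IDs can be sketched as a small reconciliation step. This is an illustrative outline only: the `IntakeRecord` class, the result dictionary keys, the status strings `completed` and `failed`, and the `max_attempts` limit are hypothetical and would need to be mapped onto the actual batch response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntakeRecord:
    source_document_id: str
    source_url: str
    child_job_id: Optional[str] = None
    extraction_id: Optional[str] = None
    status: str = "pending"
    attempts: int = 0


def apply_child_results(records, child_results):
    """Update records in place from per-child results.

    Each result is a dict with assumed keys: source_document_id,
    child_job_id, status, and (on success) extraction_id.
    """
    by_id = {r.source_document_id: r for r in records}
    for result in child_results:
        record = by_id.get(result["source_document_id"])
        if record is None:
            continue  # unknown document: skip rather than guess
        record.attempts += 1
        record.child_job_id = result.get("child_job_id")
        record.status = result["status"]
        if result["status"] == "completed":
            # Save the extraction ID before any downstream step runs.
            record.extraction_id = result.get("extraction_id")


def select_retries(records, max_attempts=3):
    """Return source IDs of failed documents that still have attempts left."""
    return [
        r.source_document_id
        for r in records
        if r.status == "failed" and r.attempts < max_attempts
    ]
```

Because retries are selected from stored records rather than from the original input list, a document that already succeeded is never resubmitted, and one that keeps failing drops out to a review queue once it reaches the attempt limit.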
Related
- Batch Processing: Full endpoint reference.
- Production Webhooks: Event-driven completion.
- S3 Storage Pipeline: Process cloud storage prefixes.