> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runpulse.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Batch Document Intake

> Process many files through Extract, Schema, Tables, or Split with polling and retry-safe writes.

## Goal

Move from one-off document extraction to a repeatable intake queue for folders, customer uploads, or backfills.

## Use This Workflow

```mermaid
sequenceDiagram
    participant App
    participant Pulse
    participant Store
    App->>Pulse: POST /batch/extract
    Pulse-->>App: batch_job_id
    App->>Pulse: GET /job/{batch_job_id}
    Pulse-->>Store: child extraction artifacts
    App->>Store: upsert normalized records
```

Use batch when you already have a set of URLs or a storage prefix. Use webhooks when completion should wake your backend automatically.

## Request

```bash
curl -X POST https://api.runpulse.com/batch/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "file_urls": [
        "https://platform.runpulse.com/api/examples/637e5678-30b1-45fa-acc4-877f2d636419/pdf",
        "https://platform.runpulse.com/api/examples/18ed11c2-dbce-4bf5-8385-102c55d13480/pdf"
      ]
    },
    "output": {
      "s3_prefix": "s3://customer-results/pulse/extractions/"
    },
    "extract_options": {
      "storage": {"enabled": true}
    },
    "workers": 4
  }'
```

## Intake Record

Store one row per source document before you submit the batch:

| Field                | Why                                             |
| -------------------- | ----------------------------------------------- |
| `source_document_id` | Idempotency key from your app.                  |
| `source_url`         | Reproduce or debug the input.                   |
| `batch_job_id`       | Track the parent job.                           |
| `child_job_id`       | Track individual file status.                   |
| `extraction_id`      | Chain into Schema, Tables, Split, or retrieval. |
| `status`             | Drive retry and review queues.                  |

## Checks

* Set a worker count your downstream systems can absorb.
* Treat batch completion as orchestration; each child job can still fail independently.
* Retry failed child documents by source ID, not by blind resubmission.
* Save extraction IDs before running downstream Schema or Tables steps.
* Keep a webhook path for production and a polling path for local recovery.

## Related

* Full endpoint reference.
* Event-driven completion.
* Process cloud storage prefixes.
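
## Example: Intake Store Sketch

The intake record and the retry-by-source-ID check described above could be sketched roughly as follows. This is a minimal in-memory illustration, not part of the Pulse API: `IntakeRecord`, `IntakeStore`, their methods, the status values (`pending`, `completed`, `failed`), and the example IDs and URLs are all hypothetical. Only the field names come from the Intake Record table. A real implementation would likely back this with a database table keyed on `source_document_id`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IntakeRecord:
    """One row per source document; field names follow the Intake Record table."""
    source_document_id: str
    source_url: str
    batch_job_id: Optional[str] = None
    child_job_id: Optional[str] = None
    extraction_id: Optional[str] = None
    status: str = "pending"  # assumed status vocabulary, not defined by the API


class IntakeStore:
    """Hypothetical in-memory stand-in for a table keyed by source_document_id."""

    def __init__(self) -> None:
        self._rows: Dict[str, IntakeRecord] = {}

    def register(self, source_document_id: str, source_url: str) -> IntakeRecord:
        # Idempotent: registering the same source ID again returns the existing row.
        return self._rows.setdefault(
            source_document_id, IntakeRecord(source_document_id, source_url)
        )

    def upsert(self, source_document_id: str, **fields: Optional[str]) -> IntakeRecord:
        row = self._rows[source_document_id]
        for name, value in fields.items():
            if not hasattr(row, name):
                raise AttributeError(f"unknown intake field: {name}")
            # Skip None so a replayed update never erases a saved extraction_id.
            if value is not None:
                setattr(row, name, value)
        return row

    def retry_candidates(self) -> List[IntakeRecord]:
        # Select failed documents by their own status, not by resubmitting the batch.
        return [row for row in self._rows.values() if row.status == "failed"]

    def __len__(self) -> int:
        return len(self._rows)


store = IntakeStore()
store.register("doc-001", "https://example.com/a.pdf")
store.register("doc-002", "https://example.com/b.pdf")
store.register("doc-001", "https://example.com/a.pdf")  # duplicate is a no-op

store.upsert("doc-001", batch_job_id="batch-1", child_job_id="child-1",
             extraction_id="ext-1", status="completed")
store.upsert("doc-002", batch_job_id="batch-1", child_job_id="child-2",
             status="failed")

# A replayed update without an extraction ID leaves the saved one intact.
store.upsert("doc-001", extraction_id=None, status="completed")

retry_urls = [row.source_url for row in store.retry_candidates()]
```

Keying every write on `source_document_id` is what makes the writes retry-safe in this sketch: replaying a registration or a status update changes nothing, and only the documents marked `failed` are collected for resubmission.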