Goal

Run a document ingestion workflow in which source files and extraction results stay in your own cloud storage instead of being copied around by one-off local scripts.

Use This Workflow

Use this pattern for regulated intake queues, nightly backfills, and customer-controlled storage workflows.

Batch Extract From S3

from pulse import Pulse
from pulse.types.batch_input_source import BatchInputSource
from pulse.types.batch_output_destination import BatchOutputDestination

# Create a client; replace the placeholder with your API key.
client = Pulse(api_key="YOUR_API_KEY")

# Submit one batch job: read documents under the intake prefix and
# write results under a separate results prefix in the same bucket.
job = client.batch.extract(
    input=BatchInputSource(s_3_prefix="s3://customer-docs/intake/"),
    output=BatchOutputDestination(s_3_prefix="s3://customer-docs/pulse-results/"),
    extract_options={
        "extensions": {
            "chunking": {
                "chunk_types": ["page"],
                "chunk_size": 1200
            }
        },
        "storage": {"enabled": True}
    },
    workers=8,
)

# Keep the batch job ID so the job can be tracked later.
print(job.batch_job_id)
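
The example above keeps the input and output prefixes disjoint. The following is a minimal sketch of a pre-submit check for that property. It assumes a batch job picks up every object under the input prefix, which this page does not state explicitly, and the helper names `split_s3_uri` and `prefixes_overlap` are hypothetical, not part of the Pulse SDK. It compares raw strings because S3 prefixes are plain string prefixes, so `intake` also matches `intake-results/`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Drop the leading slash; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


def prefixes_overlap(input_uri: str, output_uri: str) -> bool:
    """Return True if either prefix contains the other in the same bucket."""
    in_bucket, in_prefix = split_s3_uri(input_uri)
    out_bucket, out_prefix = split_s3_uri(output_uri)
    if in_bucket != out_bucket:
        return False
    # Plain string-prefix comparison, matching how S3 listing filters keys.
    return in_prefix.startswith(out_prefix) or out_prefix.startswith(in_prefix)


if __name__ == "__main__":
    src = "s3://customer-docs/intake/"
    dst = "s3://customer-docs/pulse-results/"
    if prefixes_overlap(src, dst):
        raise SystemExit("input and output prefixes overlap; choose disjoint prefixes")
    print("prefixes are disjoint")
```

Running a check like this before calling `client.batch.extract` is a cheap way to avoid results landing back inside the intake queue, if that assumption holds for your setup.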

Single File With A Presigned URL

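from pulse import Pulse

# Create a client; replace the placeholder with your API key.
client = Pulse(api_key="YOUR_API_KEY")

# Submit a single file by presigned URL as an asynchronous job.
result = client.extract(
    file_url="https://customer-bucket.s3.amazonaws.com/path/file.pdf?X-Amz-Signature=...",
    async_=True,
    storage={"enabled": True},
)

# Keep the job ID so the extraction can be looked up later.
print(result.job_id)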

Checks

  • Use short-lived presigned URLs for single-file extraction.
  • Use batch S3 prefixes for high-volume queues.
  • Enable Bring Your Own Storage when artifacts must stay in your cloud account.
  • Persist batch_job_id, child job IDs, source object keys, and output prefixes.
  • Make downstream writes idempotent by source object key and extraction ID.
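
The last two checks can be sketched with a small ledger table. This is one possible approach using SQLite from the Python standard library, not a Pulse feature: the table name, column names, and the `open_ledger` and `record_result` helpers are all hypothetical. The composite primary key on source object key and extraction ID is what makes a repeated write a no-op.

```python
import sqlite3

# Hypothetical ledger schema; names are illustrative, not part of the Pulse SDK.
SCHEMA = """
CREATE TABLE IF NOT EXISTS extraction_ledger (
    source_key    TEXT NOT NULL,
    extraction_id TEXT NOT NULL,
    batch_job_id  TEXT NOT NULL,
    child_job_id  TEXT NOT NULL,
    output_prefix TEXT NOT NULL,
    PRIMARY KEY (source_key, extraction_id)
)
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the ledger database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_result(
    conn: sqlite3.Connection,
    *,
    source_key: str,
    extraction_id: str,
    batch_job_id: str,
    child_job_id: str,
    output_prefix: str,
) -> bool:
    """Record one result; return True if it was new, False if already stored."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO extraction_ledger "
            "(source_key, extraction_id, batch_job_id, child_job_id, output_prefix) "
            "VALUES (?, ?, ?, ?, ?)",
            (source_key, extraction_id, batch_job_id, child_job_id, output_prefix),
        )
    # rowcount is 1 when the row was inserted and 0 when the key already existed.
    return cur.rowcount == 1


if __name__ == "__main__":
    ledger = open_ledger()
    row = dict(
        source_key="intake/example.pdf",      # placeholder values
        extraction_id="extraction-1",
        batch_job_id="batch-1",
        child_job_id="job-1",
        output_prefix="s3://customer-docs/pulse-results/",
    )
    print(record_result(ledger, **row))  # first delivery
    print(record_result(ledger, **row))  # replayed delivery is ignored
```

A downstream consumer could call `record_result` first and perform its own write only when the function returns True, so a retried or replayed event does not produce a duplicate.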

AWS S3 Setup

Configure Pulse access to your bucket.

Batch Processing

Full batch endpoint reference.

Production Webhooks

Receive completion events.