# S3 Storage Pipeline

> Process documents from S3 and keep extraction artifacts in your storage boundary.

## Goal

Run a document ingestion workflow in which source files and extraction results stay in your own cloud storage, rather than being copied through one-off local scripts.

## Use This Workflow

```mermaid
flowchart LR
    A["S3 source prefix"] --> B["Pulse batch extract"]
    B --> C["S3 results prefix"]
    C --> D["Your database or review app"]
```

Use this pattern for regulated intake queues, nightly backfills, and customer-controlled storage workflows.

## Batch Extract From S3

```python
from pulse import Pulse
from pulse.types.batch_input_source import BatchInputSource
from pulse.types.batch_output_destination import BatchOutputDestination

client = Pulse(api_key="YOUR_API_KEY")

job = client.batch.extract(
    input=BatchInputSource(s_3_prefix="s3://customer-docs/intake/"),
    output=BatchOutputDestination(s_3_prefix="s3://customer-docs/pulse-results/"),
    extract_options={
        "extensions": {
            "chunking": {
                "chunk_types": ["page"],
                "chunk_size": 1200
            }
        },
        "storage": {"enabled": True}
    },
    workers=8,
)

# Persist this ID together with the source and output prefixes (see Checks).
print(job.batch_job_id)
```
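
Because the source and results prefixes in the example share a bucket, it can help to confirm before submitting that neither prefix contains the other, so results are not written back into the intake path. The following is a minimal sketch using only the standard library; `parse_s3_prefix` and `prefixes_overlap` are hypothetical helper names, not part of the Pulse SDK, and whether Pulse enforces a similar check itself is not stated here.

```python
from urllib.parse import urlparse


def parse_s3_prefix(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key_prefix) with a trailing slash on the prefix."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    key = parsed.path.lstrip("/")
    # Normalize so "intake" and "intake/" compare equal and "intake-old/" does not match.
    if key and not key.endswith("/"):
        key += "/"
    return parsed.netloc, key


def prefixes_overlap(source_uri: str, results_uri: str) -> bool:
    """Return True when one prefix contains the other inside the same bucket."""
    src_bucket, src_key = parse_s3_prefix(source_uri)
    out_bucket, out_key = parse_s3_prefix(results_uri)
    if src_bucket != out_bucket:
        return False
    return src_key.startswith(out_key) or out_key.startswith(src_key)
```

For the prefixes used above, `prefixes_overlap("s3://customer-docs/intake/", "s3://customer-docs/pulse-results/")` is `False`, so the job can be submitted as written.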

## Single File With A Presigned URL

```python
from pulse import Pulse

client = Pulse(api_key="YOUR_API_KEY")

result = client.extract(
    file_url="https://customer-bucket.s3.amazonaws.com/path/file.pdf?X-Amz-Signature=...",
    async_=True,
    storage={"enabled": True},
)

print(result.job_id)
```
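
Since a presigned URL stops working once its signature expires, it can be useful to check how much lifetime remains before handing the URL to an asynchronous job. The sketch below assumes a Signature Version 4 presigned URL, which carries `X-Amz-Date` and `X-Amz-Expires` query parameters; `presigned_seconds_remaining` is a hypothetical helper, not a Pulse or AWS SDK function.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_seconds_remaining(url: str, now: Optional[datetime] = None) -> float:
    """Seconds until a SigV4 presigned URL expires (negative if already expired)."""
    query = parse_qs(urlparse(url).query)
    try:
        # X-Amz-Date is the signing time in UTC; X-Amz-Expires is the lifetime in seconds.
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = int(query["X-Amz-Expires"][0])
    except (KeyError, ValueError) as exc:
        raise ValueError("URL lacks valid X-Amz-Date / X-Amz-Expires parameters") from exc
    expires_at = signed_at.replace(tzinfo=timezone.utc) + timedelta(seconds=lifetime)
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).total_seconds()
```

A caller might refuse to submit a URL whose remaining lifetime is below the time the job is expected to wait in the queue, or whose `X-Amz-Expires` value is longer than the team's policy allows. Both thresholds are deployment choices rather than values defined by this page.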

## Checks

* Use short-lived presigned URLs for single-file extraction.
* Use batch S3 prefixes for high-volume queues.
* Enable Bring Your Own Storage when artifacts must stay in your cloud account.
* Persist `batch_job_id`, child job IDs, source object keys, and output prefixes.
* Make downstream writes idempotent by source object key and extraction ID.
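
The last two checks can be sketched as a single table keyed by source object key and extraction ID, so that a retried job or a redelivered event does not create duplicate rows. This is an illustration with SQLite from the standard library; the table name, column names, and `record_result` helper are assumptions, and the ID values would come from whatever identifiers your Pulse jobs return.

```python
import sqlite3

# Hypothetical schema: one row per (source object key, extraction ID) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS extraction_results (
    source_key    TEXT NOT NULL,
    extraction_id TEXT NOT NULL,
    batch_job_id  TEXT NOT NULL,
    output_prefix TEXT NOT NULL,
    PRIMARY KEY (source_key, extraction_id)
)
"""


def record_result(
    conn: sqlite3.Connection,
    source_key: str,
    extraction_id: str,
    batch_job_id: str,
    output_prefix: str,
) -> bool:
    """Insert one result row; return False if the pair was already recorded."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO extraction_results VALUES (?, ?, ?, ?)",
            (source_key, extraction_id, batch_job_id, output_prefix),
        )
    # rowcount is 1 for a new row and 0 when the primary key already existed.
    return cur.rowcount == 1
```

Calling `record_result` twice with the same source key and extraction ID leaves a single row, which is the property downstream consumers need when the same completion is processed more than once.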

## Related

<CardGroup cols={3}>
  <Card title="AWS S3 Setup" icon="aws" href="/storage/aws-s3">
    Configure Pulse access to your bucket.
  </Card>

  <Card title="Batch Processing" icon="layer-group" href="/api-reference/endpoint/batch-overview">
    Full batch endpoint reference.
  </Card>

  <Card title="Production Webhooks" icon="webhook" href="/cookbooks/webhooks-production">
    Receive completion events.
  </Card>
</CardGroup>
