# Document Lifecycle

> Understand how files, extractions, jobs, results, and saved configs fit together.

Pulse uses a small set of IDs. Once you know what each one represents, the Platform and the API behave predictably.

```mermaid
flowchart LR
    A[File or URL] --> B["Extract"]
    B --> C["extraction_id"]
    C --> D["Split"]
    C --> E["Schema"]
    C --> F["Tables"]
    D --> G["split_id"]
    G --> H["Schema or Tables in split mode"]
    B --> I["job_id when async"]
    E --> J["schema_id"]
    F --> K["tables_id"]
```

## Core Objects

| Object          | Created by                      | Used for                                                                               |
| --------------- | ------------------------------- | -------------------------------------------------------------------------------------- |
| `job_id`        | Any async request               | Polling job status and retrieving async results.                                       |
| `extraction_id` | `/extract` with storage enabled | Reusing parsed content for Schema, Split, Tables, reruns, and saved results.           |
| `split_id`      | `/split`                        | Applying per-topic schemas or table extraction to split page groups.                   |
| `schema_id`     | `/schema`                       | Tracking a structured output version and downloading filled Excel templates when used. |
| `tables_id`     | `/tables`                       | Tracking table extraction output, especially async table jobs.                         |
| Config IDs      | Saved presets                   | Reusing Extract, Split, Schema, and Tables settings without inlining JSON.             |
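One way to keep these IDs straight in application code is to store them together per document. The sketch below is illustrative only: `DocumentRecord` and its methods are hypothetical names, not part of the Pulse SDK, and the field names simply mirror the IDs in the table above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentRecord:
    """Hypothetical container for the IDs one document collects in Pulse."""

    source: str                          # file path or URL that was submitted
    extraction_id: Optional[str] = None  # from /extract with storage enabled
    split_id: Optional[str] = None       # from /split
    schema_id: Optional[str] = None      # from /schema
    tables_id: Optional[str] = None      # from /tables
    job_ids: List[str] = field(default_factory=list)  # one per async request

    def can_chain(self) -> bool:
        """Downstream steps reuse parsed content, so they need an extraction_id."""
        return self.extraction_id is not None


record = DocumentRecord(source="invoice.pdf")
record.job_ids.append("job_123")    # placeholder ID, not a real Pulse value
record.extraction_id = "ext_456"    # placeholder ID, not a real Pulse value
```

Keeping `job_ids` as a list reflects that any async request creates its own `job_id`, while the other IDs identify stored outputs.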

## Storage Defaults

Storage is enabled by default for normal workflows because downstream steps need saved extraction artifacts. If storage is disabled, Pulse can still return the immediate Extract response, but later steps may not be able to reuse that extraction.

<Warning>
  Do not disable storage if you plan to call `/schema`, `/split`, `/tables`, run partial reruns, or inspect the result in the Platform.
</Warning>
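If your integration sometimes disables storage, a small guard before downstream calls can surface the problem early. This is a minimal sketch that assumes the Extract response is a JSON object exposing an `extraction_id` key when storage is enabled; the helper name and the exact response shape are assumptions, not documented behavior.

```python
from typing import Any, Dict


def require_extraction_id(extract_response: Dict[str, Any]) -> str:
    """Return the extraction_id, or fail fast if the extraction was not stored.

    Assumes the response carries an "extraction_id" key when storage is on.
    """
    extraction_id = extract_response.get("extraction_id")
    if not extraction_id:
        raise ValueError(
            "No extraction_id in the Extract response; storage may be disabled, "
            "so /schema, /split, and /tables cannot reuse this extraction."
        )
    return extraction_id
```

Calling this right after Extract keeps the failure close to its cause instead of surfacing later in a Schema, Split, or Tables request.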

## Async Lifecycle

For longer documents or heavier steps, set `async: true`.

1. The request returns quickly with a `job_id`.
2. Your app polls `GET /job/{jobId}` or waits for a webhook.
3. When status is `completed`, the job result contains the same output the sync call would have returned.
4. Large results may include a download URL instead of embedding the entire payload.
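The steps above can be sketched as a polling loop. This is a minimal example under stated assumptions: `fetch_job` is a caller-supplied function that wraps `GET /job/{jobId}` and returns the parsed JSON body, the failure statuses `failed` and `canceled` are guesses (only `completed` appears on this page), and the `result` and `result_url` field names are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# Assumed names for terminal failure states; only "completed" is documented here.
TERMINAL_FAILURES = {"failed", "canceled"}


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reports status "completed", then return the job body."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)  # e.g. a wrapper around GET /job/{jobId}
        status = job.get("status")
        if status == "completed":
            return job
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout_s}s")
        sleep(interval_s)


def load_result(
    job: Dict[str, Any],
    download: Callable[[str], Any],
) -> Any:
    """Return the embedded result, or follow a download URL for large results."""
    if job.get("result") is not None:  # hypothetical field name
        return job["result"]
    url = job.get("result_url")        # hypothetical field name
    if url is None:
        raise KeyError("job has neither an embedded result nor a download URL")
    return download(url)
```

Passing `fetch_job`, `sleep`, and `download` as arguments keeps the loop independent of any particular HTTP client and makes it easy to test without network access.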

## Platform Lifecycle

In the Platform, the same lifecycle appears as a visual pipeline:

1. Upload or select a document.
2. Configure Extract and optional downstream steps.
3. Run the pipeline.
4. Inspect outputs in Markdown, Tables, Split, and Schema views.
5. Save step presets or a full pipeline preset.
6. Use Show Code to reproduce the pipeline from the SDK.

## Production Lifecycle

A mature production workflow usually looks like this:

```mermaid
flowchart TD
    A[Test representative documents] --> B[Save stable presets]
    B --> C[Generate code from Platform]
    C --> D[Run async for long jobs]
    D --> E[Use webhooks or polling]
    E --> F[Store normalized output in your system]
    F --> G[Monitor errors, usage, and drift]
```

## Related

<CardGroup cols={2}>
  <Card title="Chaining Steps" icon="link" href="/concepts/chaining">
    Learn the exact ID handoffs between Extract, Split, Schema, Tables, and Jobs.
  </Card>

  <Card title="Async Processing" icon="clock" href="/api-reference/async-processing">
    Decide when to run jobs asynchronously and how to poll safely.
  </Card>
</CardGroup>
