> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runpulse.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Chunking For RAG

> Choose semantic, header, page, and recursive chunks for retrieval workflows.

## Goal

Pick a chunking setup that matches how your users ask questions and how reviewers validate answers.

## Sample Documents

Use [Attention Is All You Need](https://platform.runpulse.com/dashboard/examples/3be15d23-d622-4f27-9843-ec2929140eec) for academic RAG, [Legal Filing](https://platform.runpulse.com/dashboard/examples/d4dfc1e2-60ac-4776-a5a8-20b88e68bf9f) for page-grounded legal review, or [10-K Annual Report](https://platform.runpulse.com/dashboard/examples/0514fc05-8b0a-4a3b-9b9b-18ac89fc04e5) for section routing.

## Decision Table

| Retrieval need                         | Use         | Why                                    |
| -------------------------------------- | ----------- | -------------------------------------- |
| User asks broad conceptual questions   | `semantic`  | Keeps related paragraphs together.     |
| User navigates reports by headings     | `header`    | Preserves section boundaries.          |
| Reviewers cite exact pages             | `page`      | Keeps provenance simple and auditable. |
| Vector DB has strict token/size limits | `recursive` | Produces predictable size windows.     |

## Request

```bash theme={null}
curl -X POST https://api.runpulse.com/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_url": "https://platform.runpulse.com/api/examples/d4dfc1e2-60ac-4776-a5a8-20b88e68bf9f/pdf",
    "extensions": {
      "chunking": {
        "chunk_types": ["semantic", "page"],
        "chunk_size": 1200
      }
    },
    "async": true,
    "storage": {"enabled": true}
  }'
```

## Recommended Patterns

| Use case                 | Chunk types          | Notes                                                       |
| ------------------------ | -------------------- | ----------------------------------------------------------- |
| Enterprise search        | `semantic`, `page`   | Use semantic for recall and page chunks for citation.       |
| Legal review             | `page`, `header`     | Filter by page and section before answering.                |
| Research assistant       | `semantic`, `header` | Blend topic-level retrieval with section labels.            |
| Embedding-only ingestion | `recursive`          | Useful when the vector store enforces strict payload sizes. |

## Checks

* Test chunking on three representative documents before locking the config.
* Store chunk type and chunk index with every embedding.
* Keep page references if any answer needs a citation or reviewer jump link.
* Avoid tiny chunks for tables; they often lose headers and context.
* Use Split before Schema when the document has multiple sections that need different extraction logic.

## Related

<CardGroup cols={3}>
  <Card title="Chunking Parameters" icon="scissors" href="/concepts/processing-parameters-chunking">
    Full chunking parameter guide.
  </Card>

  <Card title="Vector Metadata Contract" icon="database" href="/cookbooks/vector-metadata-contract">
    Persist chunk provenance.
  </Card>

  <Card title="LangChain RAG Ingestion" icon="diagram-project" href="/cookbooks/rag-langchain-vector-store">
    Build a local vector index.
  </Card>
</CardGroup>
