Chunking For RAG - Pulse API

Goal

Pick a chunking setup that matches how your users ask questions and how reviewers validate answers.

Sample Documents

Use the "Attention Is All You Need" paper for academic RAG, the Legal Filing sample for page-grounded legal review, or the 10-K Annual Report sample for section routing.

Decision Table

Retrieval need	Chunk type	Why
User asks broad conceptual questions	`semantic`	Keeps related paragraphs together.
User navigates reports by headings	`header`	Preserves section boundaries.
Reviewers cite exact pages	`page`	Keeps provenance simple and auditable.
Vector DB has strict token/size limits	`recursive`	Produces predictable size windows.
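The `recursive` row depends on a size guarantee. The following generic recursive character splitter in Python shows where that predictability comes from. It is a sketch of the common technique, not Pulse's implementation. `recursive_split` is a hypothetical name, the separator order is an assumption, and size is measured in characters; this page does not state the unit of the API's `chunk_size`.

```python
# Generic recursive character splitter (illustration only, not Pulse's code).
# Assumptions: separators go from coarse to fine, and size is in characters.
SEPARATORS = ["\n\n", "\n", " ", ""]


def recursive_split(text: str, max_size: int, separators=SEPARATORS) -> list[str]:
    """Split text into pieces of at most max_size characters."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    if len(text) <= max_size:
        return [text] if text else []
    sep, *finer = separators
    if sep == "":
        # No separators left: fall back to fixed-size windows.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= max_size:
            current = candidate  # keep packing pieces into the current window
            continue
        if current:
            chunks.append(current)  # window is full; emit it
            current = ""
        if len(part) <= max_size:
            current = part
        else:
            # This piece alone is too large: retry it with finer separators.
            chunks.extend(recursive_split(part, max_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Every oversized piece falls back to a finer separator and finally to a hard cut, so no chunk exceeds `max_size`. That property matters when a vector store enforces strict size limits.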

Request

curl -X POST https://api.runpulse.com/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_url": "https://platform.runpulse.com/api/examples/d4dfc1e2-60ac-4776-a5a8-20b88e68bf9f/pdf",
    "extensions": {
      "chunking": {
        "chunk_types": ["semantic", "page"],
        "chunk_size": 1200
      }
    },
    "async": true,
    "storage": {"enabled": true}
  }'
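You can also assemble the same request in Python, which makes it easier to vary `chunk_types` per document set. This sketch uses only the standard library. `build_extract_payload` and `submit` are hypothetical helpers, and the allowed-type check covers only the four types named on this page. The Chunking Parameters guide is the authoritative list. The response is returned as parsed JSON because this page does not describe its schema.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.runpulse.com/extract"
# Assumption: only the four chunk types named on this page are accepted here.
KNOWN_CHUNK_TYPES = {"semantic", "header", "page", "recursive"}


def build_extract_payload(file_url: str, chunk_types: list[str], chunk_size: int) -> dict:
    """Build the same JSON body as the curl request, with basic local checks."""
    unknown = set(chunk_types) - KNOWN_CHUNK_TYPES
    if unknown:
        raise ValueError(f"unknown chunk types: {sorted(unknown)}")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        "file_url": file_url,
        "extensions": {
            "chunking": {"chunk_types": list(chunk_types), "chunk_size": chunk_size}
        },
        "async": True,
        "storage": {"enabled": True},
    }


def submit(payload: dict, api_key: str) -> dict:
    """POST the payload to /extract and return the decoded JSON response."""
    req = urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `submit(build_extract_payload(url, ["page", "header"], 1200), "YOUR_API_KEY")` requests the legal-review pairing. Validating locally catches typos in chunk type names before the request is sent.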

Recommended Patterns

Use case	Chunk types	Notes
Enterprise search	`semantic`, `page`	Use semantic for recall and page chunks for citation.
Legal review	`page`, `header`	Filter by page and section before answering.
Research assistant	`semantic`, `header`	Blend topic-level retrieval with section labels.
Embedding-only ingestion	`recursive`	Useful when the vector store enforces strict payload sizes.
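In the legal-review pattern, filtering runs on stored metadata before any similarity ranking. This is a minimal sketch. It assumes each stored chunk carries a page number and a section heading. `ChunkRecord` and `filter_for_legal_review` are hypothetical names and are not part of the Pulse API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkRecord:
    # Hypothetical stored-chunk shape; field names are illustrative only.
    text: str
    chunk_type: str          # e.g. "page" or "header"
    chunk_index: int
    page: Optional[int]      # page number, if known
    section: Optional[str]   # section heading, if known


def filter_for_legal_review(records, pages=None, section=None):
    """Keep chunks on the requested pages and/or under the requested section."""
    kept = []
    for r in records:
        if pages is not None and r.page not in pages:
            continue  # drops chunks outside the page set, including page=None
        if section is not None and (r.section or "").lower() != section.lower():
            continue  # case-insensitive exact match on the heading
        kept.append(r)
    return kept
```

Run the filter first and rank only the chunks that pass it. That way, an answer cannot cite a page or section the reviewer excluded.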

Checks

Test chunking on three representative documents before locking the config.
Store chunk type and chunk index with every embedding.
Keep page references if any answer needs a citation or reviewer jump link.
Avoid tiny chunks for tables; they often lose headers and context.
Use Split before Schema when the document has multiple sections that need different extraction logic.
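You can enforce several of these checks at ingestion time. This sketch builds per-vector metadata with chunk type, chunk index, and page. It also returns warnings for short table chunks and missing page references. The input keys (`text`, `chunk_type`, `page`, `is_table`), the helper names, and the 200-character threshold are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Optional

# Assumed threshold; tune it against your own representative documents.
MIN_TABLE_CHUNK_CHARS = 200


@dataclass
class EmbeddingMetadata:
    # Hypothetical metadata stored alongside each vector.
    doc_id: str
    chunk_type: str
    chunk_index: int             # position within this document and chunk type
    page: Optional[int] = None   # keep when answers need citations or jump links


def build_metadata(doc_id, chunks, require_pages=False):
    """Return (metadata dicts, warnings) for a list of chunk dicts.

    Each chunk dict is assumed to have "text" and "chunk_type", plus optional
    "page" and "is_table" keys; these names are illustrative only.
    """
    next_index = defaultdict(int)
    records, warnings = [], []
    for chunk in chunks:
        ctype = chunk["chunk_type"]
        idx = next_index[ctype]
        next_index[ctype] += 1
        page = chunk.get("page")
        if chunk.get("is_table") and len(chunk["text"]) < MIN_TABLE_CHUNK_CHARS:
            warnings.append(f"{ctype}#{idx}: table chunk under {MIN_TABLE_CHUNK_CHARS} chars")
        if require_pages and page is None:
            warnings.append(f"{ctype}#{idx}: missing page reference")
        records.append(asdict(EmbeddingMetadata(doc_id, ctype, idx, page)))
    return records, warnings
```

The function counts `chunk_index` separately for each chunk type. This keeps `(doc_id, chunk_type, chunk_index)` unique when one document is chunked several ways, as in the request that asks for both `semantic` and `page` chunks.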

Related

Chunking Parameters

Full chunking parameter guide.

Vector Metadata Contract

Persist chunk provenance.

LangChain RAG Ingestion

Build a local vector index.
