Goal
Pick a chunking setup that matches how your users ask questions and how reviewers validate answers.
Sample Documents
Use Attention Is All You Need for academic RAG, Legal Filing for page-grounded legal review, or 10-K Annual Report for section routing.
Decision Table
| Retrieval need | Use | Why |
|---|---|---|
| User asks broad conceptual questions | semantic | Keeps related paragraphs together. |
| User navigates reports by headings | header | Preserves section boundaries. |
| Reviewers cite exact pages | page | Keeps provenance simple and auditable. |
| Vector DB has strict token/size limits | recursive | Produces predictably sized windows. |
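The table can be read as a simple lookup from retrieval needs to chunk types. The sketch below encodes it that way in Python. `RetrievalNeeds`, its field names, and `pick_chunk_types` are hypothetical names introduced for illustration, not part of any API; only the four chunk type strings come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalNeeds:
    """Hypothetical flags describing a workload (illustrative only)."""
    broad_conceptual_questions: bool = False
    navigates_by_headings: bool = False
    cites_exact_pages: bool = False
    strict_size_limits: bool = False


def pick_chunk_types(needs: RetrievalNeeds) -> list[str]:
    """Return the chunk types whose decision-table row matches the needs."""
    picks = []
    if needs.broad_conceptual_questions:
        picks.append("semantic")   # keeps related paragraphs together
    if needs.navigates_by_headings:
        picks.append("header")     # preserves section boundaries
    if needs.cites_exact_pages:
        picks.append("page")       # keeps provenance auditable
    if needs.strict_size_limits:
        picks.append("recursive")  # predictably sized windows
    return picks


needs = RetrievalNeeds(broad_conceptual_questions=True, cites_exact_pages=True)
print(pick_chunk_types(needs))
```

More than one row can apply to the same workload, which is why the function returns a list rather than a single value.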
Request
Recommended Patterns
| Use case | Chunk types | Notes |
|---|---|---|
| Enterprise search | semantic, page | Use semantic for recall and page chunks for citation. |
| Legal review | page, header | Filter by page and section before answering. |
| Research assistant | semantic, header | Blend topic-level retrieval with section labels. |
| Embedding-only ingestion | recursive | Useful when the vector store enforces strict payload sizes. |
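To illustrate why recursive chunking suits strict payload sizes, here is a minimal sketch of the general technique: split on coarse separators first and fall back to finer ones only when a piece is still too large. This is not the product's implementation. `recursive_split`, `max_chars`, and the separator order are assumptions, and the sketch counts characters rather than tokens.

```python
def recursive_split(
    text: str,
    max_chars: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split text into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to hard character windows.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate  # keep merging parts while they fit
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # The part alone is too large: retry with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


chunks = recursive_split("aaa bbb ccc", max_chars=7)
print(chunks)
```

Every returned chunk respects the cap, which is the property that matters when the vector store rejects oversized payloads.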
Checks
- Test chunking on three representative documents before locking the config.
- Store chunk type and chunk index with every embedding.
- Keep page references if any answer needs a citation or reviewer jump link.
- Avoid tiny chunks for tables; they often lose headers and context.
- Use Split before Schema when the document has multiple sections that need different extraction logic.
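The second and third checks can be enforced at ingestion time. The sketch below shows one way to build the metadata stored next to each embedding. The field names `chunk_type`, `chunk_index`, and `page`, and the `needs_citation` flag, are assumptions made for illustration; the actual field names belong to the Vector Metadata Contract page.

```python
from dataclasses import asdict, dataclass
from typing import Optional

ALLOWED_CHUNK_TYPES = {"semantic", "header", "page", "recursive"}


@dataclass(frozen=True)
class ChunkMetadata:
    """Hypothetical provenance record stored with every embedding."""
    chunk_type: str
    chunk_index: int
    page: Optional[int] = None  # keep when answers need a citation


def build_metadata(
    chunk_type: str,
    chunk_index: int,
    page: Optional[int] = None,
    needs_citation: bool = False,
) -> dict:
    """Validate the provenance fields and return them as a plain dict."""
    if chunk_type not in ALLOWED_CHUNK_TYPES:
        raise ValueError(f"unknown chunk type: {chunk_type!r}")
    if chunk_index < 0:
        raise ValueError("chunk_index must be non-negative")
    if needs_citation and page is None:
        raise ValueError("page is required when answers need citations")
    return asdict(ChunkMetadata(chunk_type, chunk_index, page))


record = build_metadata("page", 3, page=12, needs_citation=True)
print(record)
```

Failing fast on a missing page reference is cheaper than discovering at review time that an answer cannot be traced back to its source.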
Related
- Chunking Parameters: Full chunking parameter guide.
- Vector Metadata Contract: Persist chunk provenance.
- LangChain RAG Ingestion: Build a local vector index.