Goal
Turn a Pulse extraction into vector-searchable chunks for a RAG application while keeping document provenance attached to every retrieved passage.Sample Document
Use Attention Is All You Need for a long-form paper. The code below uses its public APIfile_url.
Use This Workflow
Use Pulse chunking when the source layout, pages, tables, and citations matter before the text reaches your retrieval layer.Python
Checks
- Store
extraction_id, source URL, chunk strategy, and page metadata with every vector. - Prefer
pagechunking when an answer must cite exact pages. - Prefer
semanticorheaderchunking when topic coherence matters more than fixed size. - Re-index when the extraction config changes; chunk boundaries are part of your retrieval contract.
- Keep raw extraction output in your storage boundary when retrieval results feed regulated decisions.
Related
Chunking Parameters
Choose chunking and citation settings.
Sample Documents
Try public documents before connecting storage.
Agent Quickstart
Give agents document tools over MCP.