Start With A Real Example
Sample Documents
Use the same hosted examples shown in the Pulse Platform, with built-in Extract, Schema, Tables, and Split outputs.
Platform Quickstart
Run a workflow visually, inspect the output, then export matching code.
Recipe Index
| Recipe | Use when |
|---|---|
| Bank Statement To JSON | You need normalized JSON from a single document family. |
| Annual Report Split -> Schema | Long documents need section-specific schemas. |
| Financial Tables | Row/column structure matters more than named fields. |
| Spreadsheet Financial Review | Excel workbooks need hidden-content controls and raw values. |
| Chunking For RAG | You need a retrieval-friendly chunking strategy. |
| LangChain RAG Ingestion | You want to index Pulse chunks in a vector store. |
| Vector Metadata Contract | You need stable metadata for embeddings and audit trails. |
| Agent Diligence Review | An AI agent should use Pulse tools to review documents. |
| Footnote Citation Review | Footnotes affect the meaning of extracted text. |
| Word-Level Review Overlays | A UI needs exact word coordinates on the PDF. |
| S3 Storage Pipeline | Documents and results should stay in cloud storage. |
| Batch Document Intake | You need to process many files with retry-safe tracking. |
| Production Webhooks | Long-running jobs should wake your backend on completion. |
Extraction Recipes
Bank Statement To JSON
Extract account metadata, summary fields, transactions, and checks with Extract -> Schema.
Annual Report Split -> Schema
Split a long report into topics before applying narrow schemas.
Financial Tables
Reconstruct table-heavy PDFs with the Tables step.
Spreadsheet Financial Review
Parse Excel workbooks with raw values, hidden-content controls, and trimming.
Footnote Citation Review
Link footnote markers to the body text they qualify.
Word-Level Review Overlays
Return exact word coordinates for source-grounded review UIs.
Retrieval And Agent Recipes
Chunking For RAG
Choose among semantic, header, page, and recursive chunking strategies for retrieval.
LangChain RAG Ingestion
Convert Pulse chunks into LangChain documents and a vector index.
Vector Metadata Contract
Attach stable metadata to every embedded chunk.
Agent Diligence Review
Let an MCP agent run Extract, Split, Schema, and Tables on documents.
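To make the idea of stable chunk metadata concrete, here is a minimal sketch of deriving a deterministic metadata record and vector ID for each chunk before embedding. The field names (`doc_id`, `chunker`, `chunk_index`, `page`, `content_sha256`) and the helpers `build_metadata` and `vector_id` are assumptions for illustration only; they are not Pulse's output schema, and the linked recipe defines the actual contract.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChunkMetadata:
    # Hypothetical fields; adapt them to the metadata your pipeline really emits.
    doc_id: str          # stable identifier for the source document
    chunker: str         # e.g. "semantic", "header", "page", "recursive"
    chunk_index: int     # position of the chunk within the document
    page: int            # page the chunk starts on
    content_sha256: str  # hash of the chunk text, for audit and change detection


def build_metadata(doc_id: str, chunker: str, chunk_index: int,
                   page: int, text: str) -> ChunkMetadata:
    """Build the same metadata every time for the same chunk."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ChunkMetadata(doc_id, chunker, chunk_index, page, digest)


def vector_id(meta: ChunkMetadata) -> str:
    """Deterministic ID, so re-ingesting a document overwrites instead of duplicating."""
    return f"{meta.doc_id}:{meta.chunker}:{meta.chunk_index}"


meta = build_metadata("stmt-2024-01", "semantic", 0, 1, "Opening balance ...")
record = {"id": vector_id(meta), "metadata": asdict(meta)}
```

Because the ID depends only on the document, chunking strategy, and position, a re-run with unchanged settings maps each chunk to the same vector entry, while the content hash shows whether the text behind that entry changed.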
Production And Storage Recipes
S3 Storage Pipeline
Process documents from S3 and write results back to cloud storage.
Batch Document Intake
Process many files with retry-safe status tracking.
Production Webhooks
Move long-running jobs into an async, event-driven backend.
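As a rough illustration of retry-safe status tracking for batch intake, the sketch below keys each file by a hash of its content and skips anything already marked done. The `process` callable is a hypothetical stand-in for whatever submits a document for processing, and the in-memory `status` dict stands in for a durable store; neither reflects a specific Pulse API.

```python
import hashlib
from typing import Callable, Dict


def file_key(content: bytes) -> str:
    """Content hash used as an idempotency key, independent of the file name."""
    return hashlib.sha256(content).hexdigest()


def run_batch(files: Dict[str, bytes],
              process: Callable[[bytes], dict],
              status: Dict[str, dict],
              max_attempts: int = 3) -> Dict[str, dict]:
    """Process each file at most max_attempts times, recording state per file."""
    for name, content in files.items():
        key = file_key(content)
        record = status.setdefault(
            key, {"name": name, "state": "pending", "attempts": 0}
        )
        if record["state"] == "done":
            continue  # already processed in an earlier run; safe to skip
        while record["attempts"] < max_attempts:
            record["attempts"] += 1
            try:
                record["result"] = process(content)
                record["state"] = "done"
                record.pop("error", None)  # clear any earlier failure message
                break
            except Exception as exc:
                # Keep the failure visible so it can be inspected or retried later.
                record["state"] = "failed"
                record["error"] = str(exc)
    return status
```

Running `run_batch` again with the same `status` mapping leaves completed files untouched. Files that exhausted `max_attempts` stay marked `failed` until their record is reset, which keeps failures visible instead of retrying them silently.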
Enterprise Patterns
| Pattern | Why it matters | Start here |
|---|---|---|
| Human review with citations | Regulated teams need traceability before data enters a system of record. | Schema extraction |
| Saved configs and change control | Teams need repeatable settings instead of one-off prompts in code. | Step Preset Library |
| Async jobs and webhooks | Large files should not depend on tight polling loops or browser sessions. | Production Webhooks |
| Storage boundaries | Enterprise workflows often require customer-controlled storage and retention. | Security & Compliance |
| Recovery paths | Production integrations need retries, idempotency, and failure visibility. | Error Handling |
Pick The Right Step
| Need | Use |
|---|---|
| Markdown, citations, figures, chunks, or the first reusable document representation | Extract |
| Known JSON shape such as account metadata, rent roll fields, or invoice totals | Schema |
| Row and column fidelity for schedules, statements, or financial tables | Tables |
| Different sections need different handling or downstream schemas | Split |