After Extract, most decisions come down to three features:
  • Schema turns document content into named JSON fields.
  • Tables reconstructs row and column structure and returns it as HTML table objects.
  • Split assigns pages to semantic topics so later steps can run only where they belong.

Comparison

| Feature | Best for | Input | Output |
|---|---|---|---|
| Schema | Field extraction, normalized JSON, citations | extraction_id or split_id | schema_output.values and citations |
| Tables | Financial tables, schedules, cross-page tables, charts as tables | extraction_id or split context | HTML table objects with citations |
| Split | Topic routing and per-section workflows | extraction_id | Topic-to-page assignments and split_id |

Use Schema When

You can name the fields you want before processing the document:
  • Invoice number, vendor, total, and line items
  • Parties, effective date, renewal terms, and termination clauses
  • Policyholder, coverage limit, deductible, and claim number
  • Bank account holder, balances, and transaction summary
Good schemas include field descriptions. The description tells Pulse what the field means, not just what it is called.
{
  "type": "object",
  "properties": {
    "effective_date": {
      "type": "string",
      "description": "The date the agreement becomes effective, not the signature date"
    }
  }
}
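One way to enforce the description guideline before submitting a schema is a small local check. The sketch below is not part of Pulse; `fields_missing_descriptions` is a hypothetical helper that walks a JSON Schema dictionary and lists properties without a description. The `parties` field is added only to show a failing case.

```python
def fields_missing_descriptions(schema, path=""):
    """Return dotted paths of properties that lack a non-empty description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not str(spec.get("description", "")).strip():
            missing.append(field_path)
        # Recurse into nested objects and into array item schemas.
        if spec.get("type") == "object":
            missing.extend(fields_missing_descriptions(spec, field_path))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            missing.extend(fields_missing_descriptions(spec["items"], field_path + "[]"))
    return missing


schema = {
    "type": "object",
    "properties": {
        "effective_date": {
            "type": "string",
            "description": "The date the agreement becomes effective, not the signature date",
        },
        # Illustrative field with no description, so the check flags it.
        "parties": {"type": "array", "items": {"type": "string"}},
    },
}

print(fields_missing_descriptions(schema))
```

Running a check like this in CI or before each request keeps undocumented fields from reaching extraction.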

Use Tables When

The row and column structure matters:
  • 10-K and 10-Q financial statements
  • Capitalization tables
  • Loss runs and bordereaux
  • Pricing schedules
  • Tables that continue across pages
Enable table merge when a table continues across page breaks. Enable chart-to-table conversion when charts or figures contain numeric data you need downstream.

Use Split When

The document has distinct sections that should not all receive the same instructions:
  • Annual reports with business overview, risk factors, financials, and leadership sections
  • Loan files with application, underwriting, appraisal, and closing documents
  • Insurance packets with policy terms, schedules, claims, and endorsements
  • Research or diligence packs with mixed document types
Split returns page groups. You can stop there for routing, or pass the split_id to Schema or Tables for topic-specific output.
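The routing decision can be sketched as a plain mapping from topics to follow-up steps. Everything here is an assumption made for illustration: the shape of `assignments` (topic name to page numbers), the `routes` table, and the `plan_followups` helper are hypothetical and do not reflect the actual Split response format.

```python
def plan_followups(assignments, routes):
    """Pair each topic that has pages with the follow-up step chosen for it."""
    plan = []
    for topic, pages in assignments.items():
        step = routes.get(topic)
        if step is None or not pages:
            continue  # routing only: no follow-up step for this topic
        plan.append({"topic": topic, "step": step, "pages": sorted(pages)})
    return plan


# Hypothetical topic-to-page assignments for an annual report.
assignments = {
    "business_overview": [1, 2, 3],
    "risk_factors": [4, 5],
    "financials": [8, 6, 7],
    "leadership": [],
}
# Which feature should run on which topic (an assumption for this sketch).
routes = {"financials": "tables", "risk_factors": "schema"}

for item in plan_followups(assignments, routes):
    print(item)
```

Topics without a route are left alone, which corresponds to stopping at Split for routing only.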

Patterns

Extract -> Schema

Best for single-structure documents.

Extract -> Tables

Best when tables are the primary output.

Extract -> Split -> Schema

Best for long mixed documents.

Practical Rule

If you want values, use Schema. If you want tables, use Tables. If you need different logic for different pages, use Split first.
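The rule above can be written down as a small decision function. This is a sketch of the reasoning only; `choose_patterns` and its parameter names are hypothetical, and it returns pattern names as strings rather than calling any API.

```python
def choose_patterns(wants_values=False, wants_tables=False, mixed_sections=False):
    """Return the pipeline patterns suggested by the practical rule."""
    # Different logic for different pages means Split runs first.
    prefix = ["Extract", "Split"] if mixed_sections else ["Extract"]
    outputs = []
    if wants_values:
        outputs.append("Schema")
    if wants_tables:
        outputs.append("Tables")
    if not outputs:
        return [" -> ".join(prefix)]
    # Schema and Tables are independent follow-ups, so each gets its own pattern.
    return [" -> ".join(prefix + [output]) for output in outputs]


print(choose_patterns(wants_values=True))
print(choose_patterns(wants_tables=True))
print(choose_patterns(wants_values=True, mixed_sections=True))
```

When both values and tables are needed, the function returns two patterns instead of chaining Schema into Tables, since each consumes the extraction or split result on its own.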

Schema Guidelines

Write schemas that produce reliable JSON.

Tables API

Extract and merge structured tables.

Split API

Assign pages to topics.