Overview
The Extract → Schema pipeline adds structured data extraction on top of the base extraction. You define a JSON Schema describing the fields you want, and Pulse extracts them from the entire document as structured JSON with citations.
When to Use
- Invoice processing — extract vendor name, invoice number, line items, totals
- Form extraction — pull fields from applications, tax forms, insurance claims
- Contract parsing — extract parties, dates, clauses, obligations
- Single-structure documents — any document where one schema covers the entire content
How to Use in the Playground
Set page range, figure extraction, chunking, and other options on the Configuration tab — same as Extract Only.
Switch to the Schema step in the pipeline tabs. Define a JSON Schema describing the fields you want to extract:
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string", "description": "The invoice identifier" },
    "vendor_name": { "type": "string", "description": "Name of the vendor or seller" },
    "total_amount": { "type": "number", "description": "Total amount due" },
    "due_date": { "type": "string", "description": "Payment due date" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "integer" },
          "unit_price": { "type": "number" }
        }
      }
    }
  },
  "required": ["invoice_number", "vendor_name"]
}
Optionally add a schema prompt to guide the extraction — e.g., “Extract billing details from this invoice. Line items should include all products listed.”
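Before pasting a schema into the Schema step, it can help to sanity-check it locally. The sketch below uses only Python's standard library and a shortened copy of the schema above. The helper name `undeclared_required` is made up for illustration and is not part of Pulse. It parses the schema text and reports any name listed in `required` that has no matching entry in `properties`.

```python
import json

# Shortened copy of the invoice schema shown above.
SCHEMA_TEXT = """
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string", "description": "The invoice identifier"},
    "vendor_name": {"type": "string", "description": "Name of the vendor or seller"},
    "total_amount": {"type": "number", "description": "Total amount due"}
  },
  "required": ["invoice_number", "vendor_name"]
}
"""


def undeclared_required(schema: dict) -> list[str]:
    """Return names listed in `required` that are missing from `properties`."""
    declared = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in declared]


schema = json.loads(SCHEMA_TEXT)  # raises json.JSONDecodeError on malformed JSON
missing = undeclared_required(schema)
print("undeclared required fields:", missing)
```

An empty list means every required name is declared. A typo in `required` shows up here instead of surfacing later as a field that never gets extracted.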
What You Get Back
Everything from Extract, plus:

| Field | Description |
|---|---|
| schema_output.values | Extracted field values matching your JSON Schema |
| schema_output.citations | Source locations for each extracted value |
| schema_id | Saved schema result ID |
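As a rough illustration of how these fields might be consumed, the sketch below walks a hand-written sample result. Only the names `schema_output`, `values`, `citations`, and `schema_id` come from the table above. The sample data, the helper `values_with_citations`, and the assumption that `citations` is keyed by the same field names as `values` are all hypothetical, so inspect a real response before relying on this shape.

```python
# Hand-written sample, NOT real Pulse output. The per-field citation
# structure ({"page": ...}) is an assumption made for illustration only.
sample_result = {
    "schema_output": {
        "values": {"invoice_number": "INV-001", "vendor_name": "Acme Corp"},
        "citations": {"invoice_number": {"page": 1}},
    },
    "schema_id": "example-id",
}


def values_with_citations(result: dict) -> dict:
    """Pair each extracted value with its citation, or None if absent."""
    output = result.get("schema_output", {})
    values = output.get("values", {})
    citations = output.get("citations", {})
    return {
        field: {"value": value, "citation": citations.get(field)}
        for field, value in values.items()
    }


paired = values_with_citations(sample_result)
for field, entry in paired.items():
    print(field, entry["value"], entry["citation"])
```

Keeping each value next to its citation makes it easier to show reviewers where in the document a field came from.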
Schema Tips
Use descriptions on every field
The description property in your JSON Schema helps Pulse understand what to look for. Be specific: “Payment due date” says more than “Date”.
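One way to catch fields without a description is a small local check, sketched here with the standard library only. The function name `fields_missing_description` is hypothetical and not part of Pulse. It walks nested objects and array item schemas and returns the dotted path of every property that lacks a description.

```python
def fields_missing_description(schema: dict, path: str = "") -> list[str]:
    """Recursively collect dotted paths of properties that lack a description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not spec.get("description"):
            missing.append(field_path)
        if spec.get("type") == "object":
            missing.extend(fields_missing_description(spec, field_path))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            # Descend into the schema that describes each array element.
            missing.extend(fields_missing_description(spec["items"], field_path + "[]"))
    return missing


schema = {
    "type": "object",
    "properties": {
        "due_date": {"type": "string", "description": "Payment due date"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"quantity": {"type": "integer"}},
            },
        },
    },
}

undescribed = fields_missing_description(schema)
print(undescribed)  # ['line_items', 'line_items[].quantity']
```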
Use required fields strategically
Mark fields as required when you know they’ll always be present. Optional fields are returned as null if not found.
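Downstream code needs to tolerate those null values. The sketch below is one possible client-side approach, not Pulse behavior: the helper `split_by_presence` and the sample values are made up for illustration, and raising an error when a required field is missing is a choice made here, not something the API is documented to do.

```python
def split_by_presence(values: dict, schema: dict) -> tuple[dict, list[str]]:
    """Separate found values from optional fields that came back as null.

    Raises ValueError if a field listed in `required` is missing or null.
    """
    required = set(schema.get("required", []))
    found, absent = {}, []
    for name in schema.get("properties", {}):
        value = values.get(name)
        if value is not None:
            found[name] = value
        elif name in required:
            raise ValueError(f"required field {name!r} was not extracted")
        else:
            absent.append(name)
    return found, absent


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number"],
}
# Hypothetical extracted values: JSON null arrives in Python as None.
values = {"invoice_number": "INV-001", "due_date": None}

found, absent = split_by_presence(values, schema)
print(found)   # {'invoice_number': 'INV-001'}
print(absent)  # ['due_date']
```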
Use arrays for repeating items
For tables or lists in the document (line items, attendees, clauses), use "type": "array" with an items schema.
Add a schema prompt for context
The schema prompt gives the extraction model additional context. Use it to clarify ambiguities or specify preferences.
Enable effort mode for complex schemas
If your schema has many nested fields or the document layout is complex, enable Effort mode in the extraction settings for higher accuracy.
API Usage
- Python
- TypeScript
- curl
