Goal
Extract different types of structured data from different sections of a long report without forcing one huge schema across the whole document.
Sample Document
Use the built-in 10-K Annual Report Platform example. The example includes a saved Split output with topics for exhibits and financial statement schedules, signatures, and certifications.
Use This Workflow
Use Extract -> Split -> Schema when sections have different vocabulary, layout, and output requirements.
Split Topics
| Topic | Description |
|---|---|
| Financial Statement Schedules | Financial statement schedules, exhibits, and supporting tables. |
| Signatures | Signature blocks, officers, titles, and signing dates. |
| Certifications | Officer certifications and compliance attestations. |
Platform Steps
Add per-topic schemas
Define a different schema for each topic. Keep each schema narrow and specific.
Python
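As a minimal sketch of what narrow, per-topic schemas might look like, the block below writes one JSON Schema style dictionary for each of the three split topics. The field names, the `TOPIC_SCHEMAS` mapping, and the `schema_for` helper are illustrative assumptions and are not taken from the Platform; adapt them to whatever schema format the product actually accepts.

```python
# Hypothetical per-topic schemas in JSON Schema style.
# All field names below are illustrative assumptions, not Platform fields.
TOPIC_SCHEMAS = {
    "Financial Statement Schedules": {
        "type": "object",
        "properties": {
            "schedule_title": {"type": "string"},
            "exhibit_numbers": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["schedule_title"],
    },
    "Signatures": {
        "type": "object",
        "properties": {
            "signers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "title": {"type": "string"},
                        "signing_date": {"type": "string"},
                    },
                    "required": ["name"],
                },
            },
        },
        "required": ["signers"],
    },
    "Certifications": {
        "type": "object",
        "properties": {
            "certifying_officer": {"type": "string"},
            "attestation_summary": {"type": "string"},
        },
        "required": ["certifying_officer"],
    },
}


def schema_for(topic: str) -> dict:
    """Return the schema for a split topic, failing loudly on unknown topics."""
    try:
        return TOPIC_SCHEMAS[topic]
    except KeyError:
        raise KeyError(f"No schema defined for topic: {topic!r}") from None
```

Keying the schemas by topic name keeps each one small and makes a missing schema an explicit error rather than a silent fallback to a catch-all schema.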
Checks
- Split topics should be mutually exclusive and easy to describe.
- Inspect page assignments before trusting schema output.
- Keep per-topic schemas smaller than a single all-purpose schema.
- If the same topic appears across many documents, save it as a split preset.
- For regulated review, store the split output and schema version IDs with the downstream record so reviewers can reproduce the exact extraction.
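The first two checks and the last one can be sketched in plain Python. This example assumes the split output can be reduced to a mapping from topic name to 1-based page numbers; that shape, along with the `check_page_assignments` and `provenance_record` helpers, is hypothetical and may not match the real Split output.

```python
from collections import defaultdict


def check_page_assignments(assignments: dict[str, list[int]], total_pages: int) -> dict:
    """Report pages claimed by more than one topic and pages claimed by none.

    `assignments` maps topic name -> 1-based page numbers (an assumed shape).
    """
    topics_by_page = defaultdict(list)
    for topic, pages in assignments.items():
        for page in pages:
            topics_by_page[page].append(topic)

    # A page with more than one topic means the topics are not mutually exclusive.
    overlapping = {
        page: sorted(topics)
        for page, topics in topics_by_page.items()
        if len(topics) > 1
    }
    # Pages with no topic are listed so a reviewer can confirm they were skipped on purpose.
    unassigned = [p for p in range(1, total_pages + 1) if p not in topics_by_page]
    return {"overlapping": overlapping, "unassigned": unassigned}


def provenance_record(split_id: str, schema_versions: dict[str, str]) -> dict:
    """Bundle the split ID and per-topic schema version IDs for the downstream record."""
    return {"split_id": split_id, "schema_versions": dict(schema_versions)}
```

For example, `check_page_assignments({"Signatures": [10, 11], "Certifications": [11, 12]}, total_pages=12)` flags page 11 as overlapping and lists pages 1 through 9 as unassigned. Unassigned pages are not necessarily an error, since the topics may cover only part of the report, but overlaps usually mean the topic descriptions need tightening.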
Related
Extract -> Split -> Schema
Full Platform walkthrough.
Chaining Steps
Understand the extraction_id to split_id handoff.
Sample Documents
Use a long sample PDF to test topic splits.