Fast Path
Create an API key
Open the Pulse Platform, create an API key, and keep it out of source control.
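One common way to keep the key out of source control is to read it from an environment variable at startup. The sketch below assumes a variable named `PULSE_API_KEY`; that name and the helper are illustrative, not something the Platform prescribes.

```python
import os


def load_api_key(env_var: str = "PULSE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    The default variable name PULSE_API_KEY is an assumption for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {env_var} in your environment before starting the MCP client."
        )
    return key
```

Failing fast on a missing key avoids the temptation to paste a fallback value into a config file that later gets committed.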
Connect an MCP client
Use the hosted server for URL-based documents. Use the local server when the agent must read files from disk.
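The choice between servers can be sketched as a small check on the document location. The helper `choose_server` is hypothetical and deliberately simple: it cannot tell whether a URL is actually public or pre-signed, and it routes every URL to the hosted server even though a local server can also accept a URL.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlparse


def choose_server(location: str) -> str:
    """Return "hosted" for http(s) URLs and "local" for absolute file paths."""
    parsed = urlparse(location)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "hosted"
    # Accept absolute paths in either POSIX or Windows form.
    if PurePosixPath(location).is_absolute() or PureWindowsPath(location).is_absolute():
        return "local"
    raise ValueError(
        f"{location!r} is neither an http(s) URL nor an absolute file path"
    )
```

Relative paths are rejected on purpose, since the agent's working directory is rarely the one the user has in mind.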
Give the agent a document
Hosted MCP needs a public or pre-signed file_url. Local MCP can use either file_url or an absolute file_path.
Run extract first
Most workflows start with extract. Use the returned extraction_id for schema, split, and table tools.
Client Configs
- Codex Hosted
- Claude Code Hosted
- Local
~/.codex/config.toml
First Agent Prompt
Give the agent the document location, the output you want, and any evidence requirements.
Tool Paths
| Agent goal | Tool path | Notes |
|---|---|---|
| Parse a document into text | extract | Start here. Returns markdown and extraction_id. |
| Get structured JSON | extract -> generate_schema -> apply_schema | Use generate_schema when the agent needs help designing fields. |
| Run a known schema | extract -> apply_schema | Pass the schema directly or use a saved schema_config_id. |
| Route a long packet | extract -> split_document -> apply_schema | Use when different pages need different schemas. |
| Pull tables | extract -> extract_tables | Turn on table merge for cross-page tables. |
| Process many URLs | batch_extract -> get_job | Use for asynchronous batches. |
| Run a saved workflow | run_pipeline -> get_job | Best when product teams already maintain a Platform pipeline. |
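The "Run a known schema" path in the table can be sketched as two chained tool calls that share one extraction_id. The `call_tool` callable stands in for whatever your MCP client exposes and is hypothetical, as is the `schema` argument name; only the tool names, `file_url`, and `extraction_id` come from this page.

```python
from typing import Any, Callable, Dict

# Hypothetical client hook: takes a tool name and an arguments dict,
# returns the tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def extract_then_apply_schema(
    call_tool: ToolCaller, file_url: str, schema: Dict[str, Any]
) -> Dict[str, Any]:
    """Run extract once, then reuse its extraction_id for apply_schema."""
    extraction = call_tool("extract", {"file_url": file_url})
    extraction_id = extraction["extraction_id"]
    # The "schema" argument name is an assumption; check the tool reference.
    return call_tool(
        "apply_schema", {"extraction_id": extraction_id, "schema": schema}
    )
```

Passing the client in as a callable keeps the sketch independent of any particular MCP SDK and makes the flow easy to exercise with a stub.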
Extraction Settings Agents Should Know
Agents can pass the same high-impact extraction settings you use in the Platform:
| Setting | Use when |
|---|---|
| pages | The user only needs specific pages. |
| footnote_references | Footnote markers and body text need to be linked. |
| figure_descriptions | Charts, images, or diagrams need text descriptions. |
| show_images | The app needs image URLs for extracted visuals. |
| chunk_types and chunk_size | Output will feed retrieval, memory, or embeddings. |
| only_data_rows and only_data_cols | Excel files have large empty trailing ranges. |
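A small helper can stop an agent from sending misspelled setting names. This sketch only knows the setting names listed in the table above; the value formats in the usage line (a page range string, a boolean flag) are assumptions, so confirm the accepted shapes in the tool reference.

```python
from typing import Any, Dict

# Setting names taken from the table above.
KNOWN_SETTINGS = {
    "pages",
    "footnote_references",
    "figure_descriptions",
    "show_images",
    "chunk_types",
    "chunk_size",
    "only_data_rows",
    "only_data_cols",
}


def build_extract_arguments(file_url: str, **settings: Any) -> Dict[str, Any]:
    """Build an arguments dict for extract, rejecting unknown setting names."""
    unknown = sorted(set(settings) - KNOWN_SETTINGS)
    if unknown:
        raise ValueError(f"Unknown extraction settings: {', '.join(unknown)}")
    arguments: Dict[str, Any] = {"file_url": file_url}
    # Skip settings left as None so the server defaults apply.
    arguments.update({k: v for k, v in settings.items() if v is not None})
    return arguments


# Usage: the value formats here are assumed, not documented on this page.
args = build_extract_arguments(
    "https://example.com/report.pdf", pages="1-3", figure_descriptions=True
)
```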
Agent Guardrails
- Hosted MCP cannot read local disk paths. Use a public or pre-signed URL.
- Local MCP can read an absolute file_path, but chat attachments are not automatically file paths.
- Always reuse extraction_id instead of re-extracting the same document.
- Poll get_job before passing a result into the next tool.
- Ask for a schema before writing JSON to a database or downstream system.
- Keep API keys out of prompts, logs, screenshots, and committed config.
- For regulated workflows, return page references, footnote links, chunks, or bounding boxes whenever users need evidence.
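The polling guardrail can be sketched as a loop that calls get_job until the job settles or a timeout expires. The `call_tool` callable, the `job_id` argument, and the `status` values `"completed"` and `"failed"` are all assumptions for illustration; the real field names are in the tool reference.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: takes a tool name and an arguments dict,
# returns the tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_job(
    call_tool: ToolCaller,
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job until the job completes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = call_tool("get_job", {"job_id": job_id})
        status = job.get("status")  # assumed field name and values
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish in {timeout_s} seconds")
        sleep(interval_s)
```

Only the returned job should be handed to the next tool, which keeps half-finished results out of downstream steps.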
MCP Tools
Full tool parameters and return shapes.
Sample Documents
Public documents agents can use for smoke tests.