The full Pulse MCP tool reference: parameters, return shapes, behaviors, and end-to-end agent examples.
The Pulse MCP server exposes eight tools, identical on the hosted endpoint and the local
uvx pulse-mcp server except where called out below. This page covers how they behave,
the full reference for each, and worked end-to-end examples. If you haven’t connected a
client yet, start with Connecting a client.
Documents are referenced by URL: a public or pre-signed link the engine can fetch.
The hosted server does not read files from your local disk, so a tool like extract
takes a file_url, not a file upload.

The local server closes that gap: there, extract also accepts a file_path. The server
process reads the file from disk and uploads it out-of-band, so the document’s bytes never
pass through the model or the chat context. Files can be up to 50 MB; supported types are
pdf, png, jpg, jpeg, bmp, tiff, docx, pptx, xlsx, xlsm, csv, txt, and html.

On the hosted server, host the file somewhere reachable (e.g. an S3 pre-signed URL) and
pass that URL. If you need direct file uploads in your own code, use the
SDKs or REST API instead, which support multipart upload.
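Before handing a local path to the agent, you may want to catch files the local server would reject. The helper below is a hypothetical client-side pre-flight check, not part of Pulse. It expands ~, compares the extension against the documented type list, and compares the size against the 50 MB limit. Reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption. The server still performs its own validation.

```python
from pathlib import Path

# Documented limits for the local server's file_path input.
# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024
SUPPORTED_TYPES = {
    "pdf", "png", "jpg", "jpeg", "bmp", "tiff", "docx",
    "pptx", "xlsx", "xlsm", "csv", "txt", "html",
}


def check_local_document(file_path: str) -> Path:
    """Hypothetical pre-flight check before passing file_path to extract."""
    path = Path(file_path).expanduser()  # "~" is accepted, as on the server
    ext = path.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {ext or '(no extension)'}")
    if not path.is_file():
        raise FileNotFoundError(str(path))
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; the limit is {MAX_BYTES}")
    return path
```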
You extract a document once. extract returns an extraction_id, and the downstream
tools (apply_schema, split_document, extract_tables) take that id so they operate
on the already-parsed document instead of re-fetching and re-parsing it.
Extraction, schema, split, and table operations run asynchronously. Each tool submits the
job and inline-polls it for up to ~60 seconds:

- If it finishes in time, the tool returns the completed result directly.
- If it’s still running, the tool returns a stub: { "status": "processing", "job_id": "...", "poll_with": "get_job" }.

When you get a stub, call get_job with that job_id to fetch the status and result.
batch_extract is always asynchronous and returns a batch_job_id to poll the same way.
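The stub-then-poll contract can be handled in client code with a small loop. This is a sketch under assumptions. call_tool is a hypothetical function that invokes an MCP tool by name and returns its parsed JSON result. get_job is assumed to accept {"job_id": ...} and to keep reporting "status": "processing" until the job finishes. The poll interval and attempt cap are arbitrary. This page does not say whether a batch_job_id is passed under the same job_id key, so the sketch handles only the job_id stub.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: call_tool(tool_name, arguments) -> parsed JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def resolve_result(
    result: Dict[str, Any],
    call_tool: ToolCaller,
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Return a finished result, polling get_job if `result` is a stub."""
    if result.get("status") != "processing":
        return result  # the job finished within the ~60 s inline window
    job_id = result["job_id"]  # keep the id; later responses may omit it
    for _ in range(max_attempts):
        time.sleep(interval_s)
        result = call_tool("get_job", {"job_id": job_id})  # assumed argument shape
        if result.get("status") != "processing":
            return result
    raise TimeoutError(f"job {job_id} still processing after {max_attempts} polls")
```

A fixed interval keeps the sketch short. Exponential backoff would cut the number of calls on long-running jobs.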
To stay under MCP clients’ tool-result size limits, results larger than the inline budget
(~350 KB) are offloaded to a download link and returned as a stub:
{ "is_url": true, "url": "https://..." }.

The agent cannot fetch that link itself: by design, agents can’t follow URLs that appear
in tool output. To read an offloaded result, paste the URL back into the chat (which makes
it user-provided and fetchable) or open it in your browser. For very large outputs, prefer
the SDK/API, which streams results without this limit.
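When you post-process tool results in your own code, it helps to tell the result shapes on this page apart. The following is a minimal sketch, assuming results arrive as parsed JSON dicts. classify_result and offloaded_url are hypothetical helper names.

```python
from typing import Any, Dict, Optional


def classify_result(result: Dict[str, Any]) -> str:
    """Return "processing", "offloaded", or "complete" for a tool result."""
    # Async stub: {"status": "processing", "job_id": ..., "poll_with": "get_job"}
    if result.get("status") == "processing" and "job_id" in result:
        return "processing"
    # Offload stub for results over the ~350 KB inline budget: {"is_url": true, "url": ...}
    if result.get("is_url") is True and "url" in result:
        return "offloaded"
    return "complete"


def offloaded_url(result: Dict[str, Any]) -> Optional[str]:
    """The download link if the result was offloaded, else None."""
    return result["url"] if classify_result(result) == "offloaded" else None
```

Because the agent can't follow the offload link, this branching mainly matters in your own code. For very large outputs, the SDK/API remains the better route.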
extract parses a document into markdown, with optional HTML output, figure processing, and chunking.
Provide the document as a file_url or, on the local server, a
file_path.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_url | string | One input | Public or pre-signed URL of the PDF / document. |
| file_path | string | One input | Local server only. Path to a local file (~ works), read by the server and uploaded out-of-band. See document inputs. |
| pages | string | No | Page range, 1-indexed, e.g. "1-2,5". |
| model | string | No | Model override: "default" or "pulse-ultra-2". |
| return_html | boolean | No | Also return an HTML representation of the document. |
| footnote_references | boolean | No | Link footnote markers to their footnote text. |
| figure_descriptions | boolean | No | Generate descriptive captions for figures and visuals. |
| show_images | boolean | No | Return image URLs for extracted visuals. |
| chunk_types | string[] | No | Chunking strategies: any of "semantic", "header", "page", "recursive". |
| chunk_size | integer | No | Max characters per chunk. |
Provide exactly one of file_url or file_path.
Returns the completed extraction (including extraction_id and markdown), or a
{ status, job_id } stub to poll with get_job if it’s still running.
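The input rules in the parameter table can be checked before a call is made. build_extract_args is a hypothetical builder, not a Pulse API. It enforces exactly one of file_url or file_path, rejects file_path unless you indicate you're on the local server, and checks model and chunk_types against the documented values. Treating chunk_size as a positive integer is an assumption. The boolean flags (return_html, footnote_references, figure_descriptions, show_images) are omitted for brevity.

```python
from typing import Any, Dict, List, Optional

MODELS = {"default", "pulse-ultra-2"}
CHUNK_TYPES = {"semantic", "header", "page", "recursive"}


def build_extract_args(
    file_url: Optional[str] = None,
    file_path: Optional[str] = None,
    *,
    local_server: bool = False,
    pages: Optional[str] = None,
    model: Optional[str] = None,
    chunk_types: Optional[List[str]] = None,
    chunk_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble extract arguments (hypothetical helper)."""
    if (file_url is None) == (file_path is None):
        raise ValueError("provide exactly one of file_url or file_path")
    if file_path is not None and not local_server:
        raise ValueError("file_path is only accepted by the local server")
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if chunk_types is not None:
        unknown = set(chunk_types) - CHUNK_TYPES
        if unknown:
            raise ValueError(f"unknown chunk_types: {sorted(unknown)}")
    if chunk_size is not None and chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    args = {
        "file_url": file_url, "file_path": file_path, "pages": pages,
        "model": model, "chunk_types": chunk_types, "chunk_size": chunk_size,
    }
    # Drop unset parameters so the server applies its own defaults.
    return {k: v for k, v in args.items() if v is not None}
```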
For structured JSON, run extract first, then call apply_schema on the
returned extraction_id. Schema extraction directly on extract is deprecated.
split_document splits a prior extraction into topic-based page ranges. Run extract first; there is no
file input here.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| extraction_id | string | Yes | ID of a completed extraction to split. |
| topics | object[] | No* | Topic objects, each { "name": string, "description": string }. |
| split_config_id | string | No* | A saved split config (alternative to topics). |
*Provide either topics or a split_config_id.
Returns { split_id, split_output: { splits: { <topic>: [page numbers] } } }, or a
{ status, job_id } stub to poll with get_job. Feed the split_id into apply_schema
for per-topic structured extraction.
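The same kind of pre-call check applies to split_document, along with a reader for the documented return shape. This is a sketch: interpreting "either topics or a split_config_id" as exactly one of the two is an assumption, and build_split_args and pages_for_topic are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional


def build_split_args(
    extraction_id: str,
    topics: Optional[List[Dict[str, str]]] = None,
    split_config_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble split_document arguments (hypothetical helper)."""
    if not extraction_id:
        raise ValueError("extraction_id is required; run extract first")
    # Assumption: exactly one of topics / split_config_id is supplied.
    if (topics is None) == (split_config_id is None):
        raise ValueError("provide exactly one of topics or split_config_id")
    if topics is not None:
        for topic in topics:
            # Each topic is documented as {"name": string, "description": string}.
            if not (isinstance(topic, dict)
                    and isinstance(topic.get("name"), str)
                    and isinstance(topic.get("description"), str)):
                raise ValueError(f"malformed topic: {topic!r}")
        return {"extraction_id": extraction_id, "topics": topics}
    return {"extraction_id": extraction_id, "split_config_id": split_config_id}


def pages_for_topic(response: Dict[str, Any], topic: str) -> List[int]:
    """Page numbers for one topic from a completed split_document result."""
    return response["split_output"]["splits"].get(topic, [])
```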
These examples show the sequence of tool calls an agent makes. In practice you express the goal in
natural language and the agent chooses the tools, but seeing the chain makes the behavior
predictable.
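As an illustration of one such chain, the sketch below drives extract and then split_document from client code rather than through an agent. It rests on several assumptions. call_tool is a hypothetical function that invokes an MCP tool and returns parsed JSON. get_job is assumed to take {"job_id": ...} and, once done, to return the same completed result the original tool would have returned. The file_path input requires the local server.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical: call_tool(tool_name, arguments) -> parsed JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def split_local_file(
    call_tool: ToolCaller,
    file_path: str,
    topics: List[Dict[str, str]],
    poll_interval_s: float = 2.0,
    max_polls: int = 150,
) -> Dict[str, List[int]]:
    """Extract a local file once, then split that extraction by topic."""

    def finish(result: Dict[str, Any]) -> Dict[str, Any]:
        # A completed result is returned as-is; a stub is polled via get_job.
        if result.get("status") != "processing":
            return result
        job_id = result["job_id"]
        for _ in range(max_polls):
            time.sleep(poll_interval_s)
            result = call_tool("get_job", {"job_id": job_id})
            if result.get("status") != "processing":
                return result
        raise TimeoutError(f"job {job_id} did not finish")

    # Step 1: extract by local path; the server process reads the file itself.
    extraction = finish(call_tool("extract", {"file_path": file_path}))
    # Step 2: split the already-parsed document by extraction_id (no re-fetch).
    split = finish(call_tool(
        "split_document",
        {"extraction_id": extraction["extraction_id"], "topics": topics},
    ))
    return split["split_output"]["splits"]
```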
The server reads the file and uploads it out-of-band; the document’s bytes never enter
the chat context. On the hosted server, pass a file_url instead.