POST /extract
curl --request POST \
  --url https://api.runpulse.com/extract \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "file-url": "PRESIGNED_URL",
  "return_table": true,
  "experimental_return_table": true,
  "chunking": "<chunking method>",
  "schema": {},
  "thinking": false,
  "custom_prompt": "",
  "pages": ""
}'
{
  "markdown": "<string>",
  "chunks": {},
  "plan-info": {
    "note": "<string>",
    "pages_used": 123,
    "tier": "<string>"
  },
  "schema-json": {}
}
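
For readers calling the endpoint from Python rather than curl, the request above can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_extract_request` and `send` are hypothetical, the 300-second timeout is an arbitrary assumption, and only the URL, header names, and body field names are taken from this page.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.runpulse.com/extract"  # from the curl example above


def build_extract_request(file_url, api_key, **options):
    """Build (but do not send) a POST /extract request.

    `options` may carry the optional body fields listed under Body,
    for example return_table=True or pages="1,3-5".
    """
    body = {"file-url": file_url, **options}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the JSON response body.

    The timeout value is an assumption, not a documented limit.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping construction separate from sending makes it possible to inspect or log the request before any network call is made, e.g. `send(build_extract_request(presigned_url, api_key, return_table=True))`.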

Large Document Processing: For documents over 50 pages, we strongly recommend using the /extract_async endpoint instead. The async endpoint avoids request timeouts and is better suited to long-running processing tasks.
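
The recommendation above can be applied in client code as a simple threshold check. This is a sketch under assumptions: `choose_endpoint` is a hypothetical helper, the caller is assumed to already know the page count (for example from a PDF library), and the request and response format of /extract_async is not described on this page.

```python
ASYNC_RECOMMENDED_ABOVE = 50  # pages; threshold taken from the note above


def choose_endpoint(page_count):
    """Return the endpoint path recommended for a document of `page_count` pages."""
    if page_count > ASYNC_RECOMMENDED_ABOVE:
        return "/extract_async"
    return "/extract"
```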

Large Document Response Structure

For documents exceeding 70 pages, the API returns a URL structure instead of the direct response:

{
  "is_url": true,
  "url": "https://pulse-studio-api.s3.region.amazonaws.com/results/...",
  "plan-info": {
    "pages_used": 0,
    "tier": "foundation"
  }
}

Key Points:

  • Documents over 70 pages return a URL containing the complete response
  • URLs automatically expire after 24 hours
  • No changes required to your API request structure
  • For documents of 70 pages or fewer, the API continues to return results directly

Implementation Tips:

  1. Check whether the response contains "is_url": true
  2. If it does, fetch the complete document data from the provided URL
  3. Store URLs securely, since anyone holding a URL can read your processed results until it expires
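
The tips above can be sketched as a small helper that accepts either response shape. It rests on two assumptions: that a plain GET on the returned URL needs no extra headers, and that the URL serves JSON in the same shape as a direct response. The names `fetch_json` and `resolve_result` are hypothetical.

```python
import json
import urllib.request


def fetch_json(url, timeout=60):
    """Download and decode a JSON document from `url` (assumes a plain GET works)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def resolve_result(response, fetch=fetch_json):
    """Return the full extraction result for either response shape.

    `response` is the decoded JSON body returned by POST /extract.
    `fetch` is injectable so the logic can be tested without network access.
    """
    if response.get("is_url") is True:
        # Large-document case: the real payload lives behind the URL.
        return fetch(response["url"])
    # Direct case: the response already holds markdown, chunks, and so on.
    return response
```

Because the URLs expire after 24 hours, a caller that needs the result later would fetch and persist the payload rather than the URL.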

Authorizations

x-api-key
string
header
required

API key for authentication

Headers

x-api-key
string
required

API key for authorization

Content-Type
enum<string>
required
Available options:
application/json

Body

application/json

file-url
string
required
Example:

"PRESIGNED_URL"

return_table
boolean
Example:

true

experimental_return_table
boolean
Example:

true

chunking
string
Example:

"<chunking method>"

schema
object

JSON schema for processing

Example:

{}

thinking
boolean

When true, enables enhanced reasoning, which improves extraction accuracy at the cost of slightly longer processing time

Example:

false

custom_prompt
string

Custom instructions, notes, or specifications for domain-specific extraction needs

Example:

""

pages
string

Precise page selection for targeted extraction. Supports single pages (1,3,5), page ranges (1-5,7-9), or combinations (1,3-5,7,9-11)

Example:

""

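To validate a pages value on the client before sending it, the selection syntax described above can be expanded into explicit page numbers. This is a sketch only: `parse_pages` is a hypothetical helper, and it assumes pages are numbered from 1 and ranges are inclusive, which the examples suggest but this page does not state. How the API treats an empty string is not specified here; the helper simply returns an empty list.

```python
def parse_pages(spec):
    """Expand a page-selection string such as "1,3-5,7" into sorted page numbers.

    Raises ValueError for malformed parts such as "5-3", "0", or "a".
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page selection: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For example, `parse_pages("1,3-5,7,9-11")` yields `[1, 3, 4, 5, 7, 9, 10, 11]`.
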
Response

200
application/json

Successful extraction and processing of the file

markdown
string
required

Full document content in markdown format

chunks
object
required

Document broken down into smaller, processable chunks

plan-info
object
required

Plan usage details: note, pages_used, and tier

schema-json
object
required

Extracted structured data based on provided schema