POST /extract
curl --request POST \
  --url https://pro.api.runpulse.com/extract \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "file-url": "PRESIGNED_URL",
  "return_table": true,
  "experimental_return_table": true,
  "chunking": "semantic,header,page,recursive",
  "schema": {},
  "schema_prompt": "",
  "pages": ""
}'

{
  "markdown": "<string>",
  "chunks": {},
  "plan-info": {
    "note": "<string>",
    "pages_used": 123,
    "tier": "<string>"
  },
  "schema-json": {}
}
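
For readers calling the endpoint from Python, the curl request above could be built roughly as follows. This is a minimal sketch using only the standard library. The helper name `build_extract_request` is hypothetical, and it assumes that body fields other than "file-url" may be omitted, which this page does not confirm.

```python
import json
from urllib.request import Request

# Endpoint taken from the curl example above.
EXTRACT_URL = "https://pro.api.runpulse.com/extract"


def build_extract_request(api_key, file_url, **options):
    """Build (but do not send) a POST request for /extract.

    `options` are passed through unchanged as extra body fields, e.g.
    return_table=True or chunking="semantic". Hypothetical helper:
    whether fields can be left out is an assumption, not documented here.
    """
    body = {"file-url": file_url, **options}
    return Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


# To send it (performs a real network call):
#   from urllib.request import urlopen
#   with urlopen(build_extract_request(key, url), timeout=120) as resp:
#       result = json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the headers and body easy to inspect before any network call is made.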

Large Document Processing: For documents over 50 pages, we strongly recommend using the /extract_async endpoint instead. The async endpoint avoids request timeouts and is better suited to long-running processing.

Large Document Response Structure

For documents exceeding 70 pages, the response body contains a URL pointing to the complete result instead of the result itself:

{
  "is_url": true,
  "url": "https://pulse-studio-api.s3.region.amazonaws.com/results/...",
  "plan-info": {
    "pages_used": 0,
    "tier": "foundation"
  }
}

Key Points:

  • Documents over 70 pages return a URL containing the complete response
  • URLs automatically expire after 24 hours
  • No changes required to your API request structure
  • For documents of 70 pages or fewer, the API continues to return results directly

Implementation Tips:

  1. Check whether the response contains "is_url": true
  2. If it does, fetch the complete document data from the provided URL before the URL expires (24 hours)
  3. Store URLs securely, as they give access to your processed results
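
The tips above can be sketched in Python as follows. The function names `fetch_json` and `resolve_extract_response` are hypothetical. The sketch assumes the content behind the URL is JSON with the same shape as a direct response, which this page implies ("a URL containing the complete response") but does not spell out.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=60):
    """Download `url` and decode its body as JSON (assumed format)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def resolve_extract_response(payload, fetch=fetch_json):
    """Return the full result for an already-parsed /extract response.

    If the payload has "is_url": true, the complete result is fetched
    from payload["url"]; otherwise the payload already is the result.
    `fetch` is injectable so the logic can be tested without a network.
    """
    if payload.get("is_url") is True:
        return fetch(payload["url"])
    return payload
```

Because the URL expires after 24 hours, a caller would typically resolve it soon after receiving the response and persist the fetched result rather than the URL.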

Authorizations

x-api-key (string, header, required)

API key for authentication

Headers

x-api-key (string, required)

API key for authorization

Content-Type (enum<string>, required)

Available options: application/json

Body

application/json

Response

200 (application/json)

The file was extracted and processed successfully.

The response is a JSON object.