POST /extract
Extract Document
curl --request POST \
  --url https://dev.api.runpulse.com/extract \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "file-url": "<string>",
  "schema": {
    "invoice_number": "string",
    "total": "float"
  },
  "pages": "1-10",
  "chunk_size": 5000,
  "extract_figure": false,
  "figure_description": false,
  "return_html": false,
  "schema_prompt": "<string>"
}'
{
  "content": "<string>",
  "metadata": {
    "page_count": 123,
    "processing_time": 123,
    "source_file": "<string>"
  },
  "schema_data": {}
}
Large Document Processing: For documents over 50 pages, we strongly recommend using the /extract_async endpoint instead. The async endpoint avoids request timeouts and is better suited to long-running extractions.
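
As a rough illustration of the recommendation above, the following Python sketch picks an endpoint from a known page count. The helper name choose_endpoint and the constant names are hypothetical, the caller is assumed to know the page count in advance, and the request and response format of /extract_async is not described on this page.

```python
# Base URL copied from the curl example; the 50-page threshold comes from
# the recommendation above. Names in this sketch are illustrative only.
BASE_URL = "https://dev.api.runpulse.com"
ASYNC_PAGE_THRESHOLD = 50


def choose_endpoint(page_count: int) -> str:
    """Return the endpoint URL suggested for a document with page_count pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Documents over the threshold go to the async endpoint.
    path = "/extract_async" if page_count > ASYNC_PAGE_THRESHOLD else "/extract"
    return BASE_URL + path
```

A 50-page document would still map to /extract here, while a 51-page document would map to /extract_async.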

Large Document Response Structure

For documents exceeding 70 pages, the API returns a JSON object containing a URL to the result instead of the result itself:
{
  "is_url": true,
  "url": "https://pulse-studio-api.s3.region.amazonaws.com/results/...",
  "plan-info": {
    "pages_used": 0,
    "tier": "foundation"
  }
}

Key Points:

  • Documents over 70 pages return a URL containing the complete response
  • URLs automatically expire after 24 hours
  • No changes required to your API request structure
  • For documents of 70 pages or fewer, the API continues to return results directly

Implementation Tips:

  1. Check whether the response contains "is_url": true
  2. If it does, fetch the complete document data from the provided URL before it expires
  3. Store URLs securely, as they give access to your processed results
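
The steps above can be sketched in Python roughly as follows. The function names resolve_result and fetch_json are hypothetical, and the sketch assumes that the URL can be fetched with a plain GET request and that it returns JSON in the same shape as a direct response; neither detail is confirmed on this page.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Download a URL and parse its body as JSON (assumes an unauthenticated GET works)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def resolve_result(
    response: Dict[str, Any],
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
) -> Dict[str, Any]:
    """Return the extraction result, following the URL for large documents."""
    # Step 1: check for the "is_url": true marker.
    if response.get("is_url") is True:
        # Step 2: the full result is stored at the provided URL.
        return fetch(response["url"])
    # Smaller documents carry the result directly in the response.
    return response
```

Passing fetch as a parameter keeps the download step replaceable, for example with a client that adds retries or with a stub in tests.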

Authorizations

x-api-key (string, header, required)

API key for authentication
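
For readers not using curl, here is a minimal Python sketch of the same authenticated request, built with the standard library. The function name build_extract_request is hypothetical, and sending only "file-url" and "schema" assumes the remaining body fields are optional, which this page does not state.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

# Endpoint URL copied from the curl example.
EXTRACT_URL = "https://dev.api.runpulse.com/extract"


def build_extract_request(
    api_key: str,
    file_url: str,
    schema: Optional[Dict[str, str]] = None,
) -> Request:
    """Build (but do not send) a POST /extract request with the x-api-key header."""
    body: Dict[str, Any] = {"file-url": file_url}
    if schema is not None:
        body["schema"] = schema
    return Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # API key for authentication
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send it. Keeping construction separate from sending makes it easy to inspect the headers and body first.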

Body

application/json

Response

200
application/json

Successful extraction

The response is of type object.