Updates - Pulse API

2025-03-13

v0.1.6

Large Document Processing Enhancement

Documents with more than 70 pages now return a URL containing the complete response
URLs automatically expire after 24 hours for security
Improved reliability and performance for large document processing
Reduced memory usage and faster response times

Response Structure for Large Documents

{
  "is_url": true,
  "url": "https://pulse-studio-api.s3.region.amazonaws.com/results/...",
  "plan-info": {
    "pages_used": 0,
    "tier": "foundation"
  }
}

Key Updates

Applies to both /extract and /extract_async endpoints
No changes required to your API request structure
Documents under 70 pages continue to return results directly
Significantly improved reliability for large document processing

Best Practices

Use page selection parameter to process only needed pages
Consider breaking very large documents into logical sections
Store document URLs securely - they contain your processed results
Check if responses contain "is_url": true in your code

2025-01-14

v0.1.5

Batch Processing Endpoint

Introduced /batch_extract_async endpoint for parallel document processing
Process multiple documents with customized parameters in one API call
Support for both file URLs and file paths in batch requests
Integrated with existing /job/<job_id> polling endpoint

Example Request Structure

{
  "files": [
    {
      "file_path": "path/to/document1.pdf",
      "return_table": true,
      "experimental_return_table": true,
      "chunking": ["page", "semantic"]
    },
    {
      "file_path": "path/to/document2.pdf",
      "return_table": true,
      "chunking": ["page"]
    }
  ]
}

Full documentation available at docs.runpulse.com

2025-01-13

v0.1.4

Cloud-Agnostic Processing & Table Improvements

Universal presigned URL support (AWS, GCP, Azure)
Direct support for any valid presigned URL from major cloud providers
Support for direct PDF links without cloud storage

Table Processing Enhancements

New experimental_return_table parameter (beta)
Improved cell boundary detection
Better handling of merged cells
Original return_table parameter maintained for production stability

Schema Processing Improvements

Resolved token limit truncation in large document processing
Optimized processing pipeline for faster schema generation
Improved handling of nested data structures

Important Notes

experimental_return_table is in beta
Processing time estimates now reflect real-world processing times
All existing endpoints and features remain fully supported
Schema processing improvements are automatically applied

2024-12-11

v0.1.3

Async Processing Update

New asynchronous processing capabilities for large documents
Job status monitoring and management
Cancel in-progress jobs
All existing endpoints and features maintained

New Features

New /extract_async endpoint for processing large documents
Immediate job ID response for status tracking
Progress monitoring through /job/<job_id> endpoint
Cancel long-running jobs with /cancel/<job_id>

Response Examples

{
  "job_id": "3569370a-fce2-44b0-8300-67a79ce58330",
  "status": "pending"  // statuses: pending, completed, failed, cancelled
}

{
  "created_at": "2024-12-11T06:31:28.870782+00:00",
  "estimated_completion_time": "2024-12-11T06:36:28.870782+00:00",
  "job_id": "<JOB_ID>",
  "progress": 21.0,
  "status": "processing"
}

2024-12-05

v0.1.2

Post-Thanksgiving Feature Release

Enhanced schema processing engine - more accurate field extraction
New domain & API endpoint: api.runpulse.com (existing endpoint still supported)
New table extraction capabilities
Improved layout recognition for complex documents

Direct Table Extraction

New return_table parameter returns tables in 2D matrix format
Perfect for financial documents, rent rolls, and structured data
Clean, processed tables ready for data analysis

Schema & Response Changes

Better field type handling (string, float, time)
More accurate extraction of nested data
Specify schema to receive “schema-json” in response

{
  "markdown": "...",
  "tables": [...],
  "schema-json": {...},
  "plan-info": {
    "note": "Page limit exceeded. Your foundation tier limit is 20000 pages...",
    "pages_used": 20924,
    "tier": "foundation"
  }
}

Resources & Support

Full documentation: docs.runpulse.com
Questions? Email our founders directly: founders@trypulse.ai

2024-11-27

v0.1.1

Pulse API Launch!

Key Update

Renaming /upload to /extract for documents requiring custom extraction patterns and high volume processing
/convert endpoint unchanged for quick PDF processing & URL generation

Performance Improvements

2x faster document processing
Significantly improved table extraction accuracy
Enhanced layout recognition for complex documents
All existing parameters and schemas work exactly the same

Custom Schema Power

Extract exactly what you need with flexible JSON schemas
Design any extraction pattern - from simple text to complex nested structures

{
  "project_title": "string",
  "introduction_background": "string",
  "objectives": "string",
  "scope_of_work": "string"
}

Coming Soon

New parameter to receive all detected tables as separate CSV files
More extraction capabilities rolling out weekly

Working with Tables

Need to convert extracted tables to matrix format? Here’s a helper script:

import re

def extract_tables(markdown):
    table_pattern = r"(\|.*?\|\n(\|[-:]+[-|:]+\|\n)?(?:\|.*?\|\n)+)"
    tables = re.findall(table_pattern, markdown, re.DOTALL)
    return [match[0] for match in tables]

def markdown_to_matrix(markdown):
    lines = markdown.strip().split("\n")
    matrix = []
    for i, line in enumerate(lines):
        if i == 1:
            continue
        row = [cell.strip() for cell in line.split('|') if cell.strip()]
        if row:
            matrix.append(row)
    for row in matrix[1:]:
        for j in range(1, len(row)):
            row[j] = convert_to_number(row[j])
    return matrix

def convert_to_number(value):
    value = value.replace(",", "").replace("%", "").replace("$", "")
    try:
        if "." in value or "e" in value.lower():
            return float(value)
        return int(value)
    except ValueError:
        return value

Resources & Support

Full documentation: docs.trypulse.ai
Questions? Email our founders directly: founders@trypulse.ai

Keep extracting amazing things!

Pulse API

​Large Document Processing Enhancement

​Response Structure for Large Documents

​Key Updates

​Best Practices

​Batch Processing Endpoint

​Example Request Structure

​Cloud-Agnostic Processing & Table Improvements

​Table Processing Enhancements

​Schema Processing Improvements

​Important Notes

​Async Processing Update

​New Features

​Response Examples

​Post-Thanksgiving Feature Release

​Direct Table Extraction

​Schema & Response Changes

​Resources & Support

​Pulse API Launch!

​Key Update

​Performance Improvements

​Custom Schema Power

​Coming Soon

​Working with Tables

​Resources & Support

Large Document Processing Enhancement

Response Structure for Large Documents

Key Updates

Best Practices

Batch Processing Endpoint

Example Request Structure

Cloud-Agnostic Processing & Table Improvements

Table Processing Enhancements

Schema Processing Improvements

Important Notes

Async Processing Update

New Features

Response Examples

Post-Thanksgiving Feature Release

Direct Table Extraction

Schema & Response Changes

Resources & Support

Pulse API Launch!

Key Update

Performance Improvements

Custom Schema Power

Coming Soon

Working with Tables

Resources & Support