> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runpulse.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Detect Form Fields

## Overview

<Info>
  Detect form fields on a PDF and return them as structured cells along with a reusable `form_id`. Returns a `FormResult` synchronously by default. Set `async: true` to run in the background and poll [GET /job/{'{'}jobId{'}'}](/api-reference/endpoint/poll) for the result.
</Info>

`/form/detect` is the entry point for the form-filler workflow when you want to inspect the fields Pulse identified on a PDF before filling or clearing them. Use it to preview detected fields, fix a misclassified cell, see which checkboxes are currently selected, or cache the detection result for repeated chained calls.

The returned `form_id` references the uploaded PDF and its detected layout, and can be passed back to any of `/form/detect`, `/form/fill`, or `/form/clear` as the single input source. Pulse will reuse the cached layout instead of re-detecting it.

### Providing the PDF

Provide the PDF in **exactly one** of the following ways:

* `form_id`: re-detect on a previously stored PDF (returned by an earlier `/form/detect`, `/form/fill`, or `/form/clear` call). Useful when chaining detect calls or refreshing layout after edits.
* `file_url`: public or presigned URL to a PDF.
* `file`: PDF uploaded inline with the request.

Sending more than one (or none) returns `400`.

<Note>
  All three input modes ride on the same `multipart/form-data` request body — that's how the SDKs send every call. JSON bodies (`Content-Type: application/json`) with `form_id` or `file_url` are still accepted server-side for backward compatibility, but the SDKs only model the multipart form.
</Note>

### Pricing

Billed at **1 credit per page** of the PDF being processed. Every response also returns a top-level `credits_used` for **this request** and a cumulative `plan_info.total_credits_used` snapshot for your organization.

***

## Request

### Request Body

| Field        | Type          | Required     | Description                                                                                                                           |
| ------------ | ------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------- |
| `form_id`    | string (uuid) | One of these | Re-detect on a previously stored PDF.                                                                                                 |
| `file_url`   | string (uri)  | One of these | Public or presigned URL of a PDF.                                                                                                     |
| `file`       | binary        | One of these | PDF uploaded inline with the request.                                                                                                 |
| `page_range` | string        | No           | 1-based page filter, for example `"1,3-5"`. Alias `pages` accepted.                                                                   |
| `async`      | boolean       | No           | When `true`, returns `{ job_id, status: "pending" }` immediately (HTTP 202) and processes the job in the background. Default `false`. |

***

## Response

### Sync (200): `FormResult`

When `async` is `false` (default), the call returns a `FormResult` body directly. Since `/form/detect` does not modify the PDF, neither `fields_filled` nor `fields_cleared` is present.

| Field          | Type                                                                               | Description                                                                                                                                                                                                                                           |
| -------------- | ---------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `form_id`      | string (uuid)                                                                      | ID of the form record produced by this run. Pass to a subsequent `/form/detect`, `/form/fill`, or `/form/clear` call.                                                                                                                                 |
| `page_count`   | integer                                                                            | Number of pages in the PDF.                                                                                                                                                                                                                           |
| `pdf_url`      | string (uri)                                                                       | URL to download the (unmodified) PDF binary. Always points at [GET /results/{'{'}jobId{'}'}/pdf](/api-reference/endpoint/poll). Requires the same auth as the rest of the API.                                                                        |
| `form_fields`  | array of [`FormCell`](/api-reference/endpoint/form-fill#response.body.form_fields) | Detected cells. Each carries a normalized `bounding_box`, a `type` (`text` / `checkbox` / `signature`), the current `text` content, and for checkbox cells a `checkbox_details[]` array with per-box center coordinates, selection state, and labels. |
| `credits_used` | number                                                                             | Credits consumed by this request (`1 × page_count`).                                                                                                                                                                                                  |
| `plan_info`    | object                                                                             | `{ tier, total_credits_used, pages_used }` cumulative billing snapshot for your organization (post-request).                                                                                                                                          |

```json theme={null}
{
  "form_id": "30fe08e1-922e-4012-9dfa-6aed0df430dc",
  "page_count": 6,
  "pdf_url": "https://api.runpulse.com/results/80690a27-ce39-4ad6-a1c7-70c7745238c3/pdf",
  "form_fields": [
    {
      "page_number": 1,
      "type": "text",
      "bounding_box": [0.044, 0.038, 0.222, 0.052],
      "text": "Name (as shown on your income tax return)"
    },
    {
      "page_number": 1,
      "type": "checkbox",
      "bounding_box": [0.118, 0.226, 0.634, 0.241],
      "text": "Individual/sole proprietor C corporation S corporation Partnership",
      "checkbox_details": [
        { "center_coord": [0.125, 0.232], "selected": false, "text": "Individual/sole proprietor" },
        { "center_coord": [0.300, 0.232], "selected": false, "text": "C corporation" },
        { "center_coord": [0.418, 0.232], "selected": false, "text": "S corporation" },
        { "center_coord": [0.535, 0.232], "selected": false, "text": "Partnership" }
      ]
    }
  ],
  "credits_used": 6.0,
  "plan_info": {
    "tier": "pulse_ultra_2",
    "total_credits_used": 1278.0,
    "pages_used": 426
  }
}
```

<Note>
  All cell coordinates (`bounding_box`, `checkbox_details[].center_coord`) are normalized to `[0, 1]` with a top-left origin. Multiply by your render width / height to convert to pixel coordinates.
</Note>

### Async (202): `FormJobAccepted`

When `async` is `true`:

```json theme={null}
{
  "job_id": "abc123-def456-ghi789",
  "status": "pending"
}
```

Poll [GET /job/{'{'}jobId{'}'}](/api-reference/endpoint/poll). The job's `result` carries the same `FormResult` shape that the sync flow would have returned inline.

### Status Codes

| Code | Description                                                             |
| ---- | ----------------------------------------------------------------------- |
| 200  | Detected layout returned synchronously.                                 |
| 202  | Async job accepted (`async: true`). Poll `/job/{jobId}` for the result. |
| 400  | Missing PDF or more than one PDF source provided.                       |
| 401  | Authentication failed or missing API key.                               |
| 404  | Referenced `form_id` not found (or belongs to a different org).         |
| 500  | Internal server error.                                                  |

***

## Example Usage

### Detect From URL

<CodeGroup>
  ```python Python theme={null}
  from pulse import Pulse

  client = Pulse(api_key="YOUR_API_KEY")

  result = client.form.detect(
      file_url="https://www.irs.gov/pub/irs-pdf/fw9.pdf",
  )

  print(f"form_id    : {result.form_id}")
  print(f"page_count : {result.page_count}")
  print(f"# cells    : {len(result.form_fields or [])}")
  print(f"credits    : {result.credits_used} (1 x {result.page_count} pages)")

  for cell in (result.form_fields or [])[:3]:
      print(f"  [{cell.type}] {cell.bounding_box}  {cell.text!r}")
  ```

  ```typescript TypeScript theme={null}
  import { PulseClient } from "pulse-ts-sdk";

  const client = new PulseClient({ apiKey: "YOUR_API_KEY" });

  const result = await client.form.detect({
      file_url: "https://www.irs.gov/pub/irs-pdf/fw9.pdf",
  });

  console.log(`form_id=${result.form_id}`);
  console.log(`page_count=${result.page_count}`);
  console.log(`# cells=${result.form_fields?.length ?? 0}`);
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/form/detect \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"file_url": "https://www.irs.gov/pub/irs-pdf/fw9.pdf"}'
  ```
</CodeGroup>

### File Upload

<CodeGroup>
  ```python Python theme={null}
  with open("intake-form.pdf", "rb") as f:
      result = client.form.detect(file=f)
  ```

  ```typescript TypeScript theme={null}
  import * as fs from "fs";

  const fileBuffer = fs.readFileSync("intake-form.pdf");
  const blob = new Blob([fileBuffer], { type: "application/pdf" });

  const result = await client.form.detect({ file: blob });
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/form/detect \
    -H "x-api-key: YOUR_API_KEY" \
    -F "file=@intake-form.pdf"
  ```
</CodeGroup>

### Detect, Edit, Then Fill

Detect the cells once, hand-edit any that were misclassified, and pass the edited cells back to [`/form/fill`](/api-reference/endpoint/form-fill) along with the cached `form_id`. The fill call reuses the cached layout instead of re-detecting it.

<CodeGroup>
  ```python Python theme={null}
  detect = client.form.detect(file_url="https://example.com/contract.pdf")

  # Re-tag a cell the detector got wrong
  edited = []
  for cell in detect.form_fields or []:
      if cell.text and cell.text.strip().lower() == "signature":
          cell.type = "signature"
      edited.append(cell)

  fill = client.form.fill(
      form_id=detect.form_id,
      instructions="Sign as Jane Doe, dated 2026-05-01.",
      form_fields=edited,
  )
  ```

  ```typescript TypeScript theme={null}
  const detect = await client.form.detect({
      file_url: "https://example.com/contract.pdf",
  });

  const edited = (detect.form_fields ?? []).map((cell) =>
      cell.text?.trim().toLowerCase() === "signature"
          ? { ...cell, type: "signature" as const }
          : cell,
  );

  const fill = await client.form.fill({
      form_id: detect.form_id!,
      instructions: "Sign as Jane Doe, dated 2026-05-01.",
      form_fields: edited,
  });
  ```
</CodeGroup>

### Re-detect On A Stored Form

Pass `form_id` (instead of `file_url` / `file`) to refresh the layout on a PDF already stored by Pulse. Useful after a `/form/clear` round-trip, or to grab the latest cells if you suspect drift.

<CodeGroup>
  ```python Python theme={null}
  fresh = client.form.detect(form_id="00e2c454-4e6f-429b-bd74-320ad94b2153")
  ```

  ```bash curl theme={null}
  curl -X POST https://api.runpulse.com/form/detect \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"form_id": "00e2c454-4e6f-429b-bd74-320ad94b2153"}'
  ```
</CodeGroup>


## OpenAPI

````yaml POST /form/detect
openapi: 3.1.0
info:
  title: Pulse API
  description: >-
    Production-grade document extraction service that transforms complex
    documents  into structured, AI-ready data. This specification is the single
    source of truth  for the Pulse extraction APIs.
  version: 1.0.0
  contact:
    name: Pulse Support
    email: support@trypulse.ai
    url: https://docs.runpulse.com
servers:
  - url: https://api.runpulse.com
    description: Production server
security:
  - ApiKeyAuth: []
paths:
  /form/detect:
    post:
      tags:
        - Forms
      summary: Detect Form Fields
      operationId: formDetect
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/FormDetectRequest'
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/FormDetectMultipart'
      responses:
        '200':
          description: Detected layout returned synchronously.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/FormResult'
        '202':
          description: >-
            Async job accepted (`async: true`). Poll `GET /job/{jobId}` for the
            result.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/FormJobAccepted'
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '404':
          $ref: '#/components/responses/NotFound'
        '500':
          $ref: '#/components/responses/InternalServerError'
components:
  schemas:
    FormDetectRequest:
      type: object
      description: |
        JSON body for `POST /form/detect`. Provide exactly one of `form_id`
        or `file_url` (or use the multipart variant to upload a `file`).
      properties:
        form_id:
          type: string
          format: uuid
          description: |
            Re-detect cells on a previously stored PDF. Useful when chaining
            detect calls or refreshing layout after edits.
        file_url:
          type: string
          format: uri
          description: Public or presigned URL of a PDF to detect cells on.
      allOf:
        - $ref: '#/components/schemas/FormSharedOptions'
    FormDetectMultipart:
      type: object
      description: Multipart `/form/detect` request with a direct PDF upload.
      properties:
        file:
          type: string
          format: binary
          description: PDF file to upload directly.
        file_url:
          type: string
          format: uri
          description: >-
            Alternative to `file`: a public or presigned URL Pulse will
            download.
        form_id:
          type: string
          format: uuid
          description: Reference to a previously processed form.
        page_range:
          type: string
        async:
          type: string
          description: Set to `"true"` to run asynchronously.
    FormResult:
      type: object
      description: |
        Result body returned by `/form/detect`, `/form/fill`, and
        `/form/clear`. For async jobs (`async: true`) the same shape is
        served back under `result` on `GET /job/{jobId}`.
      required:
        - form_id
        - page_count
        - form_fields
      properties:
        form_id:
          type: string
          format: uuid
          description: |
            ID of the form record produced by this run. Pass to a subsequent
            `/form/detect`, `/form/fill`, or `/form/clear` call as the single
            input source to iterate without re-uploading the PDF.
        page_count:
          type: integer
          minimum: 1
          description: Number of pages in the output PDF.
        pdf_url:
          type: string
          description: |
            URL to download the resulting PDF binary. Always points at
            `GET /results/{jobId}/pdf` for the originating job. Requires the
            same authentication (API key or JWT) as the rest of the API.
        form_fields:
          type: array
          description: |
            Detected cells of the resulting PDF (refreshed from the
            filled / cleared output for fill / clear, or freshly detected for
            `/form/detect`).
          items:
            $ref: '#/components/schemas/FormCell'
        fields_filled:
          type: integer
          minimum: 0
          description: |
            Number of cells whose value actually changed during this run.
            Present on `/form/fill` responses only.
        fields_cleared:
          type: integer
          minimum: 0
          description: |
            Number of cells whose value actually changed during this run
            (no-op clears on already-empty fields are not counted).
            Present on `/form/clear` responses only.
        credits_used:
          type: number
          format: float
          description: |
            Credits consumed by **this request**. Detect charges 1 credit
            per page; fill and clear charge 3 credits per page.
        plan_info:
          $ref: '#/components/schemas/FormPlanInfo'
    FormJobAccepted:
      type: object
      description: 'Async submission acknowledgement returned when `async: true`.'
      required:
        - job_id
        - status
      properties:
        job_id:
          type: string
          format: uuid
          description: Identifier for `GET /job/{jobId}` polling.
        status:
          type: string
          enum:
            - pending
    FormSharedOptions:
      type: object
      description: Optional knobs accepted by all three form endpoints.
      properties:
        page_range:
          type: string
          description: |
            Restrict the operation to a subset of pages. Accepts
            comma-separated page numbers and ranges, e.g. `"1-3,5"`.
            Alias: `pages`.
        async:
          type: boolean
          default: false
          description: |
            When `true`, the endpoint returns immediately with
            `{ job_id, status: "pending" }` (HTTP 202) and processes the
            job in the background. Poll `GET /job/{jobId}` for the result.
    FormCell:
      type: object
      description: >
        A single detected cell on a form page. Cells are produced by Pulse

        cell detection during a `/form/detect`, `/form/fill`, or `/form/clear`

        call and may be passed back via `form_fields` to override what Pulse

        uses when filling or clearing.


        All coordinate fields (`bounding_box`,
        `checkbox_details[].center_coord`)

        are normalized to `[0, 1]` relative to the page width / height, so

        they're independent of the renderer's pixel resolution. `page_number`

        is 1-indexed.
      properties:
        page_number:
          type: integer
          minimum: 1
          description: 1-indexed page the cell belongs to.
        bounding_box:
          type: array
          description: |
            Cell bounding box `[x1, y1, x2, y2]` normalized to the range
            `[0, 1]` (top-left origin).
          items:
            type: number
            minimum: 0
            maximum: 1
          minItems: 4
          maxItems: 4
        text:
          type: string
          description: Current text content of the cell. Empty string for unfilled cells.
        type:
          type: string
          description: Detected cell type.
          enum:
            - text
            - checkbox
            - signature
        row:
          type: integer
          minimum: 0
          description: >-
            Row index inside the detected table this cell belongs to (when
            applicable).
        col:
          type: integer
          minimum: 0
          description: >-
            Column index inside the detected table this cell belongs to (when
            applicable).
        table_idx:
          type: integer
          minimum: 0
          description: >-
            Index of the detected table on the page this cell belongs to
            (0-indexed).
        checkbox_details:
          type: array
          description: |
            Only present when `type == "checkbox"`. One entry per individual
            checkbox inside this cell. A single cell often contains multiple
            checkboxes (for example a "Filing Status" row with Single / MFJ /
            MFS / HoH / QSS). The cell-level `text` is the raw line covering
            the whole cell; per-box labels live here on each detail.
          items:
            $ref: '#/components/schemas/FormCheckboxDetail'
    FormPlanInfo:
      type: object
      description: |
        Cumulative billing snapshot for the calling organization. Includes
        the in-flight request's contribution, so every response reflects
        post-request state.
      properties:
        tier:
          type: string
          description: Billing tier, e.g. `"trial"`, `"pulse_ultra_2"`.
        total_credits_used:
          type: number
          format: float
          description: |
            Total credits consumed by the organization to date, including
            this request. The primary billing metric.
        pages_used:
          type: integer
          minimum: 0
          description: |
            Total pages processed by the organization to date, including
            this request. Kept for backward compatibility with clients
            that haven't migrated to `total_credits_used`.
    ErrorResponse:
      type: object
      properties:
        error:
          type: object
          properties:
            code:
              type: string
              description: Error code (e.g., FILE_001, AUTH_002)
            message:
              type: string
              description: Human-readable error message
            details:
              type: object
              description: Additional error context
    FormCheckboxDetail:
      type: object
      description: |
        A single checkbox inside a `FormCell` of `type == "checkbox"`. Bundles
        the checkbox's normalized center coordinate, current selection state,
        and the per-box text label.
      properties:
        center_coord:
          type: array
          description: |
            Normalized `[x, y]` center of the checkbox (in `[0, 1]`, top-left
            origin), in the same coordinate system as `FormCell.bounding_box`.
          items:
            type: number
            minimum: 0
            maximum: 1
          minItems: 2
          maxItems: 2
        selected:
          type: boolean
          description: >-
            `true` when the checkbox is currently filled in (selected), `false`
            otherwise.
        text:
          type: string
          description: |
            Per-checkbox text label (e.g. `"Individual/sole proprietor"`,
            `"S corporation"`). May be an empty string when no label could be
            confidently associated with this specific box.
  responses:
    BadRequest:
      description: Bad request - Invalid parameters
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    Unauthorized:
      description: Unauthorized - Invalid or missing API key
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    NotFound:
      description: Resource not found
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
    InternalServerError:
      description: Internal server error
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key
      description: API key for authentication

````