# Extend AI Platform — CLAUDE.md

> Context file for AI coding assistants building on the [Extend](https://docs.extend.ai) document processing platform.

## What is Extend?

Extend is a platform for building, evaluating, and deploying AI-powered document processing. It provides APIs and SDKs for:

- **Extraction** — Pull structured data from documents using a JSON Schema
- **Classification** — Categorize documents by type
- **Splitting** — Divide multi-page documents into sections
- **Parsing** — Convert documents into clean, structured text (markdown, etc.)
- **Editing** — Detect and fill PDF form fields
- **Workflows** — Orchestrate multiple processors into pipelines with conditionals, human review, webhooks, and more

Full documentation: https://docs.extend.ai
Searchable docs index: https://docs.extend.ai/llms.txt

---

## Authentication

All API requests require Bearer token authentication and an API version header. **If using an SDK, authentication and versioning are handled automatically — the details below apply to raw HTTP requests only.**

```bash
curl -X POST "https://api.extend.ai/extract" \
  -H "Authorization: Bearer sk_YOUR_API_KEY" \
  -H "x-extend-api-version: 2026-02-09" \
  -H "Content-Type: application/json" \
  -d '{ ... }'
```

| Header | Value | Required |
|--------|-------|----------|
| `Authorization` | `Bearer sk_...` | Yes |
| `x-extend-api-version` | `2026-02-09` (latest) | Yes |
| `Content-Type` | `application/json` | For POST/PUT |

Get your API key from the [Extend dashboard](https://app.extend.ai) under Developer Settings.

**Omitting `x-extend-api-version` on raw HTTP requests returns an error.** SDKs set this automatically.
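
For raw HTTP from Python without the SDK, the required headers could be assembled roughly as follows. This is a minimal sketch using only the standard library: it builds the request without sending it, `build_extend_request` is a hypothetical helper name (not part of any Extend SDK), and the empty payload is a placeholder for the real request body.

```python
import json
import urllib.request

API_VERSION = "2026-02-09"          # latest version per this document
BASE_URL = "https://api.extend.ai"  # US1 (default) region


def build_extend_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the required headers.

    Hypothetical helper for illustration only.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "x-extend-api-version": API_VERSION,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_extend_request("/extract", "sk_YOUR_API_KEY", {})
# Send with urllib.request.urlopen(req) once the payload is filled in.
```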

---

## API Versions

The API is versioned by date via the `x-extend-api-version` header. The latest version is `2026-02-09`. SDKs target the correct version automatically when kept up to date.

| Version | Status | Notes |
|---------|--------|-------|
| `2026-02-09` | **Current** | Resource-based endpoints, typed IDs, sync support, simplified responses |
| `2025-04-21` | Stable | Granular processor control |
| `2024-12-23` | Legacy | Separate EXCEL handling |
| `2024-07-30` | Legacy | Webhook subscriptions, processor management |

**If you are on an older version**, see the [migration guide](https://docs.extend.ai/developers/migrations/2026-02-09/overview) for breaking changes in `2026-02-09`. Key changes:

- **Dedicated endpoints** per resource type (`/extract`, `/classify`, `/split`) replace the generic `/processor_runs` endpoint
- **New ID prefixes**: extractors use `ex_`, extract runs use `exr_`, classifiers use `cl_`, splitters use `sp_`
- **Synchronous processing** support for processor runs via the new `/extract`, `/classify`, and `/split` endpoints
- **Simplified responses**: single objects no longer wrapped in containers; list responses standardized to `{ "object": "list", "data": [...] }`
- **Inline configuration**: pass extractor/classifier/splitter config inline without pre-creating a resource — useful for managing schemas entirely in code
- **SDK polling helpers**: `createAndPoll` / `create_and_poll` methods with exponential backoff built into updated SDKs
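
The typed ID prefixes make it possible to catch a mismatched ID (for example, a run ID passed where an extractor ID is expected) before a request is sent. The sketch below covers only the prefixes listed in this document; `resource_kind` is a hypothetical helper, and other resource types may use prefixes not shown here.

```python
from typing import Optional

# Prefixes named in this document only.
ID_PREFIXES = {
    "exr_": "extract_run",
    "ex_": "extractor",
    "cl_": "classifier",
    "sp_": "splitter",
}


def resource_kind(resource_id: str) -> Optional[str]:
    """Return the resource kind implied by an ID prefix, or None if unrecognized."""
    for prefix, kind in ID_PREFIXES.items():
        if resource_id.startswith(prefix):
            return kind
    return None
```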

**Migration path**: Update your SDK to the latest version (automatically targets `2026-02-09`), then migrate endpoint-by-endpoint. The old `/processor_runs` and `/processors` endpoints still work on older API versions but are now under Legacy in the docs.

Docs: https://docs.extend.ai/developers/api-versioning

---

## SDKs

**Official SDKs** are available for TypeScript, Python, and Java.

**TypeScript:**
```bash
npm install extend-ai
```

**Python:**
```bash
pip install extend-ai
```

**Java (Gradle):**
```gradle
dependencies {
  implementation 'ai.extend:extend-java-sdk'
}
```

**Community SDK:**
- **Haskell** — maintained by Mercury Technologies: https://github.com/MercuryTechnologies/extend

The official SDKs include polling helpers (`createAndPoll` / `create_and_poll`) for async operations and webhook signature verification utilities.

Docs: https://docs.extend.ai/developers/sdks

---

## API Endpoints (2026-02-09)

> **Note on SDK method names vs REST paths:** This document describes the REST API. SDK method names follow language conventions and may differ (e.g., REST `POST /extract_runs` maps to Python `client.extract_runs.create()` and TypeScript `client.extractRuns.create()`). Always confirm exact method signatures against the SDK source or docs when writing code.

### Base URL

| Region | URL |
|--------|-----|
| US1 (default) | `https://api.extend.ai` |
| US2 | `https://api.us2.extend.app` |

SDKs accept a `baseUrl` (TypeScript) or `base_url` (Python) parameter to select the region.

### Files

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/files/upload` | Upload a file (multipart form data) |
| GET | `/files/{id}` | Get file metadata + presigned download URL |
| GET | `/files` | List files |
| DELETE | `/files/{id}` | Delete a file |

### Extract

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/extract` | Extract data (sync, 5-min timeout) |
| POST | `/extract_runs` | Extract data (async) |
| GET | `/extract_runs/{id}` | Get extract run status/output |
| GET | `/extract_runs` | List extract runs |
| DELETE | `/extract_runs/{id}` | Delete an extract run |
| POST | `/extract_runs/{id}/cancel` | Cancel an in-progress run |
| POST | `/extractors` | Create an extractor |
| GET | `/extractors/{id}` | Get extractor details |
| POST | `/extractors/{id}` | Update an extractor |
| GET | `/extractors` | List extractors |
| POST | `/extractors/{extractorId}/versions` | Publish a new version |
| GET | `/extractors/{extractorId}/versions/{versionId}` | Get a version |
| GET | `/extractors/{extractorId}/versions` | List versions |

### Classify

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/classify` | Classify a file (sync, 5-min timeout) |
| POST | `/classify_runs` | Classify a file (async) |
| GET | `/classify_runs/{id}` | Get classify run |
| GET | `/classify_runs` | List classify runs |
| DELETE | `/classify_runs/{id}` | Delete a classify run |
| POST | `/classify_runs/{id}/cancel` | Cancel an in-progress run |
| POST | `/classifiers` | Create a classifier |
| GET | `/classifiers/{id}` | Get classifier |
| POST | `/classifiers/{id}` | Update classifier |
| GET | `/classifiers` | List classifiers |
| POST | `/classifiers/{classifierId}/versions` | Publish a new version |
| GET | `/classifiers/{classifierId}/versions/{versionId}` | Get a version |
| GET | `/classifiers/{classifierId}/versions` | List versions |

### Split

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/split` | Split a file (sync, 5-min timeout) |
| POST | `/split_runs` | Split a file (async) |
| GET | `/split_runs/{id}` | Get split run |
| GET | `/split_runs` | List split runs |
| DELETE | `/split_runs/{id}` | Delete a split run |
| POST | `/split_runs/{id}/cancel` | Cancel an in-progress run |
| POST | `/splitters` | Create a splitter |
| GET | `/splitters/{id}` | Get splitter |
| POST | `/splitters/{id}` | Update splitter |
| GET | `/splitters` | List splitters |
| POST | `/splitters/{splitterId}/versions` | Publish a new version |
| GET | `/splitters/{splitterId}/versions/{versionId}` | Get a version |
| GET | `/splitters/{splitterId}/versions` | List versions |

### Parse

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/parse` | Parse a file (sync, 5-min timeout) |
| POST | `/parse_runs` | Parse a file (async) |
| GET | `/parse_runs/{id}` | Get parse run |
| DELETE | `/parse_runs/{id}` | Delete a parse run |

### Edit

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/edit` | Edit a PDF (sync, 5-min timeout) |
| POST | `/edit_runs` | Edit a PDF (async) |
| GET | `/edit_runs/{id}` | Get edit run |
| DELETE | `/edit_runs/{id}` | Delete an edit run |

### Workflows

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/workflow_runs` | Run a workflow |
| POST | `/workflow_runs/batch` | Batch run a workflow |
| GET | `/workflow_runs/{id}` | Get workflow run |
| POST | `/workflow_runs/{id}` | Update workflow run metadata |
| POST | `/workflow_runs/{id}/cancel` | Cancel a workflow run |
| DELETE | `/workflow_runs/{id}` | Delete a workflow run |
| GET | `/workflow_runs` | List workflow runs |
| POST | `/workflows` | Create a workflow |

### Evaluation Sets

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/evaluation_sets` | Create an evaluation set |
| GET | `/evaluation_sets/{id}` | Get an evaluation set |
| GET | `/evaluation_sets` | List evaluation sets |
| POST | `/evaluation_sets/{id}/items` | Create items |
| POST | `/evaluation_sets/{id}/items/bulk` | Bulk create items |
| GET | `/evaluation_sets/{id}/items/{itemId}` | Get an item |
| PATCH | `/evaluation_sets/{id}/items/{itemId}` | Update an item |
| DELETE | `/evaluation_sets/{id}/items/{itemId}` | Delete an item |
| GET | `/evaluation_sets/{id}/items` | List items |
| GET | `/evaluation_sets/{id}/runs/{runId}` | Get an eval run |

---

## Common Patterns

### Extract (sync) — Python

```python
from extend_ai import Extend

client = Extend(token="sk_...")

# Sync extract — blocks until complete (5-min timeout)
result = client.extract(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_..."},
)
print(result.output)
```

### Extract (async with polling) — Python

```python
result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_..."},
)
print(result.status)  # "PROCESSED"
print(result.output)
```

### Extract (sync) — TypeScript

```typescript
import { ExtendClient } from "extend-ai";

const client = new ExtendClient({ token: "sk_..." });

const result = await client.extract({
  file: { url: "https://example.com/invoice.pdf" },
  extractor: { id: "ex_..." },
});
console.log(result.output);
```

### Extract (async with polling) — TypeScript

```typescript
const result = await client.extractRuns.createAndPoll({
  file: { url: "https://example.com/invoice.pdf" },
  extractor: { id: "ex_..." },
});
console.log(result.status); // "PROCESSED"
console.log(result.output);
```

### Typed extraction with Zod — TypeScript

The TypeScript SDK supports inline Zod schemas with full type inference:

```typescript
import { ExtendClient, extendDate, extendCurrency } from "extend-ai";
import { z } from "zod";

const client = new ExtendClient({ token: "sk_..." });

const result = await client.extract({
  file: { url: "https://example.com/invoice.pdf" },
  config: {
    schema: z.object({
      invoice_number: z.string().nullable().describe("The invoice number"),
      invoice_date: extendDate().describe("The invoice date"),
      line_items: z.array(z.object({
        description: z.string().nullable(),
        amount: extendCurrency(),
      })).describe("Line items on the invoice"),
      total: extendCurrency().describe("Total amount due"),
    }),
  },
});

console.log(result.output.value.invoice_number); // string | null
console.log(result.output.value.total.amount);   // number | null
```

### Parse a document — Python

```python
result = client.parse(file={"url": "https://example.com/doc.pdf"})
for chunk in result.output.chunks:
    print(chunk.content)
```

### Parse (async with polling) — Python

```python
result = client.parse_runs.create_and_poll(
    file={"url": "https://example.com/doc.pdf"},
)
for chunk in result.output.chunks:
    print(chunk.content)
```

### Run a workflow — Python

```python
result = client.workflow_runs.create_and_poll(
    file={"url": "https://example.com/doc.pdf"},
    workflow={"id": "workflow_..."},
)
for step_run in result.step_runs or []:
    print(step_run.step.type)
    print(step_run.result)
```

### Run a workflow — TypeScript

```typescript
const result = await client.workflowRuns.createAndPoll({
  file: { url: "https://example.com/doc.pdf" },
  workflow: { id: "workflow_..." },
});

for (const stepRun of result.stepRuns ?? []) {
  console.log(stepRun.step.type);
  console.log(stepRun.result);
}
```

---

## Sync vs Async Processing

All processing endpoints (extract, classify, split, parse, edit) support both sync and async modes. Workflows are async-only.

- **Sync** (`POST /extract` / SDK: `client.extract()`) — Blocks until complete. Has a **5-minute timeout**. Best for testing and small files.
- **Async** (`POST /extract_runs` / SDK: `client.extractRuns.createAndPoll()` or `client.extract_runs.create_and_poll()`) — Returns immediately with a run ID. Poll with `GET /extract_runs/{id}` or use webhooks. No timeout limit.

**Use async for production workloads.** Large documents can exceed the 5-minute sync timeout. SDK `createAndPoll` / `create_and_poll` methods are the recommended approach — they handle polling automatically with built-in backoff.

SDK polling helpers use a hybrid strategy: fast polling for 30 seconds, then gradual backoff up to 30-second intervals.

Terminal statuses: `PROCESSED`, `FAILED`, `CANCELLED` (also `NEEDS_REVIEW`, `REJECTED` for workflows).
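
When the SDK helpers are not available, a manual polling loop with the same overall shape might look like the sketch below. `fetch_status` is a caller-supplied function (hypothetical) that performs `GET /extract_runs/{id}` and returns the status string. The 1-second fast interval and the 1.5x growth factor are assumptions; this document only specifies the 30-second fast phase and the 30-second interval cap.

```python
import time
from typing import Callable

# NEEDS_REVIEW and REJECTED apply to workflow runs only.
TERMINAL_STATUSES = {"PROCESSED", "FAILED", "CANCELLED", "NEEDS_REVIEW", "REJECTED"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    fast_interval: float = 1.0,   # assumed value, not specified in this document
    fast_phase: float = 30.0,     # fast polling window, in seconds
    max_interval: float = 30.0,   # cap on the backoff interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the run reaches a terminal status and return that status."""
    start = clock()
    interval = fast_interval
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() - start >= fast_phase:
            # Past the fast phase: grow the interval gradually, up to the cap.
            interval = min(interval * 1.5, max_interval)
        sleep(interval)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real delays. A production version would likely also enforce an overall deadline.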

Docs: https://docs.extend.ai/developers/async-processing

---

## Extraction Schema (JSON Schema)

Extractors use JSON Schema to define output structure. Key rules:

- **Root must be `"type": "object"`**
- **All primitive fields must be nullable**: use `"type": ["string", "null"]` not `"type": "string"`
- **Objects and arrays cannot be nullable**
- **Max nesting depth**: 5 levels
- **Property names**: letters, numbers, underscores, hyphens only
- Include `"required"` arrays listing every property
- Include `"additionalProperties": false` on all objects
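
As an illustration of the rules above, here is a small schema written as a Python dict, together with a checker for a subset of the rules. The field names are illustrative and `check_schema` / `check_root` are hypothetical helpers, not an official validator. The sketch does not check nesting depth, since this document does not say how levels are counted, and it leaves scalar array items unchecked.

```python
import re
from typing import List

PRIMITIVES = {"string", "number", "integer", "boolean"}
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": ["string", "null"], "description": "The invoice number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": ["string", "null"]},
                    "quantity": {"type": ["integer", "null"]},
                },
                "required": ["description", "quantity"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["invoice_number", "line_items"],
    "additionalProperties": False,
}


def check_schema(node: dict, path: str = "$") -> List[str]:
    """Return the rule violations found under `node` (empty list if none)."""
    problems: List[str] = []
    node_type = node.get("type")
    types = set(node_type) if isinstance(node_type, list) else {node_type}
    if types & PRIMITIVES and "null" not in types:
        problems.append(f"{path}: primitive field must be nullable")
    if types & {"object", "array"} and "null" in types:
        problems.append(f"{path}: objects and arrays cannot be nullable")
    if "object" in types:
        props = node.get("properties", {})
        if sorted(node.get("required", [])) != sorted(props):
            problems.append(f"{path}: 'required' must list every property")
        if node.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' must be false")
        for name, child in props.items():
            if not NAME_RE.fullmatch(name):
                problems.append(f"{path}.{name}: invalid property name")
            problems.extend(check_schema(child, f"{path}.{name}"))
    items = node.get("items")
    # Only object items are checked; scalar items are left alone (see note above).
    if "array" in types and isinstance(items, dict) and items.get("type") == "object":
        problems.extend(check_schema(items, f"{path}[]"))
    return problems


def check_root(schema: dict) -> List[str]:
    """Check the root-type rule, then recurse into the schema."""
    root = [] if schema.get("type") == "object" else ["$: root must be type 'object'"]
    return root + check_schema(schema)
```

Calling `check_root(invoice_schema)` returns an empty list. Changing `invoice_number` to `"type": "string"` would produce a nullable-primitive violation for that path.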

### Supported types

| JSON Schema Type | Notes |
|-----------------|-------|
| `["string", "null"]` | Nullable string |
| `["number", "null"]` | Nullable number |
| `["integer", "null"]` | Nullable integer |
| `["boolean", "null"]` | Nullable boolean |
| `"object"` | Nested object (not nullable) |
| `"array"` | Array of objects or scalars (not nullable) |

### Special Extend types

| Type | Usage | Output format |
|------|-------|---------------|
| `"extend:type": "date"` | Add to string fields | `yyyy-mm-dd` |
| `"extend:type": "currency"` | Object with `amount` + `iso_4217_currency_code` | Structured currency |
| `"extend:type": "signature"` | Object with `printed_name`, `signature_date`, `is_signed`, `title_or_role` | Signature detection |
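
A sketch of how the date and currency types might appear in a schema, built as Python dicts and printed as JSON. `date_field` and `currency_field` are hypothetical helpers. The type of `iso_4217_currency_code` (nullable string) is an assumption that follows the nullable-primitive rule; check the schema docs for the exact shape.

```python
import json


def date_field(description: str) -> dict:
    """Nullable string field tagged as an Extend date (output as yyyy-mm-dd)."""
    return {"type": ["string", "null"], "extend:type": "date", "description": description}


def currency_field(description: str) -> dict:
    """Object field tagged as an Extend currency."""
    return {
        "type": "object",
        "extend:type": "currency",
        "description": description,
        "properties": {
            "amount": {"type": ["number", "null"]},
            # Assumed to be a nullable string, following the nullable-primitive rule.
            "iso_4217_currency_code": {"type": ["string", "null"]},
        },
        "required": ["amount", "iso_4217_currency_code"],
        "additionalProperties": False,
    }


schema = {
    "type": "object",
    "properties": {
        "invoice_date": date_field("The invoice date"),
        "total": currency_field("Total amount due"),
    },
    "required": ["invoice_date", "total"],
    "additionalProperties": False,
}

print(json.dumps(schema, indent=2))
```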

### Enums

Enums must include `null` and only support string values. Use `"extend:descriptions"` for disambiguation:

```json
{
  "status": {
    "enum": ["active", "inactive", "pending", null],
    "extend:descriptions": ["Currently active", "No longer active", "Awaiting activation"]
  }
}
```
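
When schemas are managed in code, a small helper can keep enum values and their descriptions together. This sketch assumes, based on the example above, that `"extend:descriptions"` carries one entry per non-null value in the same order; `enum_field` is a hypothetical helper.

```python
from typing import Dict, List


def enum_field(options: Dict[str, str]) -> dict:
    """Build a nullable string enum from a {value: description} mapping."""
    values: List[str] = list(options)
    return {
        # None serializes to JSON null, which the enum must include.
        "enum": values + [None],
        # Assumed: one description per non-null value, in the same order.
        "extend:descriptions": [options[v] for v in values],
    }


status = enum_field({
    "active": "Currently active",
    "inactive": "No longer active",
    "pending": "Awaiting activation",
})
```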

### Field descriptions

Use `"description"` to guide extraction. Use `"extend:name"` for display names without changing output keys.

### Unsupported

`anyOf`, `oneOf`, `allOf`, regex patterns, conditional schemas, `const`.

Docs: https://docs.extend.ai/product/extraction/schema

### Legacy: Fields Array schema

Extractors created before April 2025 may use the legacy "Fields Array" configuration instead of JSON Schema. Key differences:

- **Fields Array** used a `fields` array with `id`, `name`, `type`, `description` per field. Output mixed data and metadata together within each field object.
- **JSON Schema** uses a standard `schema` object. Output cleanly separates `value` (extracted data) from `metadata` (confidence, citations) using path-based keys.

**To migrate**: Open your processor in Studio, click the three-dot menu, select "Migrate to JSON Schema." This creates a new processor with the converted schema while preserving your original.

New extractors should always use JSON Schema. See the [migration guide](https://docs.extend.ai/product/migrating-to-json-schema) for full details.

---

## Webhooks

Webhooks deliver HTTP POST notifications when processing events occur, such as a run completing, failing, or requiring review.

### Setup

1. Create an endpoint in the Extend dashboard under Developers > Webhook Endpoints
2. Subscribe to events at global, workflow, or processor scope
3. Choose delivery format: JSON (default) or Signed Download URL (for large payloads)

### Key events

The table below lists common events. For the full list (including lifecycle and CRUD events for all resource types), see the [webhook events docs](https://docs.extend.ai/product/webhooks/events).

| Event | Fires when |
|-------|-----------|
| `extract_run.processed` | Extraction completes |
| `extract_run.failed` | Extraction fails |
| `classify_run.processed` | Classification completes |
| `classify_run.failed` | Classification fails |
| `split_run.processed` | Splitting completes |
| `split_run.failed` | Splitting fails |
| `parse_run.processed` | Parsing completes |
| `parse_run.failed` | Parsing fails |
| `edit_run.processed` | PDF editing completes |
| `edit_run.failed` | PDF editing fails |
| `workflow_run.completed` | Workflow completes |
| `workflow_run.failed` | Workflow fails |
| `workflow_run.needs_review` | Workflow requires human review |
| `workflow_run.step_run.processed` | Individual workflow step completes |
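
One way to route these events inside a webhook receiver is a small dispatch table keyed by event name. The sketch below is illustrative: the handler bodies are placeholders, `on` and `dispatch` are hypothetical names, and the payload is treated as opaque because its shape (including which field carries the event type) is not covered in this document.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Any], str]
HANDLERS: Dict[str, Handler] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one event type from the table above."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_type] = fn
        return fn
    return register


@on("extract_run.processed")
def extract_done(payload: Any) -> str:
    return "store extraction output"  # placeholder action


@on("extract_run.failed")
def extract_failed(payload: Any) -> str:
    return "alert on extraction failure"  # placeholder action


@on("workflow_run.needs_review")
def needs_review(payload: Any) -> str:
    return "notify reviewers"  # placeholder action


def dispatch(event_type: str, payload: Any) -> str:
    """Route an event to its handler; event types without a handler are ignored."""
    handler = HANDLERS.get(event_type)
    return handler(payload) if handler else "ignored"
```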

### Signature verification

Extend signs every webhook with HMAC-SHA256. Use the SDK's built-in verification:

**TypeScript:**
```typescript
const event = client.webhooks.verifyAndParse(body, headers, "wss_...");
```

**Python:**
```python
event = client.webhooks.verify_and_parse(body=body, headers=headers, signing_secret="wss_...")
```

For manual verification:
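1. Read the `x-extend-request-timestamp` and `x-extend-request-signature` headers
2. Construct the message: `v0:{timestamp}:{body}`, using the raw request body
3. Compute the HMAC-SHA256 of the message with your signing secret
4. Compare the result with the received signature, and reject the request if the timestamp is more than 5 minutes old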
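
The manual steps can be sketched with the Python standard library as shown below. Two details are assumptions, not confirmed by this document: that the timestamp is a Unix timestamp in seconds, and that the signature header is hex-encoded. `verify_signature` is a hypothetical helper name; prefer the SDK utilities where available.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 5 * 60


def verify_signature(
    body: str,
    timestamp: str,
    signature: str,
    signing_secret: str,
    now: Optional[float] = None,
) -> bool:
    """Return True if the signature matches and the timestamp is fresh.

    Assumes a Unix timestamp in seconds and a hex-encoded signature.
    """
    now = time.time() if now is None else now
    if now - float(timestamp) > MAX_AGE_SECONDS:
        return False  # older than 5 minutes: reject to limit replay
    message = f"v0:{timestamp}:{body}".encode("utf-8")
    expected = hmac.new(signing_secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)
```

Pass the raw request body exactly as received; re-serializing parsed JSON can change the bytes and invalidate the signature.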

Docs: https://docs.extend.ai/product/webhooks/configuration

---

## Workflows

Workflows chain processors into pipelines. They are built visually in Extend Studio and triggered via the API.

### Capabilities

- Extraction, classification, splitting steps
- Conditional routing based on extracted values or classification results
- Human review steps (pauses workflow for manual review)
- External data validation (call your API mid-workflow)
- Webhook response steps
- Formula calculations
- Parse step configuration
- Validation rules

### Running a workflow via API

Via SDK, use `client.workflowRuns.createAndPoll()` (TypeScript) or `client.workflow_runs.create_and_poll()` (Python) — see Common Patterns above. Raw HTTP example:

```bash
curl -X POST "https://api.extend.ai/workflow_runs" \
  -H "Authorization: Bearer sk_..." \
  -H "x-extend-api-version: 2026-02-09" \
  -H "Content-Type: application/json" \
  -d '{
    "workflowId": "workflow_...",
    "files": [{"url": "https://...", "fileName": "doc.pdf"}]
  }'
```

### Workflow run statuses

| Status | Meaning |
|--------|---------|
| `PENDING` | Queued, not yet started |
| `PROCESSING` | Currently executing |
| `PROCESSED` | Completed successfully |
| `FAILED` | Failed (check `failureReason`) |
| `NEEDS_REVIEW` | Paused for human review |
| `REJECTED` | Rejected during human review |
| `CANCELLED` | Cancelled via API |

### Retryable failure reasons

These failures are transient and safe to retry automatically:
- `INTERNAL_ERROR` — Unexpected server error
- `DOCUMENT_PROCESSOR_ERROR` — Extraction step failed after retries

Non-retryable:
- `INVALID_WORKFLOW` — Workflow configuration error
- `FAILED_TO_PROCESS_FILE` — File could not be downloaded (check your URL)
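
A retry decision based on these failure reasons might be sketched as follows. `should_retry` is a hypothetical helper, the `max_attempts` default is an arbitrary assumption, and failure reasons not listed in this document are treated as non-retryable as a conservative default.

```python
from typing import Optional

# Failure reasons this document describes as transient.
RETRYABLE_FAILURE_REASONS = {"INTERNAL_ERROR", "DOCUMENT_PROCESSOR_ERROR"}


def should_retry(
    status: str,
    failure_reason: Optional[str],
    attempt: int,
    max_attempts: int = 3,  # assumed limit; tune for your workload
) -> bool:
    """Decide whether to resubmit a workflow run after `attempt` tries."""
    if status != "FAILED" or attempt >= max_attempts:
        return False
    # Unknown or missing reasons fall through to "do not retry".
    return failure_reason in RETRYABLE_FAILURE_REASONS
```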

Docs: https://docs.extend.ai/product/workflows/create-a-workflow

---

## Error Handling

| Error Code | Description | Retryable |
|------------|-------------|-----------|
| `INVALID_REQUEST` | Bad request body or parameters | No |
| `UNAUTHORIZED` | Missing or invalid API key | No |
| `NOT_FOUND` | Resource doesn't exist | No |
| `RATE_LIMIT_EXCEEDED` | Too many requests — back off and retry | Yes |
| `USAGE_BLOCKED` | Out of credits | No |
| `ENDPOINT_REMOVED` | Deprecated endpoint — check error message for replacement | No |
| `INTERNAL_ERROR` | Server error | Yes |

SDKs raise typed exceptions for these errors (e.g., `RateLimitError`, `UnauthorizedError`). Error responses include a `requestId` — provide this when contacting support.

Docs: https://docs.extend.ai/developers/error-codes

---

## Rate Limits

Rate limits apply per organization. If you receive `429 Too Many Requests`, retry with exponential backoff. SDK polling helpers handle backoff automatically; for other SDK calls, add your own retry logic.
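
A generic retry wrapper with exponential backoff and jitter could look like the sketch below. `RateLimited` is a hypothetical stand-in for whatever exception your HTTP layer or SDK raises on a 429, and the delay values are assumptions, not limits published by Extend.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error raised on a 429 response."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Wrap a single request in a zero-argument function or lambda and pass it as `call`.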

Docs: https://docs.extend.ai/product/rate-limits (includes current limits by endpoint)

---

## Evaluation Sets

Evaluation sets let you benchmark processor accuracy against ground truth.

1. Create an eval set linked to a processor, such as an extractor
2. Add items (files + expected outputs)
3. Run the eval set against a processor version
4. Review per-field accuracy metrics

Evaluation sets are available in both the Studio UI and the API.

Docs: https://docs.extend.ai/product/evaluation/overview

---

## Key Documentation Links

| Topic | URL |
|-------|-----|
| Getting started | https://docs.extend.ai/product/getting-started |
| Extraction quick start | https://docs.extend.ai/product/extraction/quick-start-5-minutes |
| Parsing quick start | https://docs.extend.ai/product/parsing/parse |
| JSON Schema reference | https://docs.extend.ai/product/extraction/schema |
| Extraction best practices | https://docs.extend.ai/product/extraction/best-practices/overview |
| Async processing | https://docs.extend.ai/developers/async-processing |
| Webhook setup | https://docs.extend.ai/product/webhooks/configuration |
| Webhook events | https://docs.extend.ai/product/webhooks/events |
| Workflow creation | https://docs.extend.ai/product/workflows/create-a-workflow |
| API versioning | https://docs.extend.ai/developers/api-versioning |
| 2026-02-09 migration | https://docs.extend.ai/developers/migrations/2026-02-09/overview |
| JSON Schema migration | https://docs.extend.ai/product/migrating-to-json-schema |
| SDKs | https://docs.extend.ai/developers/sdks |
| Error codes | https://docs.extend.ai/developers/error-codes |
| Rate limits | https://docs.extend.ai/product/rate-limits |
| Supported file types | https://docs.extend.ai/product/supported-file-types |
| Credits | https://docs.extend.ai/product/credits |
| Confidence scores | https://docs.extend.ai/product/extraction/confidence-scores |
| Citations | https://docs.extend.ai/product/extraction/citations |
| API reference (full) | https://docs.extend.ai/developers/api-reference |
| Searchable docs index | https://docs.extend.ai/llms.txt |