
# PDF tools

Twenty-two PDF operations, one endpoint prefix, one uniform request/response shape. Every op is a `POST https://api.relaystation.ai/v1/pdf/<op>`. The pure-PDF ops (merge … metadata, images, diff, bookmarks, attachments) run in-process on pdf-lib / pdf.js. The heavier ops run on dedicated engines: encrypt, decrypt, compress, repair, render, ocr, ocr-searchable, and verify-signatures on vendored **qpdf**, **poppler** (including `pdfsig`), and **Tesseract**; **from-html (HTML→PDF) on a vendored headless Chromium**; and **from-office (Office→PDF) on LibreOffice (Gotenberg)**. All of them share the same request shape and the same per-call billing. `from-html` **generates** a PDF from scratch (an invoice, a report, a certificate); `from-office` **converts** an existing office document (docx, xlsx, pptx, …) into one.

## Inputs and outputs — the uniform shape

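Every op that takes a file takes it as an **input source**, and every op that returns a single file returns it in a uniform **output envelope**. Ops that return several files or parsed data (`split`, `render`, `images`, `extract-text`, `ocr`, and others) return a manifest or a plain JSON result instead, as noted per op. You never stream a raw file; the body is always JSON.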

**Input source** — one of two shapes per file field:

```json
{ "inline": "<base64-encoded PDF>" }          // for files ≤ 4 MB
{ "inputKey": "<scratch object key>" }         // for larger files
```

For files above the 4 MB inline ceiling, mint a one-time presigned upload first:

```bash
curl -X POST https://api.relaystation.ai/v1/cputools/upload-url \
  -H 'Authorization: Bearer rs_live_<key>' \
  -H 'Content-Type: application/json' \
  -d '{"ext":"pdf","contentType":"application/pdf"}'
# → { "url": "...", "fields": { ... }, "inputKey": "...", "expiresAt": "...", "maxBytes": 52428800 }
```

Then upload with a **multipart/form-data `POST` to `url`** — include every entry from `fields` as a form field, then a `file` field carrying the bytes (the standard S3 presigned-POST form). Finally pass `{"inputKey":"..."}` as the file to the op route. The presigned POST is customer-scoped, short-lived, and size-capped at `maxBytes` (S3 rejects an oversized body at upload time). An x402 lodestone wallet can mint one too — minting is free; only the op that consumes the bytes bills.
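A minimal sketch of choosing between the two input shapes. It assumes the 4 MB ceiling means 4 × 1024 × 1024 raw bytes measured before base64 encoding, which the text above does not spell out. `to_input_source` and the `upload` callback are hypothetical names; the callback stands in for the mint-and-upload steps described above and returns the resulting `inputKey`.

```python
import base64
from typing import Callable

# Assumption: "4 MB" is 4 MiB of raw bytes, measured before base64 encoding.
MAX_INLINE_BYTES = 4 * 1024 * 1024


def to_input_source(data: bytes, upload: Callable[[bytes], str]) -> dict:
    """Return {"inline": ...} for small files, {"inputKey": ...} otherwise.

    `upload` is a caller-supplied function that mints a presigned upload,
    posts the bytes, and returns the inputKey from the mint response.
    """
    if len(data) <= MAX_INLINE_BYTES:
        return {"inline": base64.b64encode(data).decode("ascii")}
    return {"inputKey": upload(data)}
```

Keeping the uploader as a callback means the same helper works with any HTTP client, and the size decision stays in one place.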

**Output envelope** — always JSON, under an `output` key:

```json
{
  "output": {
    "inline": "<base64>",            // present when the result ≤ 4 MB
    "outputKey": "...",              // present when the result > 4 MB
    "outputUrl": "https://...",      // presigned GET for the scratch key
    "sizeBytes": 12345,
    "contentType": "application/pdf",
    "filename": "merged.pdf"
  }
}
```

The inline-vs-reference threshold is operator-tunable (`cputools.io.max_inline_bytes`, default 4 MB). Results at or under it come back inline; larger results are written to a scratch key and returned as a presigned `outputUrl`. The full inline / scratch / baton model is the [persistence tiers](/docs/persistence-tiers) page; recipes for downloading or chaining outputs are in [Receiving outputs](/docs/receiving-outputs).
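Reading the envelope on the client side can be sketched as follows. `read_output` and the `fetch` callback are hypothetical names; the sketch assumes a response carries either `inline` or `outputUrl`, as the field comments in the envelope suggest.

```python
import base64
from typing import Callable


def read_output(response: dict, fetch: Callable[[str], bytes]) -> bytes:
    """Return the result bytes from a standard output envelope.

    `fetch` is a caller-supplied function that performs an HTTP GET
    on a presigned URL and returns the body.
    """
    out = response["output"]
    if "inline" in out:
        # Small results arrive base64-encoded in the JSON body.
        return base64.b64decode(out["inline"])
    # Larger results are written to scratch and exposed as a presigned GET.
    return fetch(out["outputUrl"])
```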

## Billing

Per-page ops bill the page count × the per-page rate, **minimum one page**. `metadata`, `verify-signatures`, `from-html`, `from-office`, `bookmarks`, and `attachments` are flat per-call charges. The per-op rates are operator-tunable (`cputools.price.pdf.<op>.per_page_micros`); the launch defaults are:

| Op | Billed on | Rate |
|---|---|---|
| `merge` | output pages | $0.001 / page |
| `split` | source pages | $0.001 / page |
| `rotate` | result pages | $0.001 / page |
| `pages` | result pages | $0.001 / page |
| `watermark` | pages stamped | $0.001 / page |
| `form` | pages | $0.001 / page |
| `extract-text` | source pages | $0.002 / page |
| `metadata` | per call | $0.001 flat |
| `encrypt` | pages | $0.001 / page |
| `decrypt` | pages | $0.001 / page |
| `compress` | pages | $0.001 / page |
| `repair` | pages | $0.001 / page |
| `render` | output pages | $0.001 / page |
| `ocr` | source pages | $0.003 / page |
| `ocr-searchable` | source pages | $0.004 / page |
| `verify-signatures` | per call | $0.001 flat |
| `from-html` | per call | $0.003 flat |
| `from-office` | per call | $0.005 flat |
| `images` | pages scanned | $0.001 / page |
| `diff` | combined pages of both inputs | $0.001 / page |
| `bookmarks` | per call | $0.001 flat |
| `attachments` | per call | $0.001 flat |

These are launch defaults; the live `402` challenge is authoritative (an unauthenticated `POST` returns the exact price). Every billable call is keyed by an `Idempotency-Key` header; if you omit it, the chassis generates one per call. Send your own key whenever you may need to retry (see "Large files and the retry tail" below).
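For budgeting, the table above can be turned into a small estimator. This is a sketch using the launch defaults only, expressed in micro-dollars (1,000,000 = $1) to match the `per_page_micros` config naming; `estimate_micros` is a hypothetical helper, and the live `402` challenge remains the source of truth.

```python
# Launch-default rates from the billing table, in micro-dollars.
PER_PAGE_MICROS = {
    "merge": 1000, "split": 1000, "rotate": 1000, "pages": 1000,
    "watermark": 1000, "form": 1000, "extract-text": 2000,
    "encrypt": 1000, "decrypt": 1000, "compress": 1000, "repair": 1000,
    "render": 1000, "ocr": 3000, "ocr-searchable": 4000,
    "images": 1000, "diff": 1000,
}
FLAT_MICROS = {
    "metadata": 1000, "verify-signatures": 1000, "from-html": 3000,
    "from-office": 5000, "bookmarks": 1000, "attachments": 1000,
}


def estimate_micros(op: str, billed_pages: int = 0) -> int:
    """Estimate the charge for one call; per-page ops bill at least one page."""
    if op in FLAT_MICROS:
        return FLAT_MICROS[op]
    return max(billed_pages, 1) * PER_PAGE_MICROS[op]
```

For `diff`, pass the combined page count of both inputs; for `render`, pass the number of output pages.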

## merge

Combine 2–50 PDFs into one, in the order given.

```json
POST /v1/pdf/merge
{ "files": [ {"inline":"<b64>"}, {"inputKey":"..."} ], "filename": "combined.pdf" }
```

## split

Split one PDF by 1-based ranges (`"1-3,5,8-10"`) into separate documents, or set `burst: true` to explode every page into its own single-page PDF. Omit `ranges` (or pass `"all"`) for the whole document.

```json
POST /v1/pdf/split
{ "file": {"inline":"<b64>"}, "ranges": "1-3,7-9" }
```

Split produces multiple files, so it returns a **manifest** (not the single output envelope): each entry is a storage reference you can re-submit as an `inputKey` to another op.

```json
{ "files": [ {"index": 0, "outputKey": "...", "outputUrl": "https://...", "pages": 3, "sizeBytes": 12345} ] }
```
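The 1-based range strings used by `split`, `rotate`, and the other `pages` parameters can be validated locally before a call. The parser below is an illustrative sketch, not the service's own: `parse_ranges` is a hypothetical name, and details such as whitespace tolerance or how duplicates are treated are assumptions.

```python
def parse_ranges(spec: str, total_pages: int) -> list[int]:
    """Expand a 1-based range string such as "1-3,5,8-10" into page numbers.

    Raises ValueError for malformed or out-of-bounds parts, which is the
    class of input the API reports as 422 BAD_RANGE.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        first, sep, last = part.partition("-")
        if not first.isdigit() or (sep and not last.isdigit()):
            raise ValueError(f"malformed range: {part!r}")
        start, end = int(first), int(last) if sep else int(first)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"out-of-bounds range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages
```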

## rotate

Rotate pages by 90 / 180 / 270 degrees. `pages` is an optional 1-based range string; omit it to rotate every page.

```json
POST /v1/pdf/rotate
{ "file": {"inline":"<b64>"}, "degrees": "90", "pages": "1,3-5" }
```

## pages

Edit page structure. One of three actions:

- `delete` — drop a range of pages: `{"action":"delete","file":{...},"pages":"2,4-6"}`
- `reorder` — supply the new 1-based page order: `{"action":"reorder","file":{...},"order":[3,1,2]}`
- `insert` — splice another PDF in at a 1-based position (`at: 1` inserts before the first page; `at: N+1` appends): `{"action":"insert","file":{...},"insert":{...},"at":2}`
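The three actions can be modelled on a plain list to check the 1-based semantics before sending a request. This is a local illustration only; the function names are hypothetical, and whether `reorder` must list every page exactly once is not stated above, so the sketch does not enforce it.

```python
def delete_pages(doc: list, pages: list[int]) -> list:
    """Drop the given 1-based page numbers."""
    drop = set(pages)
    return [p for i, p in enumerate(doc, start=1) if i not in drop]


def reorder_pages(doc: list, order: list[int]) -> list:
    """Rebuild the document in the given 1-based order."""
    return [doc[i - 1] for i in order]


def insert_pages(doc: list, other: list, at: int) -> list:
    """Splice `other` in before 1-based position `at`; N+1 appends."""
    if not 1 <= at <= len(doc) + 1:
        raise ValueError("at must be between 1 and N+1")
    return doc[: at - 1] + other + doc[at - 1 :]
```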

## watermark

Stamp text on every page (or a `pages` range). Tune `opacity` (0–1), `size` (pt, ≤ 400), `color` (`#RRGGBB`), and `position` (`center` | `top-left` | `top-right` | `bottom-left` | `bottom-right`).

```json
POST /v1/pdf/watermark
{ "file": {"inline":"<b64>"}, "text": "CONFIDENTIAL", "opacity": 0.2, "position": "center" }
```

## form

Fill or flatten an AcroForm. Pass `fields` as a map of field name → string or boolean (checkbox); set `flatten: true` to bake the values in so the form is no longer editable.

```json
POST /v1/pdf/form
{ "file": {"inline":"<b64>"}, "fields": {"name":"Ada Lovelace","subscribe":true}, "flatten": true }
```

## extract-text

Pull text + structure out of a PDF (engine: pdf.js / unpdf). Returns per-page text plus the concatenated whole. `pages` is an optional 1-based range string.

```json
POST /v1/pdf/extract-text
{ "file": {"inline":"<b64>"}, "pages": "1-5" }
```

Response (not an output envelope — this op returns parsed text directly):

```json
{ "totalPages": 12, "pages": [ {"page": 1, "text": "..."} ], "text": "..." }
```

## metadata

Read document metadata, or set any of `title`, `author`, `subject`, `keywords`, `creator`, `producer`. Read by omitting `set`; write by passing it. Flat per-call price. A read returns `{ "metadata": { ... } }`; a write returns the standard output envelope with the updated PDF.

```json
POST /v1/pdf/metadata
{ "file": {"inline":"<b64>"}, "set": {"title":"Q3 Report","author":"finance-bot"} }
```

## encrypt

Password-protect a PDF (qpdf, 256-bit AES). Supply `userPassword` (the open password), `ownerPassword` (permissions password), or both — at least one. Neither may begin with `-`. Supplying only an `ownerPassword` leaves the open password empty: anyone can open the document, but permissions stay owner-gated (the standard PDF model).

Eight optional **permission booleans** — the canonical PDF permission set — restrict what an opened document allows: `print`, `printHighRes`, `copy` (text/graphics extraction), `modify`, `annotate`, `fillForms` (fill in form fields), `extract` (extract for accessibility), and `assemble` (insert/delete/rotate pages). An **absent** boolean means the qpdf default (allowed); pass `false` to restrict. Permissions are enforced by PDF viewers against the owner password.

Two notes on the set: `print` and `printHighRes` fold into one print level — `print:false` blocks printing entirely, `print:true` with `printHighRes:false` allows only low-resolution printing, and `print:true` alone allows full-quality printing. And `copy` is the live text/graphics extraction control; `extract` is the separate *accessibility*-extraction bit, which PDF 2.0 (the 256-bit encryption used here) deprecated and always grants — so `extract:false` may be a no-op on modern viewers. Use `copy:false` to actually block extraction.

```json
POST /v1/pdf/encrypt
{ "file": {"inline":"<b64>"}, "userPassword": "hunter2", "ownerPassword": "admin-only",
  "print": true, "printHighRes": false, "copy": false, "modify": false,
  "annotate": false, "fillForms": true, "assemble": false }
```

Passwords ride only in the request body to the IAM-gated worker — they're never logged.
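The password rules and the print-level folding described above can be checked client-side before a call. A sketch under stated assumptions: `build_encrypt_body` and `print_level` are hypothetical helpers, the level names `none` / `low` / `full` are labels chosen here for illustration, and an empty string is treated as "no password supplied".

```python
PERMISSIONS = {"print", "printHighRes", "copy", "modify",
               "annotate", "fillForms", "extract", "assemble"}


def print_level(print_allowed: bool = True, print_high_res: bool = True) -> str:
    """Fold the two print booleans into one level: none, low, or full."""
    if not print_allowed:
        return "none"
    return "full" if print_high_res else "low"


def build_encrypt_body(file: dict, user_password=None, owner_password=None,
                       **permissions) -> dict:
    """Assemble an encrypt request body, enforcing the documented rules."""
    passwords = {"userPassword": user_password, "ownerPassword": owner_password}
    supplied = {k: v for k, v in passwords.items() if v}
    if not supplied:
        raise ValueError("supply userPassword, ownerPassword, or both")
    if any(p.startswith("-") for p in supplied.values()):
        raise ValueError("passwords may not begin with '-'")
    unknown = set(permissions) - PERMISSIONS
    if unknown:
        raise ValueError(f"unknown permissions: {sorted(unknown)}")
    # Absent booleans are left out so the server applies its default (allowed).
    return {"file": file, **supplied, **permissions}
```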

## decrypt

Remove a password from a PDF (qpdf). The supplied `password` must open the document.

```json
POST /v1/pdf/decrypt
{ "file": {"inline":"<b64>"}, "password": "hunter2" }
```

## compress

Recompress a PDF's object and content streams (qpdf). Billed **per page, not by savings** — the compression ratio depends entirely on the document (an already-optimized PDF may not shrink). Set `linearize: true` for web "fast view" (byte-range streaming).

```json
POST /v1/pdf/compress
{ "file": {"inline":"<b64>"}, "linearize": true }
```

Looking for a "linearize" / web-optimize op? It's this flag — `linearize: true` on compress IS the linearization path (there is no separate `pdf/linearize` op).

## repair

Repair a damaged PDF (qpdf). Two steps in one call: `qpdf --check` **diagnoses** the document (a findings report — broken xref, stream-length mismatches, structural warnings), then a full **rewrite** re-serializes it, reconstructing the cross-reference table and normalizing structure. Billed per page (compress's rate). Returns the repaired PDF **always as a storage ref** plus the diagnosis:

```json
POST /v1/pdf/repair
{ "file": {"inline":"<b64>"} }
```

```json
{ "output": { "outputKey": "...", "outputUrl": "https://...", "sizeBytes": 51234, "contentType": "application/pdf", "filename": "repaired.pdf" },
  "repair": { "exitCode": 2, "findings": ["xref not found", "Attempting to reconstruct cross-reference table", "..."] } }
```

`repair.exitCode` is qpdf's `--check` verdict: `0` clean, `2` errors found, `3` warnings — all three still produce a rewritten output. A PDF too damaged to parse at all returns `422 PDF_PARSE_FAILED` (charge reversed).

## render

Rasterize PDF pages to PNG or JPEG (poppler `pdftoppm`). `pages` is an optional 1-based range (default: all); `dpi` ≤ 300 (default 150); `format` is `png` or `jpeg`. Non-embedded standard-14 fonts (Helvetica, Times, …) render correctly via a vendored DejaVu Sans substitution. **Capped at 40 pages and 300 DPI** — over either returns `422` pre-charge. Billed per **output** page.

```json
POST /v1/pdf/render
{ "file": {"inline":"<b64>"}, "pages": "1-5", "dpi": 200, "format": "png" }
```

Render produces multiple images, so it returns a **manifest** (like split), one presigned-GET image ref per page:

```json
{ "images": [ {"index": 0, "page": 1, "outputKey": "...", "outputUrl": "https://..."} ] }
```

## ocr

Extract text from a scanned/image PDF (poppler `pdftoppm` → Tesseract LSTM). `pages` is an optional 1-based range (default: all). **Synchronous and capped at 5 pages**: the call must clear a ~30-second window. Asynchronous OCR for larger jobs is planned but not yet available. Over the cap returns `422 OCR_TOO_MANY_PAGES_SYNC` (which advertises the cap) before any charge. Billed per **source** page.

`lang` selects the OCR language from the **deployed 10-language roster**: `eng` (default), `spa`, `fra`, `deu`, `ita`, `por`, `nld`, `pol`, `rus`, `chi_sim`. The live roster is the operator-tunable `cputools.ocr.langs`; an off-roster `lang` returns a **free** `422 UNSUPPORTED_LANG` that names the roster. The same `lang` parameter (and roster) applies to `ocr-searchable` and [`image/ocr`](/docs/image-tools).

```json
POST /v1/pdf/ocr
{ "file": {"inline":"<b64>"}, "pages": "1-3", "lang": "spa" }
```

Response (parsed text, not an output envelope):

```json
{ "pages": [ {"page": 1, "text": "..."} ], "text": "..." }
```
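Until asynchronous OCR is available, one way to handle a longer document is to issue several calls, each with a `pages` range inside the cap. The sketch below only generates the range strings. It assumes the 5-page cap and the per-page billing both count the pages selected by `pages` rather than the document's total length, which the text above does not confirm; `ocr_batches` is a hypothetical helper.

```python
def ocr_batches(total_pages: int, cap: int = 5) -> list[str]:
    """Split pages 1..total_pages into range strings of at most `cap` pages."""
    batches = []
    for start in range(1, total_pages + 1, cap):
        end = min(start + cap - 1, total_pages)
        # A single trailing page is written as "11", not "11-11".
        batches.append(str(start) if start == end else f"{start}-{end}")
    return batches
```

Each string can be passed as the `pages` value of a separate `ocr` or `ocr-searchable` request, and the per-page `text` results concatenated in order.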

## ocr-searchable

Turn a scanned PDF into a **searchable PDF**: per page, the worker rasterizes (`pdftoppm`), Tesseract emits a page PDF carrying the original page image plus an **invisible text layer**, and qpdf merges the pages. The result looks identical but is selectable, copyable, and indexable. Same `pages` / `lang` parameters and the same 5-page synchronous cap as `ocr`; billed per **source** page at a premium over plain OCR (it runs OCR *plus* PDF assembly).

```json
POST /v1/pdf/ocr-searchable
{ "file": {"inline":"<b64>"}, "pages": "1-3", "lang": "eng" }
```

The output is **always a storage ref** — `{ output: { outputKey, outputUrl, sizeBytes, contentType, filename } }`, never inline (the render convention; OCR'd page images are large). Fetch the PDF from the presigned `outputUrl`, or chain `outputKey` straight into another op.

## from-html

**Generate a PDF from HTML** — render caller-supplied HTML/CSS to a PDF on a headless Chromium worker. This is how you produce invoices, reports, contracts, and certificates from a template + data: build the HTML, get back a print-ready PDF. Flat per-call price.

```json
POST /v1/pdf/from-html
{ "html": "<h1>Invoice #1042</h1>…",
  "options": { "format": "A4", "margin": {"top":"2cm","bottom":"2cm"}, "printBackground": true,
    "headerTemplate": "<div style='font-size:8px;width:100%;text-align:right'>Invoice #1042</div>",
    "footerTemplate": "<div style='font-size:8px;width:100%;text-align:center'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>",
    "pageRanges": "1-3" } }
```

`options` (all optional):

- `format`: `A4` | `Letter` | `Legal` | `A3` | `A5` | `Tabloid` | `Ledger`.
- `landscape`: boolean.
- `margin`: `top` / `right` / `bottom` / `left`, each a size string in `px`, `cm`, `mm`, or `in`.
- `printBackground`: boolean.
- `scale`: 0.1–2.
- `headerTemplate` / `footerTemplate`: HTML for the running header/footer. Use the `date`, `title`, `url`, `pageNumber`, and `totalPages` classes to inject values, and leave room with `margin`.
- `displayHeaderFooter`: boolean, auto-enabled when you supply a template, so you rarely set it yourself; set `false` to suppress.
- `pageRanges`: a subset to print, e.g. `"1-5, 8"`; default all pages.
- `tagged`: emit a tagged/accessible PDF; **default `true`**.

The HTML itself is capped at 5 MB (`cputools.render.max_html_bytes`, operator-tunable; over the cap returns a free `422 HTML_TOO_LARGE`). Returns the standard output envelope (a PDF).

**Sandboxed by design.** The renderer is locked **read-only with no network and no JavaScript**: every external resource request — `<img>`, `<link>`, `<script>`, `fetch`, `@import`, `file://` — is blocked, so the renderer can't reach the network (no SSRF) or execute scripts. **The header/footer templates render under the same wall** — they cannot fetch or execute anything either. That means **assets must be inlined**: `data:` URIs for images, inline `<style>` for CSS. For a chart, render it server-side and embed the PNG as a `data:` URI. Static, templated HTML is the sweet spot.
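Because nothing external can load, a template has to carry its assets inside the HTML. The sketch below inlines a PNG as a `data:` URI and builds a request body. The helper names are hypothetical, and the 5 MB cap is assumed to mean 5 × 1024 × 1024 bytes of UTF-8.

```python
import base64
from html import escape

# Assumption: the 5 MB cap is 5 MiB of UTF-8 encoded HTML.
MAX_HTML_BYTES = 5 * 1024 * 1024


def data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a data: URI usable in <img src=...>."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_from_html_body(title: str, logo_png: bytes) -> dict:
    """Build a from-html body with inline CSS and an inlined logo."""
    src = data_uri(logo_png, "image/png")
    markup = (
        "<html><head><style>h1 { font-family: sans-serif; }</style></head>"
        f"<body><img src='{src}' alt='logo'><h1>{escape(title)}</h1></body></html>"
    )
    if len(markup.encode("utf-8")) > MAX_HTML_BYTES:
        raise ValueError("HTML exceeds the 5 MB cap")
    return {"html": markup, "options": {"format": "A4", "printBackground": True}}
```

Escaping the title keeps caller data from being interpreted as markup, which matters when the template is filled from untrusted fields.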

## from-office

**Convert an Office document to PDF** — docx, xlsx, pptx, odt, ods, odp, rtf, txt, or csv in; a print-ready PDF out (LibreOffice via Gotenberg). Flat per-call price. The `filename` is required — its extension is how the format is recognized.

```json
POST /v1/pdf/from-office
{ "file": {"inline":"<base64 docx>"}, "filename": "q3-report.docx" }
```

Takes the standard input source (`{ "inline": "<b64>" }` ≤ 4 MB, or `{ "inputKey": "..." }` for larger files — capped at 30 MB for this op) and returns the standard output envelope (a PDF). An unsupported extension returns `422 UNSUPPORTED_FORMAT` and an oversized input `422 INPUT_TOO_LARGE` — both **before any charge**. If the conversion itself can't be delivered (a corrupt file, an upstream failure), the call returns `422 CONVERT_FAILED` and the charge is reversed: you pay only for a delivered PDF.
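The two pre-charge rejections can be anticipated on the client. A sketch, assuming extensions are matched case-insensitively, that the format check runs before the size check, and that 30 MB means 30 × 1024 × 1024 bytes; none of these are confirmed above, and `preflight_from_office` is a hypothetical helper.

```python
import os
from typing import Optional

SUPPORTED_EXTENSIONS = {"docx", "xlsx", "pptx", "odt", "ods", "odp",
                        "rtf", "txt", "csv"}
# Assumption: the 30 MB cap is 30 MiB of raw bytes.
MAX_INPUT_BYTES = 30 * 1024 * 1024


def preflight_from_office(filename: str, size_bytes: int) -> Optional[str]:
    """Return the error code the call would likely hit, or None if it looks fine."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return "UNSUPPORTED_FORMAT"
    if size_bytes > MAX_INPUT_BYTES:
        return "INPUT_TOO_LARGE"
    return None
```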

## images

**Extract the embedded images from a PDF** — pull every raster image XObject out of the document and return each as a PNG. Billed per page scanned; `pages` is an optional 1-based range string (default: all). Useful for harvesting figures, scans, or logos a PDF carries.

```json
POST /v1/pdf/images
{ "file": { "inline": "<base64-pdf>" }, "pages": "1-5" }
```

Returns a manifest: one entry per extracted image, each `output` a standard output envelope (PNG):

```json
{ "images": [ {"page": 1, "index": 0, "width": 800, "height": 600, "output": { ... }} ], "count": 1, "truncated": false }
```

Capped at **200 images** per call — extraction stops there and the response sets `truncated: true`.

## diff

**Compare two PDFs by their text** — extract the text of both and return a unified diff. Per-page price, billed on the combined page count of both inputs. Pure-text compare (layout/visual diff is not in scope). `context` (optional, 0–100, default 3) sets the unified-diff context lines.

```json
POST /v1/pdf/diff
{ "a": { "inline": "<base64-pdf>" }, "b": { "inputKey": "uploads/v2.pdf" }, "context": 3 }
```

Returns `{ patch, changed }` — a unified-diff string (`createTwoFilesPatch` format) plus a boolean; `changed: false` when the two carry identical text.

## bookmarks

**Read the outline / bookmark tree** of a PDF. Flat per-call price. Read-only (writing bookmarks is not in scope yet).

```json
POST /v1/pdf/bookmarks
{ "file": { "inline": "<base64-pdf>" } }
```

Returns `{ outline, count }` — a recursive tree of `{ title, children }` plus the total node count. Empty array when the PDF has no outline.
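The recursive tree flattens naturally into an indented table of contents. A short sketch, assuming each node carries `title` and `children` as described; `flatten_outline` is a hypothetical helper.

```python
def flatten_outline(outline: list, depth: int = 0) -> list:
    """Walk the bookmark tree depth-first, returning (depth, title) pairs."""
    rows = []
    for node in outline:
        rows.append((depth, node["title"]))
        rows.extend(flatten_outline(node.get("children", []), depth + 1))
    return rows
```

Printing `"  " * depth + title` for each pair gives a readable outline.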

## attachments

**List or extract embedded file attachments** of a PDF. Flat per-call price.

```json
POST /v1/pdf/attachments
{ "file": { "inline": "<base64-pdf>" }, "extract": true }
```

Returns `{ attachments, count }` — one entry per embedded file (`{ filename, size }`, first 100); with `extract: true`, each entry also carries an `output` envelope with the file bytes. Empty array when the PDF carries no attachments.

## verify-signatures

Inspect a PDF's **digital signatures** (poppler `pdfsig`). Flat per-call price.

```json
POST /v1/pdf/verify-signatures
{ "file": {"inline":"<b64>"} }
```

```json
{ "signatures": [ {
    "index": 1,
    "signerCommonName": "Jane Doe",
    "signerDistinguishedName": "CN=Jane Doe,O=Acme",
    "signingTime": "Jun 11 2026 14:02:11",
    "hashAlgorithm": "SHA-256",
    "signatureType": "ETSI.CAdES.detached",
    "signatureValidation": "Signature is Valid.",
    "certificateValidation": "Certificate issuer isn't Trusted."
  } ],
  "signatureCount": 1 }
```

**Honest semantics — read this before relying on it.** This op performs **structural verification + digest intactness**: the signature is present, the signed byte range hasn't been modified since signing (`signatureValidation`), and the signer's identity fields are surfaced. **Certificate-chain trust is not evaluated — no CA store is deployed**, so `certificateValidation` will typically report that the issuer chain couldn't be checked. It answers "is this document signed, by whom, and is it unmodified?" — not "do I trust the signer's certificate authority?". An **unsigned** document returns `"signatures": []` with `signatureCount: 0` — a normal outcome, not an error.
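A caller can reduce the response to the two questions this op does answer. The sketch below treats the exact string `"Signature is Valid."` from the example response as the marker of an intact signature; whether other strings can also indicate validity is an assumption to verify. `summarize_signatures` is a hypothetical helper.

```python
def summarize_signatures(response: dict) -> list[dict]:
    """Reduce a verify-signatures response to signer and intactness."""
    summary = []
    for sig in response.get("signatures", []):
        summary.append({
            "signer": sig.get("signerCommonName"),
            # Digest intactness: the signed byte range is unmodified.
            "intact": sig.get("signatureValidation") == "Signature is Valid.",
            # Chain trust is not evaluated by this op, so it stays unknown.
            "chain_trusted": None,
        })
    return summary
```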

This op pairs with [Relaystation e-sign](https://relaystation.ai/docs/esigndoc) as the **verification half**: send documents for legally-binding signature there, verify what came back (or any third-party-signed PDF) here.

## Large files and the retry tail

For a file above the 4 MB inline ceiling, mint a presigned upload (see above), `POST` the bytes, and pass `{"inputKey":"..."}` to the worker op. The worker ops are synchronous: the API holds the connection while qpdf/poppler/Tesseract run. A worst-case job near the window (a 40-page render, a 5-page OCR) can occasionally exceed the API Gateway's ~30-second integration timeout and return a `504`. Retrying is safe: **re-issue the same request with the same `Idempotency-Key`** and the chassis returns the already-computed result without charging twice.
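The retry pattern can be sketched with the HTTP call injected as a function, so the key handling is visible on its own. `call_with_retry` and `send` are hypothetical names; `send(body, headers)` is assumed to return a `(status, payload)` pair. The sketch retries immediately, and a real client would likely add a delay between attempts.

```python
import uuid
from typing import Callable, Tuple


def call_with_retry(send: Callable[[dict, dict], Tuple[int, dict]],
                    body: dict, max_attempts: int = 3) -> Tuple[int, dict]:
    """Send a request, re-issuing on 504 with the same Idempotency-Key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key for the whole logical call, generated once and reused.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for _ in range(max_attempts):
        status, payload = send(body, headers)
        if status != 504:
            break
    return status, payload
```

Generating the key outside the loop is the point of the pattern: a fresh key per attempt would look like a new billable call.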

## Errors

- `402 PAYMENT_REQUIRED` — no valid payment (see [x402](https://relaystation.ai/docs/x402)).
- `422 PDF_PARSE_FAILED` — the input didn't parse as a PDF.
- `422 WRONG_PASSWORD` / `PDF_ENCRYPTED` — decrypt with the wrong password, or render/OCR an encrypted PDF (decrypt it first).
- `422 RENDER_TOO_MANY_PAGES` / `DPI_TOO_HIGH` / `OCR_TOO_MANY_PAGES_SYNC` — over a render/OCR cap (charged nothing).
- `422 UNSUPPORTED_LANG` — an OCR `lang` outside the deployed roster (`cputools.ocr.langs`); the error names the roster (charged nothing).
- `422 BAD_RANGE` — a malformed or out-of-bounds page range.
- `422 UNSUPPORTED_FORMAT` / `INPUT_TOO_LARGE` / `CONVERT_FAILED` — from-office: extension not in the allowlist or input over the size cap (charged nothing), or the conversion couldn't be delivered (charge reversed).
- `400 VALIDATION_ERROR` — the request body failed schema validation.

## Next

[Overview](/docs/cputools-overview) · [Pricing](/pricing) · [API reference](/api-reference) · [x402 wire format](https://relaystation.ai/docs/x402) · [Authentication](https://relaystation.ai/docs/authentication)
