cputools

PDF tools

Twenty-one PDF operations behind one endpoint pattern, with one uniform request/response shape. Every op is a POST to https://api.relaystation.ai/v1/pdf/<op>. The pure-PDF ops (merge through metadata, plus images, diff, bookmarks, and attachments) run in-process on pdf-lib / pdf.js. The heavier ops run on dedicated engines: encrypt, compress, repair, render, ocr, ocr-searchable, and verify-signatures on vendored qpdf, poppler (including pdfsig), and Tesseract; from-html (HTML→PDF) on a vendored headless Chromium; and from-office (Office→PDF) on LibreOffice (Gotenberg). All of them share the same request shape and the same per-call billing. from-html generates a PDF from scratch (an invoice, a report, a certificate); from-office converts an existing office document (docx, xlsx, pptx, …) into one.

Inputs and outputs — the uniform shape

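Every op takes its file (or files) as an input source, and most return a uniform output envelope; ops that produce several files or parsed data (split, render, extract-text, ocr, and others) document their own response shape below. You never stream a raw file; the body is always JSON.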

Input source — one of two shapes per file field:

{ "inline": "<base64-encoded PDF>" }          // for files ≤ 4 MB
{ "inputKey": "<scratch object key>" }         // for larger files
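As a rough illustration of choosing between the two shapes, here is a minimal Python sketch. The helper name make_input_source is hypothetical, and it assumes the 4 MB ceiling is measured on the raw file bytes (the text above does not say whether the limit applies before or after base64 encoding).

```python
import base64
from typing import Optional

# Assumption: the ceiling applies to the raw bytes, with 1 MB = 2**20 bytes.
MAX_INLINE_BYTES = 4 * 1024 * 1024


def make_input_source(data: bytes, input_key: Optional[str] = None) -> dict:
    """Build the input-source object for one file field (hypothetical helper)."""
    if len(data) <= MAX_INLINE_BYTES:
        # Small file: send it inline as base64 text.
        return {"inline": base64.b64encode(data).decode("ascii")}
    if input_key is None:
        # Large file: it has to be uploaded to scratch storage first.
        raise ValueError("file exceeds the inline ceiling; upload it and pass its inputKey")
    return {"inputKey": input_key}
```

A small file yields {"inline": "..."}; a large one needs the inputKey returned when the upload URL was minted.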

For files above the 4 MB inline ceiling, mint a one-time presigned upload first:

curl -X POST https://api.relaystation.ai/v1/cputools/upload-url \
  -H 'Authorization: Bearer rs_live_<key>' \
  -H 'Content-Type: application/json' \
  -d '{"ext":"pdf","contentType":"application/pdf"}'
# → { "url": "...", "fields": { ... }, "inputKey": "...", "expiresAt": "...", "maxBytes": 52428800 }

Then upload with a multipart/form-data POST to url — include every entry from fields as a form field, then a file field carrying the bytes (the standard S3 presigned-POST form). Finally pass {"inputKey":"..."} as the file to the op route. The presigned POST is customer-scoped, short-lived, and size-capped at maxBytes (S3 rejects an oversized body at upload time). An x402 lodestone wallet can mint one too — minting is free; only the op that consumes the bytes bills.
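The form layout described above (every entry from fields first, the file part last) can be sketched in Python with the standard library only. The function name and the default filename are illustrative assumptions; in practice an HTTP client library would normally build this body for you.

```python
import uuid


def build_presigned_post_body(fields: dict, data: bytes, filename: str = "upload.pdf"):
    """Encode a multipart/form-data body: presigned fields first, file part last."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part goes after all the presigned fields.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/pdf\r\n\r\n".encode() + data + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

POST the returned body to url with the returned value as the Content-Type header, then pass the inputKey from the mint response to the op.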

Output envelope — always JSON, under an output key:

{
  "output": {
    "inline": "<base64>",            // present when the result ≤ 4 MB
    "outputKey": "...",              // present when the result > 4 MB
    "outputUrl": "https://...",      // presigned GET for the scratch key
    "sizeBytes": 12345,
    "contentType": "application/pdf",
    "filename": "merged.pdf"
  }
}

The inline-vs-reference threshold is operator-tunable (cputools.io.max_inline_bytes, default 4 MB). Results at or under it come back inline; larger results are written to a scratch key and returned as a presigned outputUrl. The full inline / scratch / baton model — and recipes for downloading or chaining outputs — is in Passing & receiving files.
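A client can branch on which key is present in the envelope. The sketch below is one possible way to do that; resolve_output is a hypothetical helper, and it only decodes or returns the reference without downloading anything.

```python
import base64


def resolve_output(response: dict):
    """Return ("bytes", data) for an inline result or ("url", outputUrl) for a scratch result."""
    out = response["output"]
    if "inline" in out:
        # Result at or under the inline threshold: decode it directly.
        return ("bytes", base64.b64decode(out["inline"]))
    # Larger result: fetch it from the presigned GET URL, or chain out["outputKey"].
    return ("url", out["outputUrl"])
```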

Billing

Per-page ops bill the page count × the per-page rate, with a minimum of one page. The remaining ops (metadata, verify-signatures, from-html, from-office, bookmarks, attachments) are a flat per-call charge. The per-op rates are operator-tunable (cputools.price.pdf.<op>.per_page_micros); the launch defaults:

Op                 Billed on                      Rate
merge              output pages                   $0.001 / page
split              source pages                   $0.001 / page
rotate             result pages                   $0.001 / page
pages              result pages                   $0.001 / page
watermark          pages stamped                  $0.001 / page
form               pages                          $0.001 / page
extract-text       source pages                   $0.002 / page
metadata           per call                       $0.001 flat
encrypt            pages                          $0.001 / page
compress           pages                          $0.001 / page
repair             pages                          $0.001 / page
render             output pages                   $0.001 / page
ocr                source pages                   $0.003 / page
ocr-searchable     source pages                   $0.004 / page
verify-signatures  per call                       $0.001 flat
from-html          per call                       $0.003 flat
from-office        per call                       $0.005 flat
images             pages scanned                  $0.001 / page
diff               combined pages of both inputs  $0.001 / page
bookmarks          per call                       $0.001 flat
attachments        per call                       $0.001 flat

These are launch defaults; the live 402 challenge is authoritative (an unauthenticated POST returns the exact price). Every billable call is keyed by an Idempotency-Key header. If you omit it, the chassis generates one per call, so send your own when you want a retry to be recognized as the same call.
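For budgeting, the launch defaults can be turned into a small estimator. This is a sketch only: it assumes "micros" in per_page_micros means millionths of a dollar, the function name is hypothetical, and the live 402 challenge remains the authoritative price.

```python
# Launch-default rates from the billing table, in micro-dollars (assumed unit).
PER_PAGE_MICROS = {
    "merge": 1000, "split": 1000, "rotate": 1000, "pages": 1000,
    "watermark": 1000, "form": 1000, "extract-text": 2000,
    "encrypt": 1000, "compress": 1000, "repair": 1000, "render": 1000,
    "ocr": 3000, "ocr-searchable": 4000, "images": 1000, "diff": 1000,
}
FLAT_MICROS = {
    "metadata": 1000, "verify-signatures": 1000, "from-html": 3000,
    "from-office": 5000, "bookmarks": 1000, "attachments": 1000,
}


def estimate_micros(op: str, pages: int = 0) -> int:
    """Estimate one call's charge; per-page ops bill at least one page."""
    if op in FLAT_MICROS:
        return FLAT_MICROS[op]
    return max(pages, 1) * PER_PAGE_MICROS[op]
```

For example, a 3-page ocr call estimates to 9000 micros ($0.009), and a diff of a 5-page and a 7-page document bills on 12 combined pages.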

merge

Combine 2–50 PDFs into one, in the order given.

POST /v1/pdf/merge
{ "files": [ {"inline":"<b64>"}, {"inputKey":"..."} ], "filename": "combined.pdf" }

split

Split one PDF by 1-based ranges ("1-3,5,8-10") into separate documents, or burst: true to explode every page into its own single-page PDF. Omit ranges (or pass "all") for the whole document.

POST /v1/pdf/split
{ "file": {"inline":"<b64>"}, "ranges": "1-3,7-9" }

Split produces multiple files, so it returns a manifest (not the single output envelope): each entry is a storage reference you can re-submit as an inputKey to another op.

{ "files": [ {"index": 0, "outputKey": "...", "outputUrl": "https://...", "pages": 3, "sizeBytes": 12345} ] }
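The 1-based range strings used by split (and by the pages parameter of several other ops) can be validated client-side before a call. The parser below is a sketch of one plausible reading of that syntax; the server's exact grammar is not specified here, so treat edge cases such as duplicates or descending ranges as assumptions.

```python
from typing import List, Optional


def parse_ranges(spec: Optional[str], total_pages: int) -> List[int]:
    """Expand a range string like "1-3,5,8-10" into a list of 1-based page numbers."""
    if spec is None or spec.strip().lower() == "all":
        return list(range(1, total_pages + 1))
    pages = []
    for part in spec.split(","):
        part = part.strip()
        first, sep, last = part.partition("-")
        try:
            start = int(first)
            end = int(last) if sep else start
        except ValueError:
            raise ValueError(f"malformed range: {part!r}") from None
        if start < 1 or end < start or end > total_pages:
            raise ValueError(f"out-of-bounds range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages
```

Checking locally mirrors what a 422 BAD_RANGE would report, without spending a round trip.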

rotate

Rotate pages by 90 / 180 / 270 degrees. pages is an optional 1-based range string; omit it to rotate every page.

POST /v1/pdf/rotate
{ "file": {"inline":"<b64>"}, "degrees": "90", "pages": "1,3-5" }

pages

Edit page structure. One of three actions:

  • delete — drop a range of pages: {"action":"delete","file":{...},"pages":"2,4-6"}
  • reorder — supply the new 1-based page order: {"action":"reorder","file":{...},"order":[3,1,2]}
  • insert — splice another PDF in at a 1-based position (at: 1 inserts before the first page; at: N+1 appends): {"action":"insert","file":{...},"insert":{...},"at":2}
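The 1-based semantics of the three actions are easy to get off by one, so here is a sketch that models a document as a plain Python list of pages. The function names are illustrative; this only mirrors the indexing rules described above and does not touch real PDFs.

```python
def delete_pages(doc: list, pages: list) -> list:
    """Drop the given 1-based page numbers."""
    drop = set(pages)
    return [page for number, page in enumerate(doc, start=1) if number not in drop]


def reorder_pages(doc: list, order: list) -> list:
    """Rebuild the document in the given 1-based order."""
    return [doc[number - 1] for number in order]


def insert_pages(doc: list, inserted: list, at: int) -> list:
    """Splice pages in before 1-based position `at`; at = N + 1 appends."""
    if not 1 <= at <= len(doc) + 1:
        raise ValueError("at must be between 1 and N + 1")
    return doc[: at - 1] + inserted + doc[at - 1 :]
```

With a three-page document ["A", "B", "C"], order [3, 1, 2] gives ["C", "A", "B"], and inserting ["X"] at 1 gives ["X", "A", "B", "C"].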

watermark

Stamp text on every page (or a pages range). Tune opacity (0–1), size (pt, ≤ 400), color (#RRGGBB), and position (center | top-left | top-right | bottom-left | bottom-right).

POST /v1/pdf/watermark
{ "file": {"inline":"<b64>"}, "text": "CONFIDENTIAL", "opacity": 0.2, "position": "center" }

form

Fill or flatten an AcroForm. Pass fields as a map of field name → string or boolean (checkbox); set flatten: true to bake the values in so the form is no longer editable.

POST /v1/pdf/form
{ "file": {"inline":"<b64>"}, "fields": {"name":"Ada Lovelace","subscribe":true}, "flatten": true }

extract-text

Pull text + structure out of a PDF (engine: pdf.js / unpdf). Returns per-page text plus the concatenated whole. pages is an optional 1-based range string.

POST /v1/pdf/extract-text
{ "file": {"inline":"<b64>"}, "pages": "1-5" }

Response (not an output envelope — this op returns parsed text directly):

{ "totalPages": 12, "pages": [ {"page": 1, "text": "..."} ], "text": "..." }

metadata

Read document metadata, or set any of title, author, subject, keywords, creator, producer. Read by omitting set; write by passing it. Flat per-call price. A read returns { "metadata": { ... } }; a write returns the standard output envelope with the updated PDF.

POST /v1/pdf/metadata
{ "file": {"inline":"<b64>"}, "set": {"title":"Q3 Report","author":"finance-bot"} }

encrypt

Password-protect a PDF (qpdf, 256-bit AES). Supply userPassword (the open password), ownerPassword (permissions password), or both — at least one. Neither may begin with -. Supplying only an ownerPassword leaves the open password empty: anyone can open the document, but permissions stay owner-gated (the standard PDF model).

Eight optional permission booleans — the canonical PDF permission set — restrict what an opened document allows: print, printHighRes, copy (text/graphics extraction), modify, annotate, fillForms (fill in form fields), extract (extract for accessibility), and assemble (insert/delete/rotate pages). An absent boolean means the qpdf default (allowed); pass false to restrict. Permissions are enforced by PDF viewers against the owner password.

Two notes on the set: print and printHighRes fold into one print level — print:false blocks printing entirely, print:true with printHighRes:false allows only low-resolution printing, and print:true alone allows full-quality printing. And copy is the live text/graphics extraction control; extract is the separate accessibility-extraction bit, which PDF 2.0 (the 256-bit encryption used here) deprecated and always grants — so extract:false may be a no-op on modern viewers. Use copy:false to actually block extraction.

POST /v1/pdf/encrypt
{ "file": {"inline":"<b64>"}, "userPassword": "hunter2", "ownerPassword": "admin-only",
  "print": true, "printHighRes": false, "copy": false, "modify": false,
  "annotate": false, "fillForms": true, "assemble": false }

Passwords ride only in the request body to the IAM-gated worker — they’re never logged.
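To make the print / printHighRes folding concrete, the sketch below maps a permissions object to the single print level described earlier in this section. The helper name and the level labels are hypothetical, and treating an absent print as allowed when only printHighRes is set is an assumption based on the "absent means allowed" rule.

```python
def effective_print_level(perms: dict) -> str:
    """Fold print and printHighRes into one level: "none", "low", or "full"."""
    # An absent boolean is treated as allowed (the documented default).
    if not perms.get("print", True):
        return "none"   # print:false blocks printing entirely
    if not perms.get("printHighRes", True):
        return "low"    # printing allowed, but only at low resolution
    return "full"       # print:true alone allows full-quality printing
```

The request shown above (print: true, printHighRes: false) would therefore yield low-resolution printing only.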

compress

Recompress a PDF’s object and content streams (qpdf). Billed per page, not by savings — the compression ratio depends entirely on the document (an already-optimized PDF may not shrink). Set linearize: true for web “fast view” (byte-range streaming).

POST /v1/pdf/compress
{ "file": {"inline":"<b64>"}, "linearize": true }

Looking for a “linearize” / web-optimize op? It’s this flag — linearize: true on compress IS the linearization path (there is no separate pdf/linearize op).

repair

Repair a damaged PDF (qpdf). Two steps in one call: qpdf --check diagnoses the document (a findings report: broken xref, stream-length mismatches, structural warnings), then a full rewrite re-serializes it, reconstructing the cross-reference table and normalizing structure. Billed per page at the same rate as compress. The repaired PDF is always returned as a storage ref, together with the diagnosis:

POST /v1/pdf/repair
{ "file": {"inline":"<b64>"} }

Response:

{ "output": { "outputKey": "...", "outputUrl": "https://...", "sizeBytes": 51234, "contentType": "application/pdf", "filename": "repaired.pdf" },
  "repair": { "exitCode": 2, "findings": ["xref not found", "Attempting to reconstruct cross-reference table", "..."] } }

repair.exitCode is qpdf’s --check verdict: 0 clean, 2 errors found, 3 warnings — all three still produce a rewritten output. A PDF too damaged to parse at all returns 422 PDF_PARSE_FAILED (charge reversed).

render

Rasterize PDF pages to PNG or JPEG (poppler pdftoppm). pages is an optional 1-based range (default: all); dpi ≤ 300 (default 150); format is png or jpeg. Non-embedded standard-14 fonts (Helvetica, Times, …) render correctly via a vendored DejaVu Sans substitution. Capped at 40 pages and 300 DPI — over either returns 422 pre-charge. Billed per output page.

POST /v1/pdf/render
{ "file": {"inline":"<b64>"}, "pages": "1-5", "dpi": 200, "format": "png" }

Render produces multiple images, so it returns a manifest (like split), one presigned-GET image ref per page:

{ "images": [ {"index": 0, "page": 1, "outputKey": "...", "outputUrl": "https://..."} ] }

ocr

Extract text from a scanned/image PDF (poppler pdftoppm → Tesseract LSTM). pages is an optional 1-based range (default: all). The op is synchronous and capped at 5 pages because the call must clear a ~30-second window; asynchronous OCR for larger jobs is planned. A request over the cap returns 422 OCR_TOO_MANY_PAGES_SYNC (which advertises the cap) before any charge. Billed per source page.

lang selects the OCR language from the deployed 10-language roster: eng (default), spa, fra, deu, ita, por, nld, pol, rus, chi_sim. The live roster is the operator-tunable cputools.ocr.langs; an off-roster lang returns a free 422 UNSUPPORTED_LANG that names the roster. The same lang parameter (and roster) applies to ocr-searchable and image/ocr.

POST /v1/pdf/ocr
{ "file": {"inline":"<b64>"}, "pages": "1-3", "lang": "spa" }

Response (parsed text, not an output envelope):

{ "pages": [ {"page": 1, "text": "..."} ], "text": "..." }
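Until asynchronous OCR exists, one way to handle a longer scan is to issue several calls, each with a pages range that stays within the 5-page cap. The sketch below only computes those range strings; the function name is hypothetical, and each chunk would be a separate, separately billed call.

```python
def chunk_page_ranges(total_pages: int, cap: int = 5) -> list:
    """Split pages 1..total_pages into range strings of at most `cap` pages each."""
    chunks = []
    for start in range(1, total_pages + 1, cap):
        end = min(start + cap - 1, total_pages)
        chunks.append(str(start) if start == end else f"{start}-{end}")
    return chunks
```

For a 12-page scan this yields "1-5", "6-10", and "11-12"; concatenating the text fields of the three responses in order reassembles the document text.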

ocr-searchable

Turn a scanned PDF into a searchable PDF: per page, the worker rasterizes (pdftoppm), Tesseract emits a page PDF carrying the original page image plus an invisible text layer, and qpdf merges the pages. The result looks identical but is selectable, copyable, and indexable. Same pages / lang parameters and the same 5-page synchronous cap as ocr; billed per source page at a premium over plain OCR (it runs OCR plus PDF assembly).

POST /v1/pdf/ocr-searchable
{ "file": {"inline":"<b64>"}, "pages": "1-3", "lang": "eng" }

The output is always a storage ref, { output: { outputKey, outputUrl, sizeBytes, contentType, filename } }, never inline (the same convention as render; OCR’d page images are large). Fetch the PDF from the presigned outputUrl, or chain outputKey straight into another op.

from-html

Generate a PDF from HTML — render caller-supplied HTML/CSS to a PDF on a headless Chromium worker. This is how you produce invoices, reports, contracts, and certificates from a template + data: build the HTML, get back a print-ready PDF. Flat per-call price.

POST /v1/pdf/from-html
{ "html": "<h1>Invoice #1042</h1>…",
  "options": { "format": "A4", "margin": {"top":"2cm","bottom":"2cm"}, "printBackground": true,
    "headerTemplate": "<div style='font-size:8px;width:100%;text-align:right'>Invoice #1042</div>",
    "footerTemplate": "<div style='font-size:8px;width:100%;text-align:center'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>",
    "pageRanges": "1-3" } }

options (all optional):

  • format: A4 | Letter | Legal | A3 | A5 | Tabloid | Ledger
  • landscape: boolean
  • margin: top / right / bottom / left, each a size string in px, cm, mm, or in
  • printBackground: boolean
  • scale: 0.1–2
  • headerTemplate / footerTemplate: HTML for the running header/footer. Use the date, title, url, pageNumber, and totalPages classes to inject values, and leave room with margin.
  • displayHeaderFooter: boolean. Auto-enabled when you supply a template, so you rarely set it yourself; set false to suppress.
  • pageRanges: a subset to print, e.g. "1-5, 8" (default: all pages)
  • tagged: emit a tagged/accessible PDF (default true)

The HTML itself is capped at 5 MB (cputools.render.max_html_bytes, operator-tunable; over the cap returns a free 422 HTML_TOO_LARGE). Returns the standard output envelope (a PDF).

Sandboxed by design. The renderer is locked read-only with no network and no JavaScript: every external resource request — <img>, <link>, <script>, fetch, @import, file:// — is blocked, so the renderer can’t reach the network (no SSRF) or execute scripts. The header/footer templates render under the same wall — they cannot fetch or execute anything either. That means assets must be inlined: data: URIs for images, inline <style> for CSS. For a chart, render it server-side and embed the PNG as a data: URI. Static, templated HTML is the sweet spot.
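Because every external fetch is blocked, images have to travel inside the HTML itself. The following sketch shows one way to build a data: URI and an img tag from raw PNG bytes; both helper names are illustrative, not part of the API.

```python
import base64
import html


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"


def inline_image_tag(png_bytes: bytes, alt: str = "") -> str:
    """Build an <img> tag whose source is embedded, so no network fetch is needed."""
    uri = to_data_uri(png_bytes, "image/png")
    return f'<img src="{uri}" alt="{html.escape(alt, quote=True)}">'
```

The resulting tag can be dropped into the html string sent to from-html. Keep in mind that base64 inflates the payload by roughly a third, which counts toward the HTML size cap.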

from-office

Convert an Office document to PDF — docx, xlsx, pptx, odt, ods, odp, rtf, txt, or csv in; a print-ready PDF out (LibreOffice via Gotenberg). Flat per-call price. The filename is required — its extension is how the format is recognized.

POST /v1/pdf/from-office
{ "file": {"inline":"<base64 docx>"}, "filename": "q3-report.docx" }

Takes the standard input source ({ "inline": "<b64>" } ≤ 4 MB, or { "inputKey": "..." } for larger files — capped at 30 MB for this op) and returns the standard output envelope (a PDF). An unsupported extension returns 422 UNSUPPORTED_FORMAT and an oversized input 422 INPUT_TOO_LARGE — both before any charge. If the conversion itself can’t be delivered (a corrupt file, an upstream failure), the call returns 422 CONVERT_FAILED and the charge is reversed: you pay only for a delivered PDF.
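A client-side pre-flight check can catch the two free 422 cases before a request is sent. The sketch below is an approximation: it assumes extensions are matched case-insensitively, that 30 MB means 30 × 2**20 bytes, and that the extension is checked before the size; none of these details is stated above.

```python
import os
from typing import Optional

OFFICE_EXTENSIONS = {"docx", "xlsx", "pptx", "odt", "ods", "odp", "rtf", "txt", "csv"}
MAX_OFFICE_BYTES = 30 * 1024 * 1024  # assumed interpretation of the 30 MB cap


def preflight_from_office(filename: str, size_bytes: int) -> Optional[str]:
    """Return the error code the call would likely get, or None if it looks acceptable."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in OFFICE_EXTENSIONS:
        return "UNSUPPORTED_FORMAT"
    if size_bytes > MAX_OFFICE_BYTES:
        return "INPUT_TOO_LARGE"
    return None
```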

images

Extract the embedded images from a PDF — pull every raster image XObject out of the document and return each as a PNG. Billed per page scanned; pages is an optional 1-based range string (default: all). Useful for harvesting figures, scans, or logos a PDF carries.

POST /v1/pdf/images
{ "file": { "inline": "<base64-pdf>" }, "pages": "1-5" }

Returns a manifest: one entry per extracted image, each output a standard output envelope (PNG):

{ "images": [ {"page": 1, "index": 0, "width": 800, "height": 600, "output": { ... }} ], "count": 1, "truncated": false }

Capped at 200 images per call — extraction stops there and the response sets truncated: true.

diff

Compare two PDFs by their text — extract the text of both and return a unified diff. Per-page price, billed on the combined page count of both inputs. Pure-text compare (layout/visual diff is not in scope). context (optional, 0–100, default 3) sets the unified-diff context lines.

POST /v1/pdf/diff
{ "a": { "inline": "<base64-pdf>" }, "b": { "inputKey": "uploads/v2.pdf" }, "context": 3 }

Returns { patch, changed } — a unified-diff string (createTwoFilesPatch format) plus a boolean; changed: false when the two carry identical text.

bookmarks

Read the outline / bookmark tree of a PDF. Flat per-call price. Read-only (writing bookmarks is not in scope yet).

POST /v1/pdf/bookmarks
{ "file": { "inline": "<base64-pdf>" } }

Returns { outline, count } — a recursive tree of { title, children } plus the total node count. Empty array when the PDF has no outline.
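Walking the recursive tree is a short exercise. The sketch below assumes each node is a dict with title and children keys, as described above, and treats a missing children key as an empty list; the function names are hypothetical.

```python
def count_nodes(outline: list) -> int:
    """Count every node in the outline tree, including nested children."""
    return sum(1 + count_nodes(node.get("children", [])) for node in outline)


def flatten_titles(outline: list, depth: int = 0) -> list:
    """Return (depth, title) pairs in document order, for printing a table of contents."""
    rows = []
    for node in outline:
        rows.append((depth, node["title"]))
        rows.extend(flatten_titles(node.get("children", []), depth + 1))
    return rows
```

The result of count_nodes on the returned outline should match the count field in the response.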

attachments

List or extract embedded file attachments of a PDF. Flat per-call price.

POST /v1/pdf/attachments
{ "file": { "inline": "<base64-pdf>" }, "extract": true }

Returns { attachments, count } — one entry per embedded file ({ filename, size }, first 100); with extract: true, each entry also carries an output envelope with the file bytes. Empty array when the PDF carries no attachments.

verify-signatures

Inspect a PDF’s digital signatures (poppler pdfsig). Flat per-call price.

POST /v1/pdf/verify-signatures
{ "file": {"inline":"<b64>"} }

Response:

{ "signatures": [ {
    "index": 1,
    "signerCommonName": "Jane Doe",
    "signerDistinguishedName": "CN=Jane Doe,O=Acme",
    "signingTime": "Jun 11 2026 14:02:11",
    "hashAlgorithm": "SHA-256",
    "signatureType": "ETSI.CAdES.detached",
    "signatureValidation": "Signature is Valid.",
    "certificateValidation": "Certificate issuer isn't Trusted."
  } ],
  "signatureCount": 1 }

What it checks — read this before relying on it. This op performs structural verification + digest intactness: the signature is present, the signed byte range hasn’t been modified since signing (signatureValidation), and the signer’s identity fields are surfaced. Certificate-chain trust is not evaluated — no CA store is deployed, so certificateValidation will typically report that the issuer chain couldn’t be checked. It answers “is this document signed, by whom, and is it unmodified?” — not “do I trust the signer’s certificate authority?”. An unsigned document returns "signatures": [] with signatureCount: 0 — a normal outcome, not an error.
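A caller that only cares about "signed and unmodified" can reduce the response to that question. The sketch below assumes that the string "Signature is Valid." (taken from the example response) is the exact marker of an intact signature; other pdfsig wording is not covered here, so treat the comparison as an assumption.

```python
def summarize_signatures(response: dict) -> list:
    """Reduce each signature to its signer and whether the signed bytes are intact."""
    return [
        {
            "signer": signature.get("signerCommonName"),
            # Digest intactness only; certificate-chain trust is not evaluated by the op.
            "intact": signature.get("signatureValidation") == "Signature is Valid.",
        }
        for signature in response["signatures"]
    ]
```

An unsigned document produces an empty list, matching the normal "signatures": [] outcome.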

This op pairs with Relaystation e-sign as the verification half: send documents for legally-binding signature there, verify what came back (or any third-party-signed PDF) here.

Large files and the retry tail

For a file above the 4 MB inline ceiling, mint a presigned upload (see above), POST the bytes, and pass {"inputKey":"..."} to the worker op. The worker ops are synchronous: the API holds the connection while qpdf/poppler/Tesseract run. A worst-case job near the window (a 40-page render, a 5-page OCR) can occasionally exceed the API Gateway’s ~30-second integration timeout and return a 504. This is safe: re-issue the same request with the same Idempotency-Key and the chassis returns the already-computed result without charging twice.
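The retry rule above can be sketched as a small wrapper that fixes one Idempotency-Key up front and reuses it on every attempt. The send callable is a stand-in for whatever HTTP client you use (it is assumed to return a status code and a parsed body), and the attempt limit is an arbitrary choice.

```python
import uuid


def call_with_retry(send, body: dict, max_attempts: int = 3):
    """Call send(body, headers), retrying a 504 with the same Idempotency-Key."""
    # One key for the whole logical call, so a retry is recognized as the same call.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status, payload = None, None
    for _ in range(max_attempts):
        status, payload = send(body, headers)
        if status != 504:
            break
    return status, payload
```

Generating a fresh key per attempt would defeat the purpose: each attempt would then look like a new call.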

Errors

  • 402 PAYMENT_REQUIRED — no valid payment (see x402).
  • 422 PDF_PARSE_FAILED — the input didn’t parse as a PDF.
  • 422 WRONG_PASSWORD / PDF_ENCRYPTED — the input is an encrypted PDF submitted for render or OCR (remove its password with a desktop tool first; in-platform decrypt is unavailable).
  • 422 RENDER_TOO_MANY_PAGES / DPI_TOO_HIGH / OCR_TOO_MANY_PAGES_SYNC — over a render/OCR cap (charged nothing).
  • 422 UNSUPPORTED_LANG — an OCR lang outside the deployed roster (cputools.ocr.langs); the error names the roster (charged nothing).
  • 422 BAD_RANGE — a malformed or out-of-bounds page range.
  • 422 UNSUPPORTED_FORMAT / INPUT_TOO_LARGE / CONVERT_FAILED — from-office: extension not in the allowlist or input over the size cap (charged nothing), or the conversion couldn’t be delivered (charge reversed).
  • 400 VALIDATION_ERROR — the request body failed schema validation.

Next

Overview · Pricing · API reference · x402 wire format · Authentication