PDF tools
Twenty-one PDF operations behind one endpoint family, with one uniform request/response shape. Every op is a POST to https://api.relaystation.ai/v1/pdf/<op>. The pure-PDF ops (merge … metadata, images, diff, bookmarks, attachments) run in-process on pdf-lib / pdf.js. The heavier ops run on dedicated engines: encrypt, compress, repair, render, ocr, ocr-searchable, and verify-signatures on vendored qpdf, poppler (including pdfsig), and Tesseract; from-html (HTML→PDF) on a vendored headless Chromium; and from-office (Office→PDF) on LibreOffice (Gotenberg). All of them share the same request shape and the same per-call billing. from-html generates a PDF from scratch (an invoice, a report, a certificate); from-office converts an existing office document (docx, xlsx, pptx, …) into one.
Inputs and outputs — the uniform shape
Every op takes its file (or files) as an input source and returns a uniform output envelope. You never stream a raw file; the body is always JSON.
Input source — one of two shapes per file field:
{ "inline": "<base64-encoded PDF>" } // for files ≤ 4 MB
{ "inputKey": "<scratch object key>" } // for larger files
For files above the 4 MB inline ceiling, mint a one-time presigned upload first:
curl -X POST https://api.relaystation.ai/v1/cputools/upload-url \
-H 'Authorization: Bearer rs_live_<key>' \
-H 'Content-Type: application/json' \
-d '{"ext":"pdf","contentType":"application/pdf"}'
# → { "url": "...", "fields": { ... }, "inputKey": "...", "expiresAt": "...", "maxBytes": 52428800 }
Then upload with a multipart/form-data POST to url — include every entry from fields as a form field, then a file field carrying the bytes (the standard S3 presigned-POST form). Finally pass {"inputKey":"..."} as the file to the op route. The presigned POST is customer-scoped, short-lived, and size-capped at maxBytes (S3 rejects an oversized body at upload time). An x402 lodestone wallet can mint one too — minting is free; only the op that consumes the bytes bills.
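The upload step above can be sketched in Python with only the standard library. This is an illustration, not an official client: build_presigned_form and upload are hypothetical helper names, the minted dict is assumed to be the JSON returned by /v1/cputools/upload-url, and the Content-Type on the file part is an assumption about what the presigned policy accepts.

```python
import urllib.request
import uuid
from typing import Optional, Tuple


def build_presigned_form(fields: dict, file_bytes: bytes, filename: str = "upload.pdf",
                         boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a presigned-POST form: every policy field first, the file part last."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"  # assumption: matches the minted contentType
    )
    chunks.append(file_header.encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def upload(minted: dict, file_bytes: bytes) -> dict:
    """POST the bytes to the presigned URL and return the input source for the op route."""
    if len(file_bytes) > minted["maxBytes"]:
        raise ValueError("file exceeds maxBytes; the upload would be rejected")
    body, content_type = build_presigned_form(minted["fields"], file_bytes)
    req = urllib.request.Request(minted["url"], data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req):  # raises urllib.error.HTTPError on a 4xx/5xx
        pass
    return {"inputKey": minted["inputKey"]}
```

The dict returned by upload is what you would pass as the file (or one entry of files) to the op route.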
Output envelope — always JSON, under an output key:
{
"output": {
"inline": "<base64>", // present when the result ≤ 4 MB
"outputKey": "...", // present when the result > 4 MB
"outputUrl": "https://...", // presigned GET for the scratch key
"sizeBytes": 12345,
"contentType": "application/pdf",
"filename": "merged.pdf"
}
}
The inline-vs-reference threshold is operator-tunable (cputools.io.max_inline_bytes, default 4 MB). Results at or under it come back inline; larger results are written to a scratch key and returned as a presigned outputUrl. The full inline / scratch / baton model — and recipes for downloading or chaining outputs — is in Passing & receiving files.
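A minimal Python sketch of consuming the output envelope: decode the bytes when inline is present, otherwise hand back the presigned outputUrl for a later GET. read_output is a hypothetical helper name, and the sketch assumes exactly the field names shown in the envelope above.

```python
import base64
from typing import Tuple, Union


def read_output(envelope: dict) -> Tuple[str, Union[bytes, str]]:
    """Return ("bytes", data) for an inline result, or ("url", outputUrl) for a scratch result."""
    out = envelope["output"]
    if "inline" in out:
        return "bytes", base64.b64decode(out["inline"])
    return "url", out["outputUrl"]
```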
Billing
Per-page ops bill the page count × the per-page rate, minimum one page. metadata, verify-signatures, from-html, from-office, bookmarks, and attachments are flat per-call charges. The per-op rates are operator-tunable (cputools.price.pdf.<op>.per_page_micros); the launch defaults:
| Op | Billed on | Rate |
|---|---|---|
| merge | output pages | $0.001 / page |
| split | source pages | $0.001 / page |
| rotate | result pages | $0.001 / page |
| pages | result pages | $0.001 / page |
| watermark | pages stamped | $0.001 / page |
| form | pages | $0.001 / page |
| extract-text | source pages | $0.002 / page |
| metadata | per call | $0.001 flat |
| encrypt | pages | $0.001 / page |
| compress | pages | $0.001 / page |
| repair | pages | $0.001 / page |
| render | output pages | $0.001 / page |
| ocr | source pages | $0.003 / page |
| ocr-searchable | source pages | $0.004 / page |
| verify-signatures | per call | $0.001 flat |
| from-html | per call | $0.003 flat |
| from-office | per call | $0.005 flat |
| images | pages scanned | $0.001 / page |
| diff | combined pages of both inputs | $0.001 / page |
| bookmarks | per call | $0.001 flat |
| attachments | per call | $0.001 flat |
These are launch defaults; the live 402 challenge is authoritative (an unauthenticated POST returns the exact price). Every billable call is keyed by an Idempotency-Key header; if you omit it, the chassis generates one per call. Supply your own key when you want a retry to be recognized as the same call.
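For budgeting, the launch defaults in the table can be turned into a rough estimator. The sketch below is illustrative only: the rates are copied from the table in micro-dollars (1,000 micros = $0.001), estimate_micros is a hypothetical name, and the live 402 challenge remains the authoritative price.

```python
# Launch-default rates from the table, in micro-dollars. Operators can retune these.
PER_PAGE_MICROS = {
    "merge": 1000, "split": 1000, "rotate": 1000, "pages": 1000, "watermark": 1000,
    "form": 1000, "extract-text": 2000, "encrypt": 1000, "compress": 1000,
    "repair": 1000, "render": 1000, "ocr": 3000, "ocr-searchable": 4000,
    "images": 1000, "diff": 1000,
}
FLAT_MICROS = {
    "metadata": 1000, "verify-signatures": 1000, "from-html": 3000,
    "from-office": 5000, "bookmarks": 1000, "attachments": 1000,
}


def estimate_micros(op: str, billed_pages: int = 0) -> int:
    """Estimate a call's price: flat ops ignore pages, per-page ops bill at least one page."""
    if op in FLAT_MICROS:
        return FLAT_MICROS[op]
    return max(billed_pages, 1) * PER_PAGE_MICROS[op]
```

For diff, pass the combined page count of both inputs as billed_pages.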
merge
Combine 2–50 PDFs into one, in the order given.
POST /v1/pdf/merge
{ "files": [ {"inline":"<b64>"}, {"inputKey":"..."} ], "filename": "combined.pdf" }
split
Split one PDF by 1-based ranges ("1-3,5,8-10") into separate documents, or pass burst: true to explode every page into its own single-page PDF. Omit ranges (or pass "all") for the whole document.
POST /v1/pdf/split
{ "file": {"inline":"<b64>"}, "ranges": "1-3,7-9" }
Split produces multiple files, so it returns a manifest (not the single output envelope): each entry is a storage reference you can re-submit as an inputKey to another op.
{ "files": [ {"index": 0, "outputKey": "...", "outputUrl": "https://...", "pages": 3, "sizeBytes": 12345} ] }
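Several ops take the same 1-based range strings, so it can help to validate a range client-side before paying for a round trip. The parser below is a hypothetical sketch of that grammar as the examples show it (comma-separated pages and inclusive start-end spans); the server's actual parser may accept more (for example "all") or differ in edge cases.

```python
def expand_ranges(spec: str, total_pages: int) -> list:
    """Expand a 1-based range string like "1-3,5,8-10" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        lo, sep, hi = part.partition("-")
        start = int(lo)                    # raises ValueError on an empty or non-numeric part
        end = int(hi) if sep else start
        if not (1 <= start <= end <= total_pages):
            raise ValueError(f"bad range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages
```

A ValueError here corresponds loosely to the case the API reports as 422 BAD_RANGE.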
rotate
Rotate pages by 90 / 180 / 270 degrees. pages is an optional 1-based range string; omit it to rotate every page.
POST /v1/pdf/rotate
{ "file": {"inline":"<b64>"}, "degrees": "90", "pages": "1,3-5" }
pages
Edit page structure. One of three actions:
- delete drops a range of pages: {"action":"delete","file":{...},"pages":"2,4-6"}
- reorder supplies the new 1-based page order: {"action":"reorder","file":{...},"order":[3,1,2]}
- insert splices another PDF in at a 1-based position (at: 1 inserts before the first page; at: N+1 appends): {"action":"insert","file":{...},"insert":{...},"at":2}
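As a mental model of the three actions, the sketch below applies the same 1-based semantics to plain Python lists standing in for pages. It does not call the API; the function names are hypothetical and only illustrate how pages, order, and at are interpreted according to the descriptions above.

```python
def delete_pages(pages: list, doomed: list) -> list:
    """Drop the given 1-based page numbers."""
    drop = set(doomed)
    return [p for i, p in enumerate(pages, start=1) if i not in drop]


def reorder_pages(pages: list, order: list) -> list:
    """Rebuild the document in the given 1-based order."""
    return [pages[i - 1] for i in order]


def insert_pages(pages: list, inserted: list, at: int) -> list:
    """Splice inserted in so that its first page lands at 1-based position at."""
    return pages[: at - 1] + inserted + pages[at - 1:]
```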
watermark
Stamp text on every page (or a pages range). Tune opacity (0–1), size (pt, ≤ 400), color (#RRGGBB), and position (center | top-left | top-right | bottom-left | bottom-right).
POST /v1/pdf/watermark
{ "file": {"inline":"<b64>"}, "text": "CONFIDENTIAL", "opacity": 0.2, "position": "center" }
form
Fill or flatten an AcroForm. Pass fields as a map of field name → string or boolean (checkbox); set flatten: true to bake the values in so the form is no longer editable.
POST /v1/pdf/form
{ "file": {"inline":"<b64>"}, "fields": {"name":"Ada Lovelace","subscribe":true}, "flatten": true }
extract-text
Pull text + structure out of a PDF (engine: pdf.js / unpdf). Returns per-page text plus the concatenated whole. pages is an optional 1-based range string.
POST /v1/pdf/extract-text
{ "file": {"inline":"<b64>"}, "pages": "1-5" }
Response (not an output envelope — this op returns parsed text directly):
{ "totalPages": 12, "pages": [ {"page": 1, "text": "..."} ], "text": "..." }
metadata
Read document metadata, or set any of title, author, subject, keywords, creator, producer. Read by omitting set; write by passing it. Flat per-call price. A read returns { "metadata": { ... } }; a write returns the standard output envelope with the updated PDF.
POST /v1/pdf/metadata
{ "file": {"inline":"<b64>"}, "set": {"title":"Q3 Report","author":"finance-bot"} }
encrypt
Password-protect a PDF (qpdf, 256-bit AES). Supply userPassword (the open password), ownerPassword (permissions password), or both — at least one. Neither may begin with -. Supplying only an ownerPassword leaves the open password empty: anyone can open the document, but permissions stay owner-gated (the standard PDF model).
Eight optional permission booleans — the canonical PDF permission set — restrict what an opened document allows: print, printHighRes, copy (text/graphics extraction), modify, annotate, fillForms (fill in form fields), extract (extract for accessibility), and assemble (insert/delete/rotate pages). An absent boolean means the qpdf default (allowed); pass false to restrict. Permissions are enforced by PDF viewers against the owner password.
Two notes on the set: print and printHighRes fold into one print level — print:false blocks printing entirely, print:true with printHighRes:false allows only low-resolution printing, and print:true alone allows full-quality printing. And copy is the live text/graphics extraction control; extract is the separate accessibility-extraction bit, which PDF 2.0 (the 256-bit encryption used here) deprecated and always grants — so extract:false may be a no-op on modern viewers. Use copy:false to actually block extraction.
POST /v1/pdf/encrypt
{ "file": {"inline":"<b64>"}, "userPassword": "hunter2", "ownerPassword": "admin-only",
"print": true, "printHighRes": false, "copy": false, "modify": false,
"annotate": false, "fillForms": true, "assemble": false }
Passwords ride only in the request body to the IAM-gated worker — they’re never logged.
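The way print and printHighRes fold into one print level can be written as a small truth table. This is a sketch of the rule as described above, not the service's code; print_level is a hypothetical name, the labels "none", "low", and "full" are illustrative, and an absent boolean is treated as allowed.

```python
def print_level(print_allowed: bool = True, print_high_res: bool = True) -> str:
    """Fold the two print booleans into one effective print level."""
    if not print_allowed:
        return "none"   # print:false blocks printing entirely
    return "full" if print_high_res else "low"
```

With the request shown above ("print": true, "printHighRes": false), the effective level is low-resolution printing only.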
compress
Recompress a PDF’s object and content streams (qpdf). Billed per page, not by savings — the compression ratio depends entirely on the document (an already-optimized PDF may not shrink). Set linearize: true for web “fast view” (byte-range streaming).
POST /v1/pdf/compress
{ "file": {"inline":"<b64>"}, "linearize": true }
Looking for a “linearize” / web-optimize op? It’s this flag — linearize: true on compress IS the linearization path (there is no separate pdf/linearize op).
repair
Repair a damaged PDF (qpdf). Two steps in one call: qpdf --check diagnoses the document (a findings report: broken xref, stream-length mismatches, structural warnings), then a full rewrite re-serializes it, reconstructing the cross-reference table and normalizing structure. Billed per page at the same rate as compress. The repaired PDF is always returned as a storage ref (never inline), together with the diagnosis:
POST /v1/pdf/repair
{ "file": {"inline":"<b64>"} }
{ "output": { "outputKey": "...", "outputUrl": "https://...", "sizeBytes": 51234, "contentType": "application/pdf", "filename": "repaired.pdf" },
"repair": { "exitCode": 2, "findings": ["xref not found", "Attempting to reconstruct cross-reference table", "..."] } }
repair.exitCode is qpdf’s --check verdict: 0 clean, 2 errors found, 3 warnings — all three still produce a rewritten output. A PDF too damaged to parse at all returns 422 PDF_PARSE_FAILED (charge reversed).
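A short Python sketch of reading the diagnosis: it maps the three documented exitCode values to labels and counts the findings. summarize_repair is a hypothetical helper, and any exit code outside the three listed above is reported as "unknown" rather than guessed at.

```python
QPDF_CHECK_VERDICT = {0: "clean", 2: "errors found", 3: "warnings"}


def summarize_repair(response: dict) -> str:
    """Turn the repair block of a response into a one-line summary."""
    repair = response["repair"]
    verdict = QPDF_CHECK_VERDICT.get(repair["exitCode"], "unknown")
    return f"{verdict}: {len(repair['findings'])} finding(s)"
```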
render
Rasterize PDF pages to PNG or JPEG (poppler pdftoppm). pages is an optional 1-based range (default: all); dpi ≤ 300 (default 150); format is png or jpeg. Non-embedded standard-14 fonts (Helvetica, Times, …) render correctly via a vendored DejaVu Sans substitution. Capped at 40 pages and 300 DPI — over either returns 422 pre-charge. Billed per output page.
POST /v1/pdf/render
{ "file": {"inline":"<b64>"}, "pages": "1-5", "dpi": 200, "format": "png" }
Render produces multiple images, so it returns a manifest (like split), one presigned-GET image ref per page:
{ "images": [ {"index": 0, "page": 1, "outputKey": "...", "outputUrl": "https://..."} ] }
ocr
Extract text from a scanned/image PDF (poppler pdftoppm → Tesseract LSTM). pages is an optional 1-based range (default: all). The op is synchronous and capped at 5 pages, because the call must clear a ~30-second window; asynchronous OCR for larger jobs is planned. Over the cap returns 422 OCR_TOO_MANY_PAGES_SYNC (which advertises the cap) before any charge. Billed per source page.
lang selects the OCR language from the deployed 10-language roster: eng (default), spa, fra, deu, ita, por, nld, pol, rus, chi_sim. The live roster is the operator-tunable cputools.ocr.langs; an off-roster lang returns a free 422 UNSUPPORTED_LANG that names the roster. The same lang parameter (and roster) applies to ocr-searchable and image/ocr.
POST /v1/pdf/ocr
{ "file": {"inline":"<b64>"}, "pages": "1-3", "lang": "spa" }
Response (parsed text, not an output envelope):
{ "pages": [ {"page": 1, "text": "..."} ], "text": "..." }
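Until asynchronous OCR is available, one way to process a longer scan is to issue several calls, each covering at most five pages. The helper below only computes the pages strings for those calls; ocr_batches is a hypothetical name, and the default cap of 5 is taken from the text above (the 422 response advertises the live value).

```python
def ocr_batches(total_pages: int, cap: int = 5) -> list:
    """Split pages 1..total_pages into range strings of at most cap pages each."""
    batches = []
    for start in range(1, total_pages + 1, cap):
        end = min(start + cap - 1, total_pages)
        batches.append(str(start) if start == end else f"{start}-{end}")
    return batches
```

Each string can be sent as the pages value of a separate /v1/pdf/ocr call against the same file, and the per-page text concatenated in order.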
ocr-searchable
Turn a scanned PDF into a searchable PDF: per page, the worker rasterizes (pdftoppm), Tesseract emits a page PDF carrying the original page image plus an invisible text layer, and qpdf merges the pages. The result looks identical but is selectable, copyable, and indexable. Same pages / lang parameters and the same 5-page synchronous cap as ocr; billed per source page at a premium over plain OCR (it runs OCR plus PDF assembly).
POST /v1/pdf/ocr-searchable
{ "file": {"inline":"<b64>"}, "pages": "1-3", "lang": "eng" }
The output is always a storage ref — { output: { outputKey, outputUrl, sizeBytes, contentType, filename } }, never inline (the render convention; OCR’d page images are large). Fetch the PDF from the presigned outputUrl, or chain outputKey straight into another op.
from-html
Generate a PDF from HTML — render caller-supplied HTML/CSS to a PDF on a headless Chromium worker. This is how you produce invoices, reports, contracts, and certificates from a template + data: build the HTML, get back a print-ready PDF. Flat per-call price.
POST /v1/pdf/from-html
{ "html": "<h1>Invoice #1042</h1>…",
"options": { "format": "A4", "margin": {"top":"2cm","bottom":"2cm"}, "printBackground": true,
"headerTemplate": "<div style='font-size:8px;width:100%;text-align:right'>Invoice #1042</div>",
"footerTemplate": "<div style='font-size:8px;width:100%;text-align:center'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>",
"pageRanges": "1-3" } }
options (all optional): format (A4 | Letter | Legal | A3 | A5 | Tabloid | Ledger), landscape (boolean), margin (top / right / bottom / left, each a size string in px, cm, mm, or in), printBackground (boolean), scale (0.1–2). Plus print furniture and output controls: headerTemplate / footerTemplate (HTML for the running header/footer — use the date, title, url, pageNumber, and totalPages classes to inject values; leave room with margin), displayHeaderFooter (boolean — auto-enabled when you supply a template, so you rarely set it yourself; set false to suppress), pageRanges (a subset to print, e.g. "1-5, 8"; default all pages), and tagged (emit a tagged/accessible PDF — default true). The HTML itself is capped at 5 MB (cputools.render.max_html_bytes, operator-tunable; over → free 422 HTML_TOO_LARGE). Returns the standard output envelope (a PDF).
Sandboxed by design. The renderer is locked read-only with no network and no JavaScript: every external resource request — <img>, <link>, <script>, fetch, @import, file:// — is blocked, so the renderer can’t reach the network (no SSRF) or execute scripts. The header/footer templates render under the same wall — they cannot fetch or execute anything either. That means assets must be inlined: data: URIs for images, inline <style> for CSS. For a chart, render it server-side and embed the PNG as a data: URI. Static, templated HTML is the sweet spot.
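Because the renderer blocks every external request, a template has to carry its assets inside the HTML. The Python sketch below shows one way to inline an image as a data: URI and assemble a request body; data_uri and build_from_html_body are hypothetical names, and the 5 MB figure is interpreted here as 5 × 1024 × 1024 bytes, which is an assumption.

```python
import base64
from html import escape

MAX_HTML_BYTES = 5 * 1024 * 1024  # assumption: how the documented 5 MB default is counted


def data_uri(data: bytes, mime: str = "image/png") -> str:
    """Encode raw bytes as a data: URI so no network fetch is needed at render time."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_from_html_body(title: str, logo_png: bytes) -> dict:
    """Build a from-html request body with inline CSS and an inlined logo."""
    markup = (
        "<style>h1 { font-family: sans-serif; }</style>"
        + '<img alt="logo" src="' + data_uri(logo_png) + '">'
        + "<h1>" + escape(title) + "</h1>"
    )
    if len(markup.encode("utf-8")) > MAX_HTML_BYTES:
        raise ValueError("HTML over the cap; the API would answer 422 HTML_TOO_LARGE")
    return {"html": markup, "options": {"format": "A4", "printBackground": True}}
```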
from-office
Convert an Office document to PDF — docx, xlsx, pptx, odt, ods, odp, rtf, txt, or csv in; a print-ready PDF out (LibreOffice via Gotenberg). Flat per-call price. The filename is required — its extension is how the format is recognized.
POST /v1/pdf/from-office
{ "file": {"inline":"<base64 docx>"}, "filename": "q3-report.docx" }
Takes the standard input source ({ "inline": "<b64>" } ≤ 4 MB, or { "inputKey": "..." } for larger files — capped at 30 MB for this op) and returns the standard output envelope (a PDF). An unsupported extension returns 422 UNSUPPORTED_FORMAT and an oversized input 422 INPUT_TOO_LARGE — both before any charge. If the conversion itself can’t be delivered (a corrupt file, an upstream failure), the call returns 422 CONVERT_FAILED and the charge is reversed: you pay only for a delivered PDF.
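The two pre-charge rejections can be anticipated client-side. The sketch below mirrors the documented allowlist and 30 MB cap; precheck_office is a hypothetical name, and the case-insensitive extension match, the order of the checks, and reading 30 MB as 30 × 1024 × 1024 bytes are all assumptions about server behavior.

```python
import os
from typing import Optional

ALLOWED_EXTENSIONS = {"docx", "xlsx", "pptx", "odt", "ods", "odp", "rtf", "txt", "csv"}
MAX_OFFICE_BYTES = 30 * 1024 * 1024  # assumption: how the 30 MB cap is counted


def precheck_office(filename: str, size_bytes: int) -> Optional[str]:
    """Return the error code the API would likely give, or None if the input looks acceptable."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        return "UNSUPPORTED_FORMAT"
    if size_bytes > MAX_OFFICE_BYTES:
        return "INPUT_TOO_LARGE"
    return None
```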
images
Extract the embedded images from a PDF — pull every raster image XObject out of the document and return each as a PNG. Billed per page scanned; pages is an optional 1-based range string (default: all). Useful for harvesting figures, scans, or logos a PDF carries.
POST /v1/pdf/images
{ "file": { "inline": "<base64-pdf>" }, "pages": "1-5" }
Returns a manifest: one entry per extracted image, each output a standard output envelope (PNG):
{ "images": [ {"page": 1, "index": 0, "width": 800, "height": 600, "output": { ... }} ], "count": 1, "truncated": false }
Capped at 200 images per call — extraction stops there and the response sets truncated: true.
diff
Compare two PDFs by their text — extract the text of both and return a unified diff. Per-page price, billed on the combined page count of both inputs. Pure-text compare (layout/visual diff is not in scope). context (optional, 0–100, default 3) sets the unified-diff context lines.
POST /v1/pdf/diff
{ "a": { "inline": "<base64-pdf>" }, "b": { "inputKey": "uploads/v2.pdf" }, "context": 3 }
Returns { patch, changed } — a unified-diff string (createTwoFilesPatch format) plus a boolean; changed: false when the two carry identical text.
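To turn the returned patch into a quick summary, you can count added and removed lines inside the hunks. The sketch assumes the patch is a conventional unified diff (file headers, then @@ hunk markers); patch_stats is a hypothetical helper and does not depend on the API.

```python
from typing import Tuple


def patch_stats(patch: str) -> Tuple[int, int]:
    """Count (added, removed) lines in a unified diff, ignoring the file headers."""
    added = removed = 0
    in_hunk = False
    for line in patch.splitlines():
        if line.startswith("@@"):
            in_hunk = True          # everything before the first hunk is header material
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return added, removed
```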
bookmarks
Read the outline / bookmark tree of a PDF. Flat per-call price. Read-only (writing bookmarks is not in scope yet).
POST /v1/pdf/bookmarks
{ "file": { "inline": "<base64-pdf>" } }
Returns { outline, count } — a recursive tree of { title, children } plus the total node count. Empty array when the PDF has no outline.
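The recursive outline is easy to flatten into an indented table of contents. The sketch below assumes each node has exactly the documented title and children keys; flatten_outline is a hypothetical helper name.

```python
def flatten_outline(outline: list, depth: int = 0) -> list:
    """Walk the bookmark tree depth-first and return (depth, title) pairs."""
    rows = []
    for node in outline:
        rows.append((depth, node["title"]))
        rows.extend(flatten_outline(node.get("children", []), depth + 1))
    return rows
```

Printing "  " * depth + title for each pair gives an indented listing, and the number of pairs should match the count field.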
attachments
List or extract embedded file attachments of a PDF. Flat per-call price.
POST /v1/pdf/attachments
{ "file": { "inline": "<base64-pdf>" }, "extract": true }
Returns { attachments, count } — one entry per embedded file ({ filename, size }, first 100); with extract: true, each entry also carries an output envelope with the file bytes. Empty array when the PDF carries no attachments.
verify-signatures
Inspect a PDF’s digital signatures (poppler pdfsig). Flat per-call price.
POST /v1/pdf/verify-signatures
{ "file": {"inline":"<b64>"} }
{ "signatures": [ {
"index": 1,
"signerCommonName": "Jane Doe",
"signerDistinguishedName": "CN=Jane Doe,O=Acme",
"signingTime": "Jun 11 2026 14:02:11",
"hashAlgorithm": "SHA-256",
"signatureType": "ETSI.CAdES.detached",
"signatureValidation": "Signature is Valid.",
"certificateValidation": "Certificate issuer isn't Trusted."
} ],
"signatureCount": 1 }
What it checks — read this before relying on it. This op performs structural verification + digest intactness: the signature is present, the signed byte range hasn’t been modified since signing (signatureValidation), and the signer’s identity fields are surfaced. Certificate-chain trust is not evaluated — no CA store is deployed, so certificateValidation will typically report that the issuer chain couldn’t be checked. It answers “is this document signed, by whom, and is it unmodified?” — not “do I trust the signer’s certificate authority?”. An unsigned document returns "signatures": [] with signatureCount: 0 — a normal outcome, not an error.
This op pairs with Relaystation e-sign as the verification half: send documents for legally-binding signature there, verify what came back (or any third-party-signed PDF) here.
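A small Python sketch of acting on the response: it lists signers whose signed byte range is reported intact, and deliberately ignores certificateValidation, since chain trust is not evaluated. intact_signers is a hypothetical helper, and matching on the exact string "Signature is Valid." is an assumption based on the sample response; other status strings are not enumerated here.

```python
def intact_signers(response: dict) -> list:
    """Return the common names of signers whose signature digest is reported valid."""
    return [
        sig.get("signerCommonName", "")
        for sig in response.get("signatures", [])
        if sig.get("signatureValidation") == "Signature is Valid."
    ]
```

An empty list means either an unsigned document or no signature that passed the digest check; signatureCount distinguishes the two.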
Large files and the retry tail
For a file above the 4 MB inline ceiling, mint a presigned upload (see above), POST the bytes, and pass {"inputKey":"..."} to the worker op. The worker ops are synchronous: the API holds the connection while qpdf/poppler/Tesseract run. A worst-case job near the window (a 40-page render, a 5-page OCR) can occasionally exceed the API Gateway’s ~30-second integration timeout and return a 504. This is safe: re-issue the same request with the same Idempotency-Key and the chassis returns the already-computed result without charging twice.
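The retry rule can be sketched as a loop that fixes one Idempotency-Key up front and reuses it on every attempt. The transport is injected so the sketch stays independent of any HTTP library: send is a hypothetical callable taking (body, headers) and returning (status, payload), and call_with_retry is likewise an illustrative name.

```python
import time
import uuid
from typing import Callable, Tuple


def call_with_retry(send: Callable[[dict, dict], Tuple[int, dict]], body: dict,
                    attempts: int = 3, delay_seconds: float = 1.0) -> Tuple[int, dict]:
    """Retry on 504 with the SAME Idempotency-Key so the result is not billed twice."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # chosen once, reused on every attempt
    status, payload = send(body, headers)
    for _ in range(attempts - 1):
        if status != 504:
            break
        time.sleep(delay_seconds)
        status, payload = send(body, headers)
    return status, payload
```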
Errors
- 402 PAYMENT_REQUIRED: no valid payment (see x402).
- 422 PDF_PARSE_FAILED: the input didn’t parse as a PDF.
- 422 WRONG_PASSWORD / PDF_ENCRYPTED: the input to render or OCR is an encrypted PDF (remove its password with a desktop tool first; in-platform decrypt is unavailable).
- 422 RENDER_TOO_MANY_PAGES / DPI_TOO_HIGH / OCR_TOO_MANY_PAGES_SYNC: over a render/OCR cap (charged nothing).
- 422 UNSUPPORTED_LANG: an OCR lang outside the deployed roster (cputools.ocr.langs); the error names the roster (charged nothing).
- 422 BAD_RANGE: a malformed or out-of-bounds page range.
- 422 UNSUPPORTED_FORMAT / INPUT_TOO_LARGE / CONVERT_FAILED: from-office only. The extension is not in the allowlist or the input is over the size cap (charged nothing), or the conversion couldn’t be delivered (charge reversed).
- 400 VALIDATION_ERROR: the request body failed schema validation.
Next
Overview · Pricing · API reference · x402 wire format · Authentication