Image tools

Raster operations built on sharp (libvips), plus image OCR on Tesseract. Every op is a POST to https://api.relaystation.ai/v1/image/<op>.

Inputs and outputs

Same uniform shape as the PDF tools: the file is an input source — { "inline": "<base64>" } for ≤ 4 MB, or { "inputKey": "..." } (from POST /v1/cputools/upload-url) for up to 50 MB. The transformed image returns in the uniform output envelope (inline base64 ≤ 4 MB, or a presigned outputUrl above it). Supported formats: PNG, JPEG, WebP, AVIF, TIFF, GIF.
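As a rough illustration of the source choice described above, the sketch below picks the inline form for small files and otherwise expects a key obtained from the upload step. The helper name make_source is hypothetical, and reading "4 MB" and "50 MB" as binary megabytes measured on the raw (pre-base64) bytes is an assumption, not documented behaviour.

```python
import base64
from typing import Optional

# Assumption: the documented limits are binary megabytes on the raw bytes.
INLINE_LIMIT_BYTES = 4 * 1024 * 1024
UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def make_source(data: bytes, input_key: Optional[str] = None) -> dict:
    """Build an input source object for a file (hypothetical helper)."""
    if len(data) > UPLOAD_LIMIT_BYTES:
        raise ValueError("file exceeds the 50 MB ceiling")
    if len(data) <= INLINE_LIMIT_BYTES:
        # Small enough to travel inside the JSON body as base64.
        return {"inline": base64.b64encode(data).decode("ascii")}
    if input_key is None:
        raise ValueError("too large to inline; upload first and pass the inputKey")
    # The key is whatever the upload step handed back for this file.
    return {"inputKey": input_key}
```

The returned dict can be used directly as the file field of any op body.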

Billing

Per megapixel of the input (width × height ÷ 1,000,000), minimum 1 MP, at the operator-tunable cputools.price.image.<op>.per_mp_micros (launch default $0.0003 / MP). metadata, dominant-color, ocr, and from-html are flat-billed per call. The live 402 challenge is authoritative.

Op	Billed on	Rate
`resize`	input megapixels	$0.0003 / MP
`convert`	input megapixels	$0.0003 / MP
`compress`	input megapixels	$0.0003 / MP
`rotate`	input megapixels	$0.0003 / MP
`crop`	input megapixels	$0.0003 / MP
`blur`	input megapixels	$0.0003 / MP
`sharpen`	input megapixels	$0.0003 / MP
`grayscale`	input megapixels	$0.0003 / MP
`exif-strip`	input megapixels	$0.0003 / MP
`composite`	base megapixels	$0.0003 / MP
`contact-sheet`	total input MP	$0.0003 / MP
`adjust`	input megapixels	$0.0003 / MP
`trim`	input megapixels	$0.0003 / MP
`extend`	input megapixels	$0.0003 / MP
`metadata`	per call	$0.0005 flat
`dominant-color`	per call	$0.0002 flat
`ocr`	per call	$0.003 flat
`from-html`	per call	$0.003 flat

from-html is flat-billed because output dimensions aren’t known until after the render, so there’s no per-MP pre-auth; pdf/from-html uses the same model.
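The billing rules above can be approximated client-side before a call. This is only an estimate under stated assumptions: "micros" is taken to mean millionths of a dollar, the rates are the launch defaults from the table, fractional megapixels are assumed to be billed without rounding, and the function name estimate_micros is hypothetical. The live 402 challenge remains the authoritative price.

```python
# Launch defaults from the table, expressed in assumed micro-dollars.
PER_MP_MICROS = 300  # $0.0003 per megapixel
FLAT_MICROS = {
    "metadata": 500,
    "dominant-color": 200,
    "ocr": 3000,
    "from-html": 3000,
}


def estimate_micros(op: str, width: int = 0, height: int = 0, pages: int = 1) -> float:
    """Estimate the charge for one call, in micro-dollars (hypothetical helper)."""
    if op in FLAT_MICROS:
        return float(FLAT_MICROS[op])
    # pages is the frame count for animated inputs, 1 otherwise.
    pixels = pages * width * height
    # Enforce the 1-megapixel minimum before applying the rate.
    billable_pixels = max(1_000_000, pixels)
    return billable_pixels * PER_MP_MICROS / 1_000_000


def micros_to_dollars(micros: float) -> float:
    return micros / 1_000_000
```

For composite, pass the base image's dimensions; for contact-sheet, the table bills on total input megapixels, so the per-image estimates would need to be combined by the caller.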

More image ops (Brief 175)

Beyond resize/convert/compress/rotate/metadata, these ops ship (sharp-backed, except from-html, which renders in headless Chromium):

crop — { file, left, top, width, height }; extract a region (out-of-bounds → 422 CROP_OUT_OF_BOUNDS).
blur — { file, sigma? }; Gaussian blur.
sharpen — { file, sigma? }.
grayscale — { file }.
exif-strip — { file }; auto-orients per EXIF, then drops all metadata (pixels + orientation preserved, EXIF/GPS gone).
dominant-color — { file }; returns { dominant: { r, g, b }, hex } (flat-billed read).
composite — { file (base), overlay, gravity? | (top,left), opacity? }; overlay/watermark one image onto another.
contact-sheet — { files: <source>[], columns?, cellWidth?, cellHeight?, gap?, background? }; tile images into a thumbnail-grid PNG.
adjust — { file, brightness?, saturation?, hue?, lightness?, negate?, tint? }; tune color/tone (at least one).
trim — { file, threshold? }; auto-crop uniform-color borders.
extend — { file, top?/bottom?/left?/right?, background?, extendWith? }; pad on any side.
from-html — { html, type?, fullPage?, clip?, omitBackground?, width?, height? }; render provided HTML to an image (headless Chromium screenshot).

All transform ops return the uniform JSON output envelope and keep the input format where applicable. The exceptions are contact-sheet, which always outputs PNG, and from-html, which outputs the requested type.
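Since an out-of-bounds crop is rejected with 422 CROP_OUT_OF_BOUNDS, a caller may want to check the region locally first, for example against dimensions read via the metadata op. The sketch below assumes the region must lie entirely inside the image; the service's exact rule is not spelled out here, and the function name is hypothetical.

```python
def crop_in_bounds(image_width: int, image_height: int,
                   left: int, top: int, width: int, height: int) -> bool:
    """Return True if the crop rectangle fits fully inside the image.

    Assumption: left/top are zero-based pixel offsets from the top-left corner.
    """
    if width <= 0 or height <= 0 or left < 0 or top < 0:
        return False
    return left + width <= image_width and top + height <= image_height
```

A False result suggests the request would likely be rejected, so the region can be clamped or corrected before sending.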

resize

Resize to a target width and/or height (at least one). fit controls how the image fills the box: cover | contain | fill | inside | outside.

Set animated: true to read an animated GIF/WebP with all its frames intact: every frame is resized and the animation is preserved. Billing then counts every frame (pages × width × height, where pages is the frame count).

POST /v1/image/resize
{ "file": {"inline":"<b64>"}, "width": 800, "fit": "inside", "animated": true }
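A small builder can enforce the two constraints stated above (at least one dimension, and a fit value from the listed set) before the request is sent. The function name resize_body is hypothetical, and the server may apply further validation that this sketch does not cover.

```python
from typing import Optional

# The fit values listed for resize.
FITS = {"cover", "contain", "fill", "inside", "outside"}


def resize_body(source: dict, width: Optional[int] = None,
                height: Optional[int] = None, fit: Optional[str] = None,
                animated: bool = False) -> dict:
    """Assemble a resize request body (hypothetical helper)."""
    if width is None and height is None:
        raise ValueError("resize needs width, height, or both")
    if fit is not None and fit not in FITS:
        raise ValueError(f"fit must be one of {sorted(FITS)}")
    body: dict = {"file": source}
    # Only include the optional fields the caller actually set.
    if width is not None:
        body["width"] = width
    if height is not None:
        body["height"] = height
    if fit is not None:
        body["fit"] = fit
    if animated:
        body["animated"] = True
    return body
```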

convert

Re-encode to another format. format is the target (png | jpeg | webp | avif); quality (1–100) applies to lossy formats. AVIF is available as a modern, high-compression output target. Set animated: true to preserve all frames of an animated input: gif/webp targets keep the animation stack, while other targets flatten.

POST /v1/image/convert
{ "file": {"inline":"<b64>"}, "format": "webp", "quality": 82 }

compress

Re-encode at lower quality in the current format.

POST /v1/image/compress
{ "file": {"inline":"<b64>"}, "quality": 70 }

rotate

Rotate by angle degrees and/or flip (vertical) / flop (horizontal).

POST /v1/image/rotate
{ "file": {"inline":"<b64>"}, "angle": 90 }

metadata

Read dimensions, format, color space, and EXIF presence — no bytes returned, flat-billed.

POST /v1/image/metadata
{ "file": {"inline":"<b64>"} }

Response: { "metadata": { "width": 1920, "height": 1080, "format": "jpeg", ... } }.

OCR

Extract text from an image (Tesseract LSTM on the cputools worker — PNG, JPEG, TIFF, BMP, WebP). Flat per-call price (cputools.price.image.ocr.flat_micros, $0.003 at launch).

lang selects the OCR language from the deployed 10-language roster: eng (default), spa, fra, deu, ita, por, nld, pol, rus, chi_sim. The live roster is the operator-tunable cputools.ocr.langs; an off-roster lang returns a free 422 UNSUPPORTED_LANG that names the roster. The PDF OCR ops use the same parameter and roster.

Pass tsv: true to also get Tesseract’s TSV output — per-word bounding boxes and confidences — alongside the plain text.

POST /v1/image/ocr
{ "file": {"inline":"<b64 image>"}, "lang": "deu", "tsv": true }

Response (JSON, no bytes): { "text": "...", "tsv": "level\tpage_num\t..." }. Because Tesseract itself is the parser, there is no pre-charge image-format guard: a non-image input returns 422 IMAGE_PARSE_FAILED with the charge reversed.
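To get per-word boxes and confidences out of the tsv string, the standard Tesseract TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) can be parsed with the csv module. This sketch assumes the service passes Tesseract's TSV through unmodified; the function name words_from_tsv is hypothetical.

```python
import csv
import io


def words_from_tsv(tsv: str) -> list:
    """Extract word-level entries from Tesseract TSV output."""
    # QUOTE_NONE: Tesseract does not quote fields, and OCR text may contain quotes.
    reader = csv.DictReader(io.StringIO(tsv, newline=""),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        # Level 5 rows are individual words; lower levels are page/block/line.
        if row["level"] != "5":
            continue
        text = (row.get("text") or "").strip()
        if not text:
            continue
        words.append({
            "text": text,
            "conf": float(row["conf"]),
            "box": (int(row["left"]), int(row["top"]),
                    int(row["width"]), int(row["height"])),
        })
    return words
```

Filtering on conf afterwards is a common way to drop low-quality words; the right threshold depends on the input.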

adjust

Tune color and tone in one pass (sharp’s modulate bundle). Supply at least one of: brightness (multiplier, 0 to 10), saturation (0 to 10), hue (−360 to 360 degrees), lightness (−100 to 100), negate (invert), tint ({ r, g, b }). Per-MP billed; keeps the input format.

POST /v1/image/adjust
{ "file": {"inline":"<b64>"}, "brightness": 1.1, "saturation": 1.3, "tint": { "r": 255, "g": 240, "b": 200 } }
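The adjust constraints above lend themselves to a local pre-check. The sketch below treats the documented ranges as inclusive and only checks that tint carries r, g and b; both of those details are assumptions, and validate_adjust is a hypothetical name rather than part of the API.

```python
# Documented numeric ranges, assumed inclusive at both ends.
ADJUST_RANGES = {
    "brightness": (0, 10),
    "saturation": (0, 10),
    "hue": (-360, 360),
    "lightness": (-100, 100),
}
ADJUST_FIELDS = ("brightness", "saturation", "hue", "lightness", "negate", "tint")


def validate_adjust(body: dict) -> list:
    """Return a list of problems found in an adjust body (empty if none)."""
    problems = []
    if not any(field in body for field in ADJUST_FIELDS):
        problems.append("supply at least one of: " + ", ".join(ADJUST_FIELDS))
    for field, (low, high) in ADJUST_RANGES.items():
        if field in body and not (low <= body[field] <= high):
            problems.append(f"{field} must be between {low} and {high}")
    tint = body.get("tint")
    if tint is not None and set(tint) != {"r", "g", "b"}:
        problems.append("tint must have exactly r, g and b")
    return problems
```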

trim

Auto-crop a uniform-color border (e.g. trim the whitespace around a logo). Optional threshold (0–255) controls how close to uniform a pixel must be to be trimmed.

POST /v1/image/trim
{ "file": {"inline":"<b64>"}, "threshold": 10 }

extend

Pad an image on any side (the inverse of crop). Give pixel counts for top / bottom / left / right (at least one positive). background is a hex string ("#rrggbb") or { r, g, b, alpha? }; extendWith selects the fill mode (background | copy | repeat | mirror).

POST /v1/image/extend
{ "file": {"inline":"<b64>"}, "top": 40, "bottom": 40, "background": "#ffffff" }

from-html

Render caller-supplied HTML to a png/jpeg/webp image — the screenshot counterpart of pdf/from-html, on the same dedicated headless-Chromium render worker via page.screenshot. Use it for social cards, chart/badge rendering, or any HTML-templated graphic.

Body: { html, type?: "png"|"jpeg"|"webp", fullPage?, clip?: { x, y, width, height }, omitBackground?, width?, height? }.

Sandboxed and SSRF-walled identically to pdf/from-html: the renderer has no network and no JavaScript. Every request whose scheme is not data: is aborted, so the contract is that all assets are inlined as data: URIs or inline <style>. Two pre-charge validation rules apply: clip and fullPage are mutually exclusive (400), and omitBackground (transparent background) requires png or webp; with jpeg it returns 400 because JPEG has no alpha channel.

POST /v1/image/from-html
{ "html": "<div style=\"font:48px sans-serif;padding:40px\">Hello</div>", "type": "png", "fullPage": true }
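Two things from the rules above can be handled before the call: inlining assets as data: URIs, and catching the two documented option conflicts locally. The helper names below are hypothetical, and the conflict check only flags jpeg when type is set explicitly, since the default type is not stated here.

```python
import base64


def data_uri(data: bytes, mime: str) -> str:
    """Encode an asset as a data: URI so the sandboxed renderer can load it."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def from_html_conflicts(body: dict) -> list:
    """Return the documented option conflicts present in a from-html body."""
    problems = []
    if body.get("clip") is not None and body.get("fullPage"):
        problems.append("clip and fullPage are mutually exclusive")
    # Only an explicit jpeg is flagged; an omitted type is left to the server.
    if body.get("omitBackground") and body.get("type") == "jpeg":
        problems.append("omitBackground requires png or webp")
    return problems
```

A logo, for instance, could be embedded as `<img src="...">` with the src produced by data_uri(logo_bytes, "image/png").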

Because the image is a binary output, it rides the uniform output envelope — see Passing & receiving files for how to fetch the bytes (inline base64 ≤ 4 MB, or a presigned outputUrl above it).

Sample

curl -X POST https://api.relaystation.ai/v1/image/resize \
  -H 'X-Payment: <base64 EIP-3009 auth>' \
  -H 'Idempotency-Key: thumb-20260606' \
  -H 'Content-Type: application/json' \
  -d '{"file":{"inline":"<base64 image>"},"width":400}'
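The same call can be assembled with Python's standard library. This sketch only builds the request object with the headers from the curl sample; it does not produce the payment authorization, and the function name build_image_request is hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.relaystation.ai/v1/image"


def build_image_request(op: str, body: dict, payment: str,
                        idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an image op."""
    return urllib.request.Request(
        f"{API_BASE}/{op}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Payment": payment,                # base64 EIP-3009 auth, supplied by the caller
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends it; a 4xx response is raised as urllib.error.HTTPError, whose body can be read for the error code.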

Errors

402 PAYMENT_REQUIRED — no valid payment.
422 IMAGE_PARSE_FAILED — the input didn’t decode as a supported image.
422 IMAGE_TOO_LARGE — the input exceeds cputools.image.max_megapixels.
422 UNSUPPORTED_LANG — an OCR lang outside the deployed roster (cputools.ocr.langs); the error names the roster (charged nothing).
422 HTML_TOO_LARGE — from-html input over cputools.render.max_html_bytes.
400 VALIDATION_ERROR — the body failed schema validation (incl. from-html’s clip+fullPage and omitBackground+jpeg conflicts).
