Image tools
Raster operations on sharp (libvips), plus image OCR on Tesseract. Every op is a POST to https://api.relaystation.ai/v1/image/<op>.
Inputs and outputs
Same uniform shape as the PDF tools: the file is an input source — { "inline": "<base64>" } for ≤ 4 MB, or { "inputKey": "..." } (from POST /v1/cputools/upload-url) for up to 50 MB. The transformed image returns in the uniform output envelope (inline base64 ≤ 4 MB, or a presigned outputUrl above it). Supported formats: PNG, JPEG, WebP, AVIF, TIFF, GIF.
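The sketch below shows one way to pick between the two input-source shapes in Python. It makes three assumptions. First, the 4 MB and 50 MB limits are measured on the raw file bytes. Second, they are counted in binary megabytes (MiB). Third, for larger files you already hold an inputKey from POST /v1/cputools/upload-url. The helper name input_source is hypothetical.

```python
import base64
from typing import Optional

# Assumption: limits apply to the raw file bytes, counted in MiB.
INLINE_LIMIT = 4 * 1024 * 1024    # at or below this, send the bytes inline
UPLOAD_LIMIT = 50 * 1024 * 1024   # above INLINE_LIMIT, upload first (inputKey)


def input_source(data: bytes, input_key: Optional[str] = None) -> dict:
    """Return the `file` input source for an image op (hypothetical helper)."""
    if len(data) > UPLOAD_LIMIT:
        raise ValueError("file exceeds the 50 MB input limit")
    if len(data) <= INLINE_LIMIT:
        # Small files travel as base64 inside the JSON body.
        return {"inline": base64.b64encode(data).decode("ascii")}
    if input_key is None:
        raise ValueError("files over 4 MB need an inputKey from /v1/cputools/upload-url")
    # Large files are referenced by the key returned from the upload step.
    return {"inputKey": input_key}
```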
Billing
Per-MP ops are billed on the input's megapixels (width × height ÷ 1,000,000), with a minimum of 1 MP, at the operator-tunable cputools.price.image.<op>.per_mp_micros (launch default $0.0003 / MP). metadata, dominant-color, ocr, and from-html are flat per call. The live 402 challenge is authoritative.
| Op | Billed on | Rate |
|---|---|---|
| resize | input megapixels | $0.0003 / MP |
| convert | input megapixels | $0.0003 / MP |
| compress | input megapixels | $0.0003 / MP |
| rotate | input megapixels | $0.0003 / MP |
| crop | input megapixels | $0.0003 / MP |
| blur | input megapixels | $0.0003 / MP |
| sharpen | input megapixels | $0.0003 / MP |
| grayscale | input megapixels | $0.0003 / MP |
| exif-strip | input megapixels | $0.0003 / MP |
| composite | base-image megapixels | $0.0003 / MP |
| contact-sheet | total input megapixels | $0.0003 / MP |
| adjust | input megapixels | $0.0003 / MP |
| trim | input megapixels | $0.0003 / MP |
| extend | input megapixels | $0.0003 / MP |
| metadata | per call | $0.0005 flat |
| dominant-color | per call | $0.0002 flat |
| ocr | per call | $0.003 flat |
| from-html | per call | $0.003 flat |
from-html is flat-billed because its output dimensions aren't known until after the render, so there is no per-MP amount to pre-authorize. This is the same model as pdf/from-html.
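Here is a small sketch of an estimator for the table above. The prices are the launch defaults expressed in micro-dollars, which is the unit the per_mp_micros and flat_micros keys suggest. The estimator applies the 1 MP minimum. The pages argument covers animated inputs, which are billed as frames × width × height.

This page does not document how fractional megapixels are rounded, so rounding up to a whole micro is an assumption. Always defer to the 402 challenge. The function names are hypothetical.

```python
import math

# Launch defaults in micro-dollars (1 micro = $0.000001). Operators can
# retune these; the live 402 challenge is the authoritative price.
PER_MP_MICROS = 300  # $0.0003 / MP
FLAT_MICROS = {"metadata": 500, "dominant-color": 200, "ocr": 3000, "from-html": 3000}


def billable_megapixels(width: int, height: int, pages: int = 1) -> float:
    """Input megapixels (pages x width x height / 1e6), floored at 1 MP."""
    return max(1.0, pages * width * height / 1_000_000)


def estimate_micros(op: str, width: int = 0, height: int = 0, pages: int = 1) -> int:
    """Estimated charge for one call, in micro-dollars."""
    if op in FLAT_MICROS:
        return FLAT_MICROS[op]
    # Assumption: fractional MP are billed pro rata, rounded up to a whole micro.
    return math.ceil(billable_megapixels(width, height, pages) * PER_MP_MICROS)
```

Pass the dimensions that the table says are billed:

- For composite, pass the base image's dimensions.
- For contact-sheet, pass the summed pixel count of all tiles as width, with height=1.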
More image ops (Brief 175)
Beyond resize, convert, compress, rotate, and metadata, these ops also ship. All are sharp-backed except from-html, which renders in headless Chromium:
- crop: { file, left, top, width, height }; extract a region (out of bounds → 422 CROP_OUT_OF_BOUNDS).
- blur: { file, sigma? }; Gaussian blur.
- sharpen: { file, sigma? }.
- grayscale: { file }.
- exif-strip: { file }; auto-orients per EXIF, then drops all metadata (pixels and orientation preserved, EXIF/GPS gone).
- dominant-color: { file }; returns { dominant: { r, g, b }, hex } (flat-billed read).
- composite: { file (base), overlay, gravity? | (top, left), opacity? }; overlay/watermark one image onto another.
- contact-sheet: { files: <source>[], columns?, cellWidth?, cellHeight?, gap?, background? }; tile images into a thumbnail-grid PNG.
- adjust: { file, brightness?, saturation?, hue?, lightness?, negate?, tint? }; tune color/tone (at least one).
- trim: { file, threshold? }; auto-crop uniform-color borders.
- extend: { file, top?/bottom?/left?/right?, background?, extendWith? }; pad on any side.
- from-html: { html, type?, fullPage?, clip?, omitBackground?, width?, height? }; render provided HTML to an image (headless Chromium screenshot).
All transform ops return the uniform JSON output envelope and keep the input format where applicable. The two ops that always produce a new format are contact-sheet (a PNG grid) and from-html (the requested type).
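An out-of-bounds crop comes back as 422 CROP_OUT_OF_BOUNDS. A client can avoid that by checking the rectangle against the dimensions from a metadata call before paying for the request.

This sketch assumes "in bounds" means the whole rectangle lies inside the image, which matches how sharp's extract behaves. crop_body is a hypothetical helper.

```python
def crop_body(file: dict, img_w: int, img_h: int,
              left: int, top: int, width: int, height: int) -> dict:
    """Build a crop request, rejecting rectangles that leave the image."""
    if left < 0 or top < 0 or width <= 0 or height <= 0:
        raise ValueError("crop offsets must be >= 0 and sizes > 0")
    # The far edges of the rectangle must stay within the image.
    if left + width > img_w or top + height > img_h:
        raise ValueError("crop region extends past the image edge")
    return {"file": file, "left": left, "top": top, "width": width, "height": height}
```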
resize
Resize to a target width and/or height (at least one). fit controls how the image fills the box: cover | contain | fill | inside | outside.
Set animated: true to read an animated GIF/WebP with all its frames intact — every frame is resized and the animation is preserved (billing counts the frames: pages × width × height).
POST /v1/image/resize
{ "file": {"inline":"<b64>"}, "width": 800, "fit": "inside", "animated": true }
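With fit: "inside", sharp preserves the aspect ratio and makes the image as large as possible while staying within both bounds. That means the output size can be estimated ahead of time, for example to budget a follow-up per-MP op.

This sketch rounds to whole pixels as an approximation and ignores any enlargement limits. inside_dims is hypothetical.

```python
from typing import Optional, Tuple


def inside_dims(src_w: int, src_h: int,
                width: Optional[int] = None, height: Optional[int] = None) -> Tuple[int, int]:
    """Approximate output size of resize with fit="inside"."""
    if width is None and height is None:
        raise ValueError("resize needs width and/or height")
    scales = []
    if width is not None:
        scales.append(width / src_w)
    if height is not None:
        scales.append(height / src_h)
    s = min(scales)  # the tighter bound wins, so both limits hold
    return max(1, round(src_w * s)), max(1, round(src_h * s))
```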
convert
Re-encode to another format. format is the target (png | jpeg | webp | avif); quality (1–100) applies to lossy formats. AVIF is supported as an output target (modern, high compression). Set animated: true to preserve all frames of an animated input. gif and webp targets keep the animation; other targets flatten it.
POST /v1/image/convert
{ "file": {"inline":"<b64>"}, "format": "webp", "quality": 82 }
compress
Re-encode at lower quality in the current format.
POST /v1/image/compress
{ "file": {"inline":"<b64>"}, "quality": 70 }
rotate
Rotate by angle degrees, and/or mirror the image with flip (vertical, top to bottom) or flop (horizontal, left to right).
POST /v1/image/rotate
{ "file": {"inline":"<b64>"}, "angle": 90 }
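Rotating by a multiple of 90 either swaps the dimensions or keeps them. Other angles enlarge the canvas to the rotated bounding box, and sharp fills the corners with a background color. flip and flop never change the size.

This sketch estimates the box with the standard |w cos θ| + |h sin θ| formula. Treat the result as an estimate, because the service's exact rounding isn't documented. rotated_size is hypothetical.

```python
import math
from typing import Tuple


def rotated_size(width: int, height: int, angle: float) -> Tuple[int, int]:
    """Estimated output size after rotating by `angle` degrees."""
    theta = math.radians(angle)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Bounding box of the rotated rectangle; round() absorbs float noise at
    # exact multiples of 90 degrees (cos(90°) is about 6e-17, not 0).
    return round(width * c + height * s), round(width * s + height * c)
```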
metadata
Read dimensions, format, color space, and EXIF presence. No image bytes are returned, and the call is flat-billed.
POST /v1/image/metadata
{ "file": {"inline":"<b64>"} }
Response: { "metadata": { "width": 1920, "height": 1080, "format": "jpeg", ... } }.
OCR
Extract text from an image (Tesseract LSTM on the cputools worker — PNG, JPEG, TIFF, BMP, WebP). Flat per-call price (cputools.price.image.ocr.flat_micros, $0.003 at launch).
lang selects the OCR language from the deployed 10-language roster: eng (default), spa, fra, deu, ita, por, nld, pol, rus, chi_sim. The live roster is the operator-tunable cputools.ocr.langs; an off-roster lang returns a free 422 UNSUPPORTED_LANG naming the roster — the same parameter and roster as the PDF OCR ops.
Pass tsv: true to also get Tesseract’s TSV output — per-word bounding boxes and confidences — alongside the plain text.
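An off-roster lang already fails with a free 422, so a client-side check only saves a round trip. The hard-coded set below is the launch roster and can drift from the operator-tuned cputools.ocr.langs. Treat the server's 422 as the source of truth. ocr_body is a hypothetical helper.

```python
from typing import FrozenSet

# The roster deployed at launch; the live list is cputools.ocr.langs.
LAUNCH_OCR_LANGS: FrozenSet[str] = frozenset(
    {"eng", "spa", "fra", "deu", "ita", "por", "nld", "pol", "rus", "chi_sim"}
)


def ocr_body(file: dict, lang: str = "eng", tsv: bool = False,
             roster: FrozenSet[str] = LAUNCH_OCR_LANGS) -> dict:
    """Build an OCR request, rejecting languages outside the known roster."""
    if lang not in roster:
        raise ValueError(f"unsupported lang {lang!r}; roster: {sorted(roster)}")
    body = {"file": file, "lang": lang}
    if tsv:
        body["tsv"] = True  # also return per-word boxes and confidences
    return body
```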
POST /v1/image/ocr
{ "file": {"inline":"<b64 image>"}, "lang": "deu", "tsv": true }
Response (JSON, no bytes): { "text": "...", "tsv": "level\tpage_num\t..." }. There is no pre-charge image-format check, because Tesseract itself is the parser. A non-image input returns 422 IMAGE_PARSE_FAILED and the charge is reversed.
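The tsv string uses Tesseract's standard layout: a header row, then one row each for the page, blocks, paragraphs, lines, and words. Its columns are level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, and text.

Word rows have level 5 and a conf between 0 and 100; the other rows carry -1. This sketch pulls out the words with their boxes. The min_conf filter and the name words_from_tsv are illustrative.

```python
from typing import Any, Dict, List


def words_from_tsv(tsv: str, min_conf: float = 0.0) -> List[Dict[str, Any]]:
    """Extract word-level entries (level 5) from Tesseract TSV output."""
    lines = tsv.splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    words = []
    for line in lines[1:]:
        # text is the last column, so cap the split at the header width.
        cols = line.split("\t", len(header) - 1)
        if len(cols) < len(header):
            continue  # malformed or truncated row
        row = dict(zip(header, cols))
        if row["level"] != "5" or not row["text"].strip():
            continue  # page/block/paragraph/line rows carry no word text
        conf = float(row["conf"])
        if conf < min_conf:
            continue
        words.append({
            "text": row["text"],
            "conf": conf,
            "box": tuple(int(row[k]) for k in ("left", "top", "width", "height")),
        })
    return words
```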
adjust
Tune color and tone in one pass (sharp's modulate, plus negate and tint). Supply at least one of the following:

- brightness (0 to 10, multiplier)
- saturation (0 to 10, multiplier)
- hue (−360 to 360 degrees)
- lightness (−100 to 100)
- negate (invert colors)
- tint ({ r, g, b })

Billed per MP; keeps the input format.
POST /v1/image/adjust
{ "file": {"inline":"<b64>"}, "brightness": 1.1, "saturation": 1.3, "tint": { "r": 255, "g": 240, "b": 200 } }
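A client-side pre-check can mirror the adjust constraints listed above. Two things are assumptions: that the range bounds are inclusive, and that tint channels are 8-bit (0 to 255). adjust_body is a hypothetical helper.

```python
# Documented ranges; inclusive bounds are an assumption.
ADJUST_RANGES = {
    "brightness": (0, 10),
    "saturation": (0, 10),
    "hue": (-360, 360),
    "lightness": (-100, 100),
}
ADJUST_KEYS = set(ADJUST_RANGES) | {"negate", "tint"}


def adjust_body(file: dict, **params) -> dict:
    """Build an adjust request after checking the documented constraints."""
    unknown = set(params) - ADJUST_KEYS
    if unknown:
        raise ValueError(f"unknown adjust parameters: {sorted(unknown)}")
    if not params:
        raise ValueError("adjust needs at least one of " + ", ".join(sorted(ADJUST_KEYS)))
    for name, (lo, hi) in ADJUST_RANGES.items():
        if name in params and not lo <= params[name] <= hi:
            raise ValueError(f"{name} must be within [{lo}, {hi}]")
    tint = params.get("tint")
    # Assumption: tint channels are 8-bit values (0-255).
    if tint is not None and not all(0 <= tint.get(ch, -1) <= 255 for ch in "rgb"):
        raise ValueError("tint needs r, g, b in 0-255")
    return {"file": file, **params}
```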
trim
Auto-crop a uniform-color border (e.g. trim the whitespace around a logo). Optional threshold (0–255) controls how close to uniform a pixel must be to be trimmed.
POST /v1/image/trim
{ "file": {"inline":"<b64>"}, "threshold": 10 }
extend
Pad an image on any side (the inverse of crop). Give pixel counts for top / bottom / left / right (at least one positive). background is a hex string ("#rrggbb") or { r, g, b, alpha? }; extendWith selects the fill mode (background | copy | repeat | mirror).
POST /v1/image/extend
{ "file": {"inline":"<b64>"}, "top": 40, "bottom": 40, "background": "#ffffff" }
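This sketch builds the extend body and does three things:

- Enforces the documented "at least one positive side" rule.
- Checks hex backgrounds against the "#rrggbb" form. The { r, g, b, alpha? } object form is passed through unchecked.
- Computes the padded output size, which is useful for budgeting a follow-up per-MP op.

Rejecting negative padding is an assumption. extend_body and extended_size are hypothetical.

```python
import re
from typing import Optional, Tuple, Union

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")
EXTEND_MODES = ("background", "copy", "repeat", "mirror")


def extend_body(file: dict, top: int = 0, bottom: int = 0, left: int = 0, right: int = 0,
                background: Union[str, dict, None] = None,
                extend_with: Optional[str] = None) -> dict:
    """Build an extend request; only sides with positive padding are sent."""
    pads = {"top": top, "bottom": bottom, "left": left, "right": right}
    if any(v < 0 for v in pads.values()) or not any(v > 0 for v in pads.values()):
        raise ValueError("padding must be >= 0 with at least one side > 0")
    if isinstance(background, str) and not HEX_COLOR.fullmatch(background):
        raise ValueError('background strings must look like "#rrggbb"')
    if extend_with is not None and extend_with not in EXTEND_MODES:
        raise ValueError(f"extendWith must be one of {EXTEND_MODES}")
    body = {"file": file, **{side: px for side, px in pads.items() if px > 0}}
    if background is not None:
        body["background"] = background
    if extend_with is not None:
        body["extendWith"] = extend_with
    return body


def extended_size(width: int, height: int, body: dict) -> Tuple[int, int]:
    """Output size: the original plus the padding on each side."""
    return (width + body.get("left", 0) + body.get("right", 0),
            height + body.get("top", 0) + body.get("bottom", 0))
```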
from-html
Render caller-supplied HTML to a png/jpeg/webp image — the screenshot counterpart of pdf/from-html, on the same dedicated headless-Chromium render worker via page.screenshot. Use it for social cards, chart/badge rendering, or any HTML-templated graphic.
Body: { html, type?: "png"|"jpeg"|"webp", fullPage?, clip?: { x, y, width, height }, omitBackground?, width?, height? }.
Sandboxing and the SSRF wall are identical to pdf/from-html: the renderer has no network access and runs no JavaScript. Every request whose scheme is not data: is aborted, so all assets must be inlined as data: URIs or inline <style> (this is the contract).

Two validation rules run before any charge:

- clip and fullPage are mutually exclusive (400).
- omitBackground (transparent background) requires png or webp. With jpeg it returns 400, because JPEG has no alpha channel.
POST /v1/image/from-html
{ "html": "<div style=\"font:48px sans-serif;padding:40px\">Hello</div>", "type": "png", "fullPage": true }
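A pre-flight check can catch both 400 rules locally before paying. It can also warn about assets the sandbox would abort.

The external-reference scan is only a heuristic: it looks for src, href, or url() values pointing at http(s) or protocol-relative URLs. This sketch also assumes "mutually exclusive" means clip together with a truthy fullPage. width and height are omitted for brevity. from_html_body is a hypothetical helper.

```python
import re
from typing import Optional

# Heuristic only: flags src/href/url() references to http(s) or
# protocol-relative URLs, which the sandboxed renderer would abort.
EXTERNAL_REF = re.compile(
    r"""(?:\b(?:src|href)\s*=\s*["']?|url\(\s*["']?)\s*(?:https?:)?//""",
    re.IGNORECASE,
)


def from_html_body(html: str, image_type: str = "png", full_page: Optional[bool] = None,
                   clip: Optional[dict] = None, omit_background: Optional[bool] = None) -> dict:
    """Build a from-html request, enforcing the documented pre-charge rules."""
    if image_type not in ("png", "jpeg", "webp"):
        raise ValueError("type must be png, jpeg, or webp")
    if clip is not None and full_page:
        raise ValueError("clip and fullPage are mutually exclusive")
    if omit_background and image_type == "jpeg":
        raise ValueError("omitBackground needs png or webp (JPEG has no alpha)")
    if EXTERNAL_REF.search(html):
        raise ValueError("inline every asset as a data: URI or <style>; network requests are aborted")
    body = {"html": html, "type": image_type}
    if full_page is not None:
        body["fullPage"] = full_page
    if clip is not None:
        body["clip"] = clip
    if omit_background is not None:
        body["omitBackground"] = omit_background
    return body
```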
Because the image is a binary output, it rides the uniform output envelope — see Passing & receiving files for how to fetch the bytes (inline base64 ≤ 4 MB, or a presigned outputUrl above it).
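The sketch below turns the output envelope back into bytes. This page names outputUrl but not the inline field, so two things are assumptions: the inline key is "inline", and both keys sit at the top level of the envelope. fetch is injectable so the presigned download can be swapped or stubbed. output_bytes is hypothetical.

```python
import base64
import urllib.request
from typing import Callable


def _http_get(url: str) -> bytes:
    # Presigned URLs carry their own authorization, so no extra headers are sent.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def output_bytes(envelope: dict, fetch: Callable[[str], bytes] = _http_get) -> bytes:
    """Return the result bytes from an output envelope (field names assumed)."""
    if "inline" in envelope:
        return base64.b64decode(envelope["inline"])
    if "outputUrl" in envelope:
        return fetch(envelope["outputUrl"])
    raise KeyError("envelope has neither 'inline' nor 'outputUrl'")
```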
Sample
curl -X POST https://api.relaystation.ai/v1/image/resize \
-H 'X-Payment: <base64 EIP-3009 auth>' \
-H 'Idempotency-Key: thumb-20260606' \
-H 'Content-Type: application/json' \
-d '{"file":{"inline":"<base64 image>"},"width":400}'
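Here is the same call built with Python's standard library, as a sketch. It only constructs the request. The X-Payment value is the same placeholder as in the curl sample; producing a real EIP-3009 authorization is out of scope. image_request is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.relaystation.ai/v1/image"


def image_request(op: str, body: dict, payment: str, idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/image/<op>."""
    return urllib.request.Request(
        f"{API_BASE}/{op}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Payment": payment,                # base64 EIP-3009 authorization
            "Idempotency-Key": idempotency_key,  # client-chosen, as in the curl sample
            "Content-Type": "application/json",
        },
    )


req = image_request("resize", {"file": {"inline": "<base64 image>"}, "width": 400},
                    payment="<base64 EIP-3009 auth>", idempotency_key="thumb-20260606")
# urllib.request.urlopen(req) would send it; a 402 surfaces as urllib.error.HTTPError.
```

urllib stores header names with only the first letter capitalized (for example, "Idempotency-key"). That is harmless, because HTTP header names are case-insensitive.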
Errors
- 402 PAYMENT_REQUIRED: no valid payment.
- 422 IMAGE_PARSE_FAILED: the input didn't decode as a supported image.
- 422 IMAGE_TOO_LARGE: the input exceeds cputools.image.max_megapixels.
- 422 CROP_OUT_OF_BOUNDS: a crop region extends past the image.
- 422 UNSUPPORTED_LANG: an OCR lang outside the deployed roster (cputools.ocr.langs). The error names the roster, and nothing is charged.
- 422 HTML_TOO_LARGE: the from-html input exceeds cputools.render.max_html_bytes.
- 400 VALIDATION_ERROR: the body failed schema validation (including the from-html clip + fullPage and omitBackground + jpeg conflicts).