# Image tools Raster operations on [sharp](https://sharp.pixelplumbing.com/) (libvips), plus image OCR on Tesseract. Every op is a `POST https://api.relaystation.ai/v1/image/`. ## Inputs and outputs Same uniform shape as the [PDF tools](/docs/pdf-tools): the `file` is an **input source** — `{ "inline": "" }` for ≤ 4 MB, or `{ "inputKey": "..." }` (from `POST /v1/cputools/upload-url`) for up to 50 MB. The transformed image returns in the uniform **output envelope** (inline base64 ≤ 4 MB, or a presigned `outputUrl` above it). Supported formats: PNG, JPEG, WebP, AVIF, TIFF, GIF. ## Billing Per **megapixel of the input** (width × height ÷ 1,000,000), **minimum 1**, at the operator-tunable `cputools.price.image..per_mp_micros` (launch default $0.0003 / MP). `metadata` is flat. The live `402` challenge is authoritative. | Op | Billed on | Rate | |---|---|---| | `resize` | input megapixels | $0.0003 / MP | | `convert` | input megapixels | $0.0003 / MP | | `compress` | input megapixels | $0.0003 / MP | | `rotate` | input megapixels | $0.0003 / MP | | `crop` | input megapixels | $0.0003 / MP | | `blur` | input megapixels | $0.0003 / MP | | `sharpen` | input megapixels | $0.0003 / MP | | `grayscale` | input megapixels | $0.0003 / MP | | `exif-strip` | input megapixels | $0.0003 / MP | | `composite` | base megapixels | $0.0003 / MP | | `contact-sheet` | total input MP | $0.0003 / MP | | `adjust` | input megapixels | $0.0003 / MP | | `trim` | input megapixels | $0.0003 / MP | | `extend` | input megapixels | $0.0003 / MP | | `metadata` | per call | $0.0005 flat | | `dominant-color` | per call | $0.0002 flat | | `ocr` | per call | $0.003 flat | | `from-html` | per call | $0.003 flat | `from-html` is **flat-billed** (output dimensions aren't known until after the render, so there's no per-MP pre-auth — the same model as [`pdf/from-html`](/docs/pdf-tools)). ## More image ops (Brief 175) Beyond resize/convert/compress/rotate/metadata, these sharp-backed ops ship: - **`crop`** — `{ file, left, top, width, height }`; extract a region (out-of-bounds → `422 CROP_OUT_OF_BOUNDS`). - **`blur`** — `{ file, sigma? }`; Gaussian blur. - **`sharpen`** — `{ file, sigma? }`. - **`grayscale`** — `{ file }`. - **`exif-strip`** — `{ file }`; auto-orients per EXIF, then drops all metadata (pixels + orientation preserved, EXIF/GPS gone). - **`dominant-color`** — `{ file }`; returns `{ dominant: { r, g, b }, hex }` (flat-billed read). - **`composite`** — `{ file (base), overlay, gravity? | (top,left), opacity? }`; overlay/watermark one image onto another. - **`contact-sheet`** — `{ files: [], columns?, cellWidth?, cellHeight?, gap?, background? }`; tile images into a thumbnail-grid PNG. - **`adjust`** — `{ file, brightness?, saturation?, hue?, lightness?, negate?, tint? }`; tune color/tone (at least one). - **`trim`** — `{ file, threshold? }`; auto-crop uniform-color borders. - **`extend`** — `{ file, top?/bottom?/left?/right?, background?, extendWith? }`; pad on any side. - **`from-html`** — `{ html, type?, fullPage?, clip?, omitBackground?, width?, height? }`; render provided HTML to an image (headless Chromium screenshot). All transform ops return the uniform JSON output envelope and keep the input format where applicable (`contact-sheet` and `from-html` are the new-format exceptions). ## resize Resize to a target `width` and/or `height` (at least one). `fit` controls how the image fills the box: `cover` | `contain` | `fill` | `inside` | `outside`. Set `animated: true` to read an animated GIF/WebP with all its frames intact — every frame is resized and the animation is preserved (billing counts the frames: pages × width × height). ```json POST /v1/image/resize { "file": {"inline":""}, "width": 800, "fit": "inside", "animated": true } ``` ## convert Re-encode to another format. `format` is the target (`png` | `jpeg` | `webp` | `avif`); `quality` (1–100) applies to lossy formats. **AVIF** is supported as an output target (modern, high-compression). Set `animated: true` to preserve all frames of an animated input — `gif`/`webp` targets keep the animation stack; other targets flatten. ```json POST /v1/image/convert { "file": {"inline":""}, "format": "webp", "quality": 82 } ``` ## compress Re-encode at lower quality in the current format. ```json POST /v1/image/compress { "file": {"inline":""}, "quality": 70 } ``` ## rotate Rotate by `angle` degrees and/or `flip` (vertical) / `flop` (horizontal). ```json POST /v1/image/rotate { "file": {"inline":""}, "angle": 90 } ``` ## metadata Read dimensions, format, color space, and EXIF presence — no bytes returned, flat-billed. ```json POST /v1/image/metadata { "file": {"inline":""} } ``` Response: `{ "metadata": { "width": 1920, "height": 1080, "format": "jpeg", ... } }`. ## OCR Extract text from an image (Tesseract LSTM on the cputools worker — PNG, JPEG, TIFF, BMP, WebP). Flat per-call price (`cputools.price.image.ocr.flat_micros`, $0.003 at launch). `lang` selects the OCR language from the **deployed 10-language roster**: `eng` (default), `spa`, `fra`, `deu`, `ita`, `por`, `nld`, `pol`, `rus`, `chi_sim`. The live roster is the operator-tunable `cputools.ocr.langs`; an off-roster `lang` returns a **free** `422 UNSUPPORTED_LANG` naming the roster — the same parameter and roster as the [PDF OCR ops](/docs/pdf-tools). Pass `tsv: true` to also get Tesseract's TSV output — per-word bounding boxes and confidences — alongside the plain text. ```json POST /v1/image/ocr { "file": {"inline":""}, "lang": "deu", "tsv": true } ``` Response (JSON, no bytes): `{ "text": "...", "tsv": "level\tpage_num\t..." }`. There is no pre-charge image-format guard — Tesseract is the parser; a non-image input returns `422 IMAGE_PARSE_FAILED` with the charge reversed. ## adjust Tune color and tone in one pass (sharp's modulate bundle). Supply at least one of: `brightness` (0–10 multiplier), `saturation` (0–10), `hue` (−360–360 degrees), `lightness` (−100–100), `negate` (invert), `tint` (`{ r, g, b }`). Per-MP billed; keeps the input format. ```json POST /v1/image/adjust { "file": {"inline":""}, "brightness": 1.1, "saturation": 1.3, "tint": { "r": 255, "g": 240, "b": 200 } } ``` ## trim Auto-crop a uniform-color border (e.g. trim the whitespace around a logo). Optional `threshold` (0–255) controls how close to uniform a pixel must be to be trimmed. ```json POST /v1/image/trim { "file": {"inline":""}, "threshold": 10 } ``` ## extend Pad an image on any side (the inverse of `crop`). Give pixel counts for `top` / `bottom` / `left` / `right` (at least one positive). `background` is a hex string (`"#rrggbb"`) or `{ r, g, b, alpha? }`; `extendWith` selects the fill mode (`background` | `copy` | `repeat` | `mirror`). ```json POST /v1/image/extend { "file": {"inline":""}, "top": 40, "bottom": 40, "background": "#ffffff" } ``` ## from-html Render caller-supplied HTML to a **png/jpeg/webp** image — the screenshot counterpart of [`pdf/from-html`](/docs/pdf-tools), on the same dedicated headless-Chromium render worker via `page.screenshot`. Use it for social cards, chart/badge rendering, or any HTML-templated graphic. Body: `{ html, type?: "png"|"jpeg"|"webp", fullPage?, clip?: { x, y, width, height }, omitBackground?, width?, height? }`. **Sandboxed / SSRF-walled** identically to `pdf/from-html`: the renderer has **no network** and **no JavaScript** — every request whose scheme is not `data:` is aborted, so all assets must be inlined as `data:` URIs / inline `