
# Image tools

Raster operations on [sharp](https://sharp.pixelplumbing.com/) (libvips), plus image OCR on Tesseract. Every op is a `POST https://api.relaystation.ai/v1/image/<op>`.

## Inputs and outputs

Same uniform shape as the [PDF tools](/docs/pdf-tools): the `file` is an **input source** — `{ "inline": "<base64>" }` for ≤ 4 MB, or `{ "inputKey": "..." }` (from `POST /v1/cputools/upload-url`) for up to 50 MB. The transformed image returns in the uniform **output envelope** (inline base64 ≤ 4 MB, or a presigned `outputUrl` above it). Supported formats: PNG, JPEG, WebP, AVIF, TIFF, GIF.

## Billing

Per **megapixel of the input** (width × height ÷ 1,000,000), **minimum 1**, at the operator-tunable `cputools.price.image.<op>.per_mp_micros` (launch default $0.0003 / MP). `metadata` is flat. The live `402` challenge is authoritative.

| Op | Billed on | Rate |
|---|---|---|
| `resize` | input megapixels | $0.0003 / MP |
| `convert` | input megapixels | $0.0003 / MP |
| `compress` | input megapixels | $0.0003 / MP |
| `rotate` | input megapixels | $0.0003 / MP |
| `crop` | input megapixels | $0.0003 / MP |
| `blur` | input megapixels | $0.0003 / MP |
| `sharpen` | input megapixels | $0.0003 / MP |
| `grayscale` | input megapixels | $0.0003 / MP |
| `exif-strip` | input megapixels | $0.0003 / MP |
| `composite` | base megapixels | $0.0003 / MP |
| `contact-sheet` | total input MP | $0.0003 / MP |
| `adjust` | input megapixels | $0.0003 / MP |
| `trim` | input megapixels | $0.0003 / MP |
| `extend` | input megapixels | $0.0003 / MP |
| `metadata` | per call | $0.0005 flat |
| `dominant-color` | per call | $0.0002 flat |
| `ocr` | per call | $0.003 flat |
| `from-html` | per call | $0.003 flat |

`from-html` is **flat-billed** (output dimensions aren't known until after the render, so there's no per-MP pre-auth — the same model as [`pdf/from-html`](/docs/pdf-tools)).

## More image ops (Brief 175)

Beyond resize/convert/compress/rotate/metadata, these sharp-backed ops ship:

- **`crop`** — `{ file, left, top, width, height }`; extract a region (out-of-bounds → `422 CROP_OUT_OF_BOUNDS`).
- **`blur`** — `{ file, sigma? }`; Gaussian blur.
- **`sharpen`** — `{ file, sigma? }`.
- **`grayscale`** — `{ file }`.
- **`exif-strip`** — `{ file }`; auto-orients per EXIF, then drops all metadata (pixels + orientation preserved, EXIF/GPS gone).
- **`dominant-color`** — `{ file }`; returns `{ dominant: { r, g, b }, hex }` (flat-billed read).
- **`composite`** — `{ file (base), overlay, gravity? | (top,left), opacity? }`; overlay/watermark one image onto another.
- **`contact-sheet`** — `{ files: <source>[], columns?, cellWidth?, cellHeight?, gap?, background? }`; tile images into a thumbnail-grid PNG.
- **`adjust`** — `{ file, brightness?, saturation?, hue?, lightness?, negate?, tint? }`; tune color/tone (at least one).
- **`trim`** — `{ file, threshold? }`; auto-crop uniform-color borders.
- **`extend`** — `{ file, top?/bottom?/left?/right?, background?, extendWith? }`; pad on any side.
- **`from-html`** — `{ html, type?, fullPage?, clip?, omitBackground?, width?, height? }`; render provided HTML to an image (headless Chromium screenshot).

All transform ops return the uniform JSON output envelope and keep the input format where applicable (`contact-sheet` and `from-html` are the new-format exceptions).

## resize

Resize to a target `width` and/or `height` (at least one). `fit` controls how the image fills the box: `cover` | `contain` | `fill` | `inside` | `outside`.

Set `animated: true` to read an animated GIF/WebP with all its frames intact — every frame is resized and the animation is preserved (billing counts the frames: pages × width × height).

```json
POST /v1/image/resize
{ "file": {"inline":"<b64>"}, "width": 800, "fit": "inside", "animated": true }
```

## convert

Re-encode to another format. `format` is the target (`png` | `jpeg` | `webp` | `avif`); `quality` (1–100) applies to lossy formats. **AVIF** is supported as an output target (modern, high-compression). Set `animated: true` to preserve all frames of an animated input — `gif`/`webp` targets keep the animation stack; other targets flatten.

```json
POST /v1/image/convert
{ "file": {"inline":"<b64>"}, "format": "webp", "quality": 82 }
```

## compress

Re-encode at lower quality in the current format.

```json
POST /v1/image/compress
{ "file": {"inline":"<b64>"}, "quality": 70 }
```

## rotate

Rotate by `angle` degrees and/or `flip` (vertical) / `flop` (horizontal).

```json
POST /v1/image/rotate
{ "file": {"inline":"<b64>"}, "angle": 90 }
```

## metadata

Read dimensions, format, color space, and EXIF presence — no bytes returned, flat-billed.

```json
POST /v1/image/metadata
{ "file": {"inline":"<b64>"} }
```

Response: `{ "metadata": { "width": 1920, "height": 1080, "format": "jpeg", ... } }`.

## OCR

Extract text from an image (Tesseract LSTM on the cputools worker — PNG, JPEG, TIFF, BMP, WebP). Flat per-call price (`cputools.price.image.ocr.flat_micros`, $0.003 at launch).

`lang` selects the OCR language from the **deployed 10-language roster**: `eng` (default), `spa`, `fra`, `deu`, `ita`, `por`, `nld`, `pol`, `rus`, `chi_sim`. The live roster is the operator-tunable `cputools.ocr.langs`; an off-roster `lang` returns a **free** `422 UNSUPPORTED_LANG` naming the roster — the same parameter and roster as the [PDF OCR ops](/docs/pdf-tools).

Pass `tsv: true` to also get Tesseract's TSV output — per-word bounding boxes and confidences — alongside the plain text.

```json
POST /v1/image/ocr
{ "file": {"inline":"<b64 image>"}, "lang": "deu", "tsv": true }
```

Response (JSON, no bytes): `{ "text": "...", "tsv": "level\tpage_num\t..." }`. There is no pre-charge image-format guard — Tesseract is the parser; a non-image input returns `422 IMAGE_PARSE_FAILED` with the charge reversed.

## adjust

Tune color and tone in one pass (sharp's modulate bundle). Supply at least one of: `brightness` (0–10 multiplier), `saturation` (0–10), `hue` (−360–360 degrees), `lightness` (−100–100), `negate` (invert), `tint` (`{ r, g, b }`). Per-MP billed; keeps the input format.

```json
POST /v1/image/adjust
{ "file": {"inline":"<b64>"}, "brightness": 1.1, "saturation": 1.3, "tint": { "r": 255, "g": 240, "b": 200 } }
```

## trim

Auto-crop a uniform-color border (e.g. trim the whitespace around a logo). Optional `threshold` (0–255) controls how close to uniform a pixel must be to be trimmed.

```json
POST /v1/image/trim
{ "file": {"inline":"<b64>"}, "threshold": 10 }
```

## extend

Pad an image on any side (the inverse of `crop`). Give pixel counts for `top` / `bottom` / `left` / `right` (at least one positive). `background` is a hex string (`"#rrggbb"`) or `{ r, g, b, alpha? }`; `extendWith` selects the fill mode (`background` | `copy` | `repeat` | `mirror`).

```json
POST /v1/image/extend
{ "file": {"inline":"<b64>"}, "top": 40, "bottom": 40, "background": "#ffffff" }
```

## from-html

Render caller-supplied HTML to a **png/jpeg/webp** image — the screenshot counterpart of [`pdf/from-html`](/docs/pdf-tools), on the same dedicated headless-Chromium render worker via `page.screenshot`. Use it for social cards, chart/badge rendering, or any HTML-templated graphic.

Body: `{ html, type?: "png"|"jpeg"|"webp", fullPage?, clip?: { x, y, width, height }, omitBackground?, width?, height? }`.

**Sandboxed / SSRF-walled** identically to `pdf/from-html`: the renderer has **no network** and **no JavaScript** — every request whose scheme is not `data:` is aborted, so all assets must be inlined as `data:` URIs / inline `<style>` (the contract). Two pre-charge validation rules: `clip` and `fullPage` are mutually exclusive (`400`), and `omitBackground` (transparent background) requires `png`/`webp` — with `jpeg` it `400`s (JPEG has no alpha channel).

```json
POST /v1/image/from-html
{ "html": "<div style=\"font:48px sans-serif;padding:40px\">Hello</div>", "type": "png", "fullPage": true }
```

Because the image is a binary output, it rides the uniform output envelope — see [receiving outputs](/docs/receiving-outputs) and [persistence tiers](/docs/persistence-tiers) for how to fetch the bytes (inline base64 ≤ 4 MB, or a presigned `outputUrl` above it).

## Sample

```bash
curl -X POST https://api.relaystation.ai/v1/image/resize \
  -H 'X-Payment: <base64 EIP-3009 auth>' \
  -H 'Idempotency-Key: thumb-20260606' \
  -H 'Content-Type: application/json' \
  -d '{"file":{"inline":"<base64 image>"},"width":400}'
```

## Errors

- `402 PAYMENT_REQUIRED` — no valid payment.
- `422 IMAGE_PARSE_FAILED` — the input didn't decode as a supported image.
- `422 IMAGE_TOO_LARGE` — the input exceeds `cputools.image.max_megapixels`.
- `422 UNSUPPORTED_LANG` — an OCR `lang` outside the deployed roster (`cputools.ocr.langs`); the error names the roster (charged nothing).
- `422 HTML_TOO_LARGE` — `from-html` input over `cputools.render.max_html_bytes`.
- `400 VALIDATION_ERROR` — the body failed schema validation (incl. `from-html`'s `clip`+`fullPage` and `omitBackground`+`jpeg` conflicts).

## Next

[PDF tools](/docs/pdf-tools) · [CSV tools](/docs/csv-tools) · [QR & barcode](/docs/qr-tools) · [Pricing](/pricing) · [API reference](/api-reference)
