Data tools
Nineteen in-process dataframe operations over tabular data — the transform half of ETL, including Excel in/out (from-xlsx / to-xlsx, the format DuckDB can’t emit) and structured derive / cleaning ops. Each is a POST https://api.relaystation.ai/v1/data/<op>. Input is the csv-family union (csv, tsv, json array-of-rows, ndjson); pass from/to to pick formats (default csv). cputools reshapes your rows — it never evaluates a cell or a user expression as code (see filter below), so there is no expression-execution surface. For full SQL, see Data — SQL; to chain several of these in one call, see Pipeline.
Inputs and outputs
Same uniform shape as the CSV tools: the file is an input source — { "inline": "<base64>" } for ≤ 4 MB, or { "inputKey": "..." } (from POST /v1/cputools/upload-url) for up to 50 MB. The verbs return the uniform output envelope (the reshaped table, default text/csv); profile returns a JSON report.
Billing
Per-MB of input, minimum 1 MiB, at the operator-tunable cputools.price.data.<op>.per_mb_micros (launch default $0.0002 / MB). join bills on the sum of both inputs. The live 402 challenge is authoritative.
| Op | Billed on | Rate |
|---|---|---|
filter | input MB | $0.0002 / MB |
sort | input MB | $0.0002 / MB |
groupby | input MB | $0.0002 / MB |
join | input MB (both inputs) | $0.0002 / MB |
union | input MB (all inputs) | $0.0002 / MB |
profile | input MB | $0.0002 / MB |
cast | input MB | $0.0002 / MB |
pivot | input MB | $0.0002 / MB |
schema-infer | input MB | $0.0002 / MB |
validate | input MB | $0.0002 / MB |
diff | input MB (both inputs) | $0.0002 / MB |
from-xlsx | input MB | $0.0002 / MB |
to-xlsx | input MB | $0.0002 / MB |
derive | input MB | $0.0002 / MB |
fillna | input MB | $0.0002 / MB |
dropna | input MB | $0.0002 / MB |
rename | input MB | $0.0002 / MB |
slice | input MB | $0.0002 / MB |
explode | input MB | $0.0002 / MB |
filter
Keep rows matching a structured predicate — a tree of { column, op, value } leaves combined with and / or / not. Ops: eq, ne, gt, gte, lt, lte, contains, startsWith, endsWith, in, isNull, notNull. Comparisons are value-typed (numeric when both sides parse as numbers, else string). There is no expression string — a value is always a literal compared against your cell data, never executed — so the predicate is safe by construction.
POST /v1/data/filter
{ "file": {"inline":"<b64>"},
"where": { "and": [ {"column":"status","op":"eq","value":"active"},
{"column":"age","op":"gte","value":18} ] } }
sort
Sort by one or more columns. Each key is { column, dir?: "asc"|"desc", type?: "string"|"number" }.
POST /v1/data/sort
{ "file": {"inline":"<b64>"}, "by": [ {"column":"age","dir":"desc","type":"number"} ] }
groupby
Group by one or more columns and apply named aggregate functions — count, sum, avg, min, max, first, last. count needs no column; the rest require one. Name an aggregate with as.
POST /v1/data/groupby
{ "file": {"inline":"<b64>"}, "by": ["country"],
"aggregate": [ {"fn":"count","as":"n"}, {"column":"spend","fn":"sum","as":"total"} ] }
join
Join two tables on one or more key pairs. how is inner (default), left, right, or outer. Pass left and right as input sources and on as [{ "left": "...", "right": "..." }].
POST /v1/data/join
{ "left": {"inline":"<b64>"}, "right": {"inputKey":"..."},
"on": [ {"left":"id","right":"user_id"} ], "how": "left" }
union
Concatenate two or more tables, preserving row order; the output schema is the ordered union of all columns (missing cells come back empty).
POST /v1/data/union
{ "files": [ {"inline":"<b64>"}, {"inline":"<b64>"} ] }
profile
A read-only data-quality report: per column the inferred type, row/null counts, distinct count, min/max, numeric mean, and the top values. Returns JSON.
POST /v1/data/profile
{ "file": {"inline":"<b64>"} }
cast
Coerce named columns to a target type — string, integer, number, boolean, or date. Best-effort: a value that can’t be coerced becomes null (the cell isn’t dropped).
POST /v1/data/cast
{ "file": {"inline":"<b64>"}, "columns": [ {"column":"age","to":"integer"}, {"column":"joined","to":"date"} ] }
pivot
Reshape between long and wide. mode: "pivot" spreads a column’s values into new columns — { index: [...], columns: "<col>", values: "<col>", agg?: count|sum|avg|min|max }. mode: "unpivot" melts columns into key/value rows — { id: [...], value?: [...] }.
POST /v1/data/pivot
{ "file": {"inline":"<b64>"}, "mode": "pivot",
"index": ["country"], "columns": "quarter", "values": "spend", "agg": "sum" }
schema-infer
Infer each column’s type from the data and emit a schema. dialect: "json-schema" (default) returns a JSON Schema; dialect: "sql-ddl" returns a CREATE TABLE statement. Returns JSON/text.
POST /v1/data/schema-infer
{ "file": {"inline":"<b64>"}, "dialect": "sql-ddl" }
validate
Check each row against a structured schema — per column { type, required?, min?, max?, pattern? }. pattern is a bounded, length-capped regular expression matched against the cell string (it is not executable code). Returns { valid, rowCount, errorCount, errors: [{row, column, reason}] }.
POST /v1/data/validate
{ "file": {"inline":"<b64>"},
"schema": { "email": {"type":"string","required":true,"pattern":"^[^@]+@[^@]+$"},
"age": {"type":"integer","min":0,"max":120} } }
diff
Row-level changeset between two tables. Pass left and right as input sources and optional key columns (omit to compare whole rows). Returns { added, removed, changed: [{key, before, after}] }.
POST /v1/data/diff
{ "left": {"inline":"<b64>"}, "right": {"inputKey":"..."}, "key": ["id"] }
from-xlsx
Parse an Excel workbook into a table. sheet (name or index, default the first) picks the sheet; to is the output format. The first row is the header, and cell values are read — formulas are never evaluated. A workbook over the cell-count cap is rejected pre-charge.
POST /v1/data/from-xlsx
{ "file": {"inputKey":"..."}, "sheet": 0, "to": "json" }
to-xlsx
Write a table to an Excel .xlsx. Any cell whose value starts with = + - @ is written as text, so a payload can’t become an active formula when the file is opened — formula-injection-safe, and it round-trips cleanly.
POST /v1/data/to-xlsx
{ "file": {"inline":"<b64 csv>"}, "sheetName": "Export" }
Multi-sheet workbooks. Instead of a single file, pass a sheets array (≤ 20) — each entry is { name, file, from? } with its own tabular source, written to its own tab. Provide exactly one of file or sheets. Multi-sheet billing sums every sheet’s input bytes.
Styling. An optional style object applies to the workbook: headerBold, freezeHeader, columnWidths (an array by index or an object keyed by column name), and numberFormats (per-column Excel format strings like "#,##0.00", "yyyy-mm-dd", "@").
POST /v1/data/to-xlsx
{ "sheets": [
{ "name": "Sales", "file": {"inline":"<b64 csv>"} },
{ "name": "Costs", "file": {"inputKey":"..."}, "from": "json" }
],
"style": { "headerBold": true, "freezeHeader": true, "numberFormats": { "amount": "#,##0.00" } } }
The result is binary .xlsx in the uniform output envelope — see receiving outputs and persistence tiers.
derive
Add computed columns from a structured expression — no formula string, no eval. Each column is { name, expr } where expr is {col} | {lit} | {op, args}: arithmetic (add/sub/mul/div/mod/abs/round), string (concat/upper/lower/trim/substring), coalesce, and comparisons. Values are literals; ops are an allowlist; nothing is executed.
POST /v1/data/derive
{ "file": {"inline":"<b64>"},
"columns": [ {"name":"total","expr":{"op":"mul","args":[{"col":"qty"},{"col":"price"}]}} ] }
fillna
Fill empty / null cells. value is a literal, or one of ffill / bfill / mean / median / mode. Restrict to columns, or apply to all.
POST /v1/data/fillna
{ "file": {"inline":"<b64>"}, "columns": ["spend"], "value": 0 }
dropna
Drop rows with empty cells. how: "any" (default) drops a row if any of the named columns is empty; "all" only if all are.
POST /v1/data/dropna
{ "file": {"inline":"<b64>"}, "columns": ["email"], "how": "any" }
rename
Rename columns. Each pair is { from, to }; a reserved or duplicate target → 422.
POST /v1/data/rename
{ "file": {"inline":"<b64>"}, "columns": [ {"from":"col1","to":"name"} ] }
slice
Row slicing / sampling — offset + limit, head (first n), tail (last n), or sample (random n).
POST /v1/data/slice
{ "file": {"inline":"<b64>"}, "head": 100 }
explode
Unnest one cell’s array/list into multiple rows (one per element). Default reads a JSON array; pass separator to split a delimited string instead.
POST /v1/data/explode
{ "file": {"inline":"<b64>"}, "column": "tags" }
sample
Sample rows from a dataset — pass exactly one of n (an exact count) or fraction (0–1, a share of the rows). method is random (default), head (first n), or systematic (every step-th row). Pass a seed for reproducible random draws — the same seed over the same input returns a byte-identical sample. The result preserves original row order. Billed per MB of input.
POST /v1/data/sample
{ "file": {"inline":"<b64>"}, "fraction": 0.1, "seed": 42 }
(data/slice also has a bare sample parameter for a one-off random count; data/sample is the richer tool — fractions, methods, and the reproducible seed.)
Sample
curl -X POST https://api.relaystation.ai/v1/data/groupby \
-H 'X-Payment: <base64 EIP-3009 auth>' \
-H 'Idempotency-Key: rollup-by-country-20260609' \
-H 'Content-Type: application/json' \
-d '{"file":{"inline":"Y291bnRyeSxzcGVuZApVUywxMApVUyw1CkRFLDcK"},"by":["country"],"aggregate":[{"fn":"count","as":"n"},{"column":"spend","fn":"sum","as":"total"}]}'
Errors
402 PAYMENT_REQUIRED— no valid payment.422 DATA_PARSE_FAILED— malformed input. Rejected before any charge.422 UNKNOWN_COLUMN— a referenced column isn’t in the data (the response lists the offenders).422 PREDICATE_TOO_DEEP— afilterwherenests past the depth cap.422 BAD_AGGREGATE— an unknown aggregate function, a missing/duplicate output name, or a reserved name.422 BAD_PATTERN— avalidatepatternisn’t a valid bounded regular expression.422 XLSX_TOO_LARGE— afrom-xlsxworkbook exceeds the cell-count cap (cputools.data.max_xlsx_cells).422 BAD_EXPRESSION— aderiveexpression uses an unknown op or references a missing column.413 OUTPUT_TOO_LARGE— ajoin/union/groupby/pivotresult exceeds the output-byte cap (the cartesian-blowup guard); the charge is refunded.
Next
Data — SQL · Pipeline · CSV tools · Pricing · API reference