What's coming

The tabular domain is the shipped product. This page covers what's next: research-stage capabilities that customers can opt into early.

Status: preview / research. The APIs below work end-to-end in the local SDK, but the SaaS surface is gated to design partners while we shake out throughput, cost, and abuse vectors. Contact sales for early access.

Strategy domain — agent-code competitions

The same propose→execute→validate→promote loop, applied to multi-agent games. First target: Kaggle Orbit Wars.

Two execution modes share the run-create surface (POST /v1/runs {"domain": "strategy", "mode": ...}):

mode="code" — LLM-written agent

The strategist proposes Python agent classes. The executor runs each one in a sandboxed kaggle-environments tournament against a league of opponents. Each agent is scored by its winrate, and honest-eval runs over the tournament outcomes the same way it runs over AUC in the tabular domain.
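Winrate scoring over tournament outcomes can be sketched as follows. The outcome encoding ("win", "loss", "draw") and counting a draw as half a win are assumptions for illustration, since the page does not say how draws are treated. `winrate` is a hypothetical helper, not part of the SDK.

```python
from collections import Counter
from typing import Iterable


def winrate(outcomes: Iterable[str], draw_weight: float = 0.5) -> float:
    """Fraction of games won, with each draw counted as `draw_weight` of a win.

    Outcomes are assumed to be the strings "win", "loss", or "draw"
    (a hypothetical encoding, not the platform's format).
    """
    counts = Counter(outcomes)
    unknown = set(counts) - {"win", "loss", "draw"}
    if unknown:
        raise ValueError(f"unexpected outcomes: {sorted(unknown)}")
    games = sum(counts.values())
    if games == 0:
        raise ValueError("no games played")
    return (counts["win"] + draw_weight * counts["draw"]) / games


# One candidate agent's results against a small opponent league:
# 4 wins, 2 draws, 2 losses.
results = ["win", "win", "loss", "draw", "win", "loss", "win", "draw"]
print(winrate(results))  # 0.625
```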

Typical eval: ~10–20 s per spec. A 5-iteration run with 4 specs/round finishes in a couple of minutes.
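As a rough check on that timing, the run evaluates 5 × 4 = 20 specs. The page doesn't say whether the specs in a round are evaluated concurrently, so this sketch computes both cases; it also assumes no overhead for proposing or validating specs between rounds.

```python
def run_time_seconds(rounds: int, specs_per_round: int,
                     eval_seconds: float, parallel: bool) -> float:
    """Wall-clock estimate that ignores proposal and validation overhead."""
    if parallel:
        # All specs in a round run at once, so each round costs one eval.
        return rounds * eval_seconds
    # Specs run one after another.
    return rounds * specs_per_round * eval_seconds


for eval_s in (10, 20):
    serial = run_time_seconds(5, 4, eval_s, parallel=False)
    par = run_time_seconds(5, 4, eval_s, parallel=True)
    print(f"{eval_s} s/spec: serial {serial / 60:.1f} min, parallel {par / 60:.1f} min")
```

With serial evaluation the estimate is about 3.3 to 6.7 minutes; with each round's specs running concurrently it is about 0.8 to 1.7 minutes.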

mode="rl_train" — PPO + behaviour cloning

The strategist proposes RL training configs (reward shaping, behaviour-cloning warm-start labels, curriculum schedules). The executor trains a two-head policy (target selector + garrison fraction) via PPO, with the BC warm-start coming from the platform's library of past mode="code" agents.
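A minimal sketch of a two-head policy of this shape: a shared MLP trunk feeding a softmax over candidate targets (the target selector) and a sigmoid scalar in (0, 1) (the garrison fraction), plus a behaviour-cloning loss of the kind that could serve as a warm-start objective before PPO. The layer sizes, the flat observation vector, the fixed number of targets, and the loss weighting are all assumptions; the platform's actual architecture and featurization aren't documented here. Only the forward pass and the loss are shown, not the PPO or BC training loops.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Computes w @ x + b, with the weight matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class TwoHeadPolicy:
    """Hypothetical policy: shared tanh trunk, target-selector head, garrison head."""

    def __init__(self, obs_dim: int, hidden: int, n_targets: int, seed: int = 0):
        rng = random.Random(seed)
        self.trunk = init_layer(hidden, obs_dim, rng)
        self.target_head = init_layer(n_targets, hidden, rng)
        self.garrison_head = init_layer(1, hidden, rng)

    def forward(self, obs: Vector) -> Tuple[Vector, float]:
        h = [math.tanh(v) for v in linear(*self.trunk, obs)]
        target_probs = softmax(linear(*self.target_head, h))
        # The sigmoid squashes the garrison output into a fraction in (0, 1).
        g = linear(*self.garrison_head, h)[0]
        garrison = 1.0 / (1.0 + math.exp(-g))
        return target_probs, garrison


def bc_loss(policy: TwoHeadPolicy, obs: Vector, target_label: int,
            garrison_label: float, garrison_weight: float = 1.0) -> float:
    """BC loss for one (state, label) pair taken from a past agent:
    cross-entropy on the chosen target plus squared error on the garrison fraction."""
    probs, garrison = policy.forward(obs)
    ce = -math.log(max(probs[target_label], 1e-12))
    return ce + garrison_weight * (garrison - garrison_label) ** 2


policy = TwoHeadPolicy(obs_dim=6, hidden=16, n_targets=4)
obs = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]
probs, garrison = policy.forward(obs)
loss = bc_loss(policy, obs, target_label=2, garrison_label=0.3)
```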

Typical training: ~2–5 minutes per spec. Architecture choice (MLP or attention) is part of the spec.
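The run-create call shared by both modes can be sketched with the standard library. Only the `domain` and `mode` fields come from this page; the base URL is a placeholder, authentication is omitted, and `build_run_request` is a hypothetical helper rather than an SDK function. The sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real service host
STRATEGY_MODES = {"code", "rl_train"}


def build_run_request(mode: str) -> urllib.request.Request:
    """Builds (but does not send) POST /v1/runs for the strategy domain."""
    if mode not in STRATEGY_MODES:
        raise ValueError(f"mode must be one of {sorted(STRATEGY_MODES)}, got {mode!r}")
    body = json.dumps({"domain": "strategy", "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/runs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_request("rl_train")
print(req.get_method(), req.full_url)  # POST https://api.example.com/v1/runs
print(req.data.decode())               # {"domain": "strategy", "mode": "rl_train"}
```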

Submission packaging

Same endpoint as tabular: POST /v1/runs/{id}/submission returns either a single main.py (for mode="code") or a tarball with policy.npz + a thin main.py shim (for mode="rl_train"). Designed to drop directly into a Kaggle submission slot.
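Before uploading a mode="rl_train" artifact, a quick local check that the tarball holds the two files named above might look like this. It assumes both files sit at the top level of the archive, which the page doesn't state; `check_rl_submission` and `make_tarball` are hypothetical helpers, and the file contents in the usage lines are placeholders rather than a real policy.

```python
import io
import posixpath
import tarfile

REQUIRED_FILES = {"main.py", "policy.npz"}


def check_rl_submission(tar_bytes: bytes) -> set:
    """Returns the required files missing from a tarball (empty set if complete)."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        # normpath turns "./main.py" into "main.py".
        names = {posixpath.normpath(m.name) for m in tar.getmembers() if m.isfile()}
    return REQUIRED_FILES - names


def make_tarball(files: dict) -> bytes:
    """Builds a gzipped tarball in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


good = make_tarball({"main.py": b"# shim\n", "policy.npz": b"placeholder"})
bad = make_tarball({"main.py": b"# shim\n"})
print(check_rl_submission(good))  # set()
print(check_rl_submission(bad))   # {'policy.npz'}
```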

Customisation hooks

All three RL hooks are sandboxed Python (no imports, no I/O):

  • Reward shaping — modify the per-step reward before PPO consumes it.
  • BC label override — replace the warm-start label for a state without retraining the agent that produced it.
  • Curriculum scheduling — gate which opponents the policy trains against at each PPO epoch.
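The three hooks might look like the following. The function names, signatures, and dictionary-shaped states, labels, and opponents are all hypothetical, since the hook interface isn't documented here; the sketch only respects the stated constraints of plain Python with no imports and no I/O.

```python
# Hypothetical hook signatures and data shapes; the real interface is not
# documented on this page. No imports and no I/O, matching the sandbox rules.


def shape_reward(reward, state, next_state):
    """Reward shaping: add a small bonus proportional to the score gained this step."""
    gained = next_state["score"] - state["score"]
    return reward + 0.1 * gained


def override_bc_label(state, label):
    """BC label override: clamp the labelled garrison fraction into [0.2, 0.8]."""
    fraction = min(max(label["garrison_fraction"], 0.2), 0.8)
    return {**label, "garrison_fraction": fraction}


def curriculum(epoch, opponents):
    """Curriculum: easy opponents only before epoch 50, then the full league."""
    if epoch < 50:
        return [o for o in opponents if o["tier"] == "easy"]
    return list(opponents)
```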

The same honest-eval suite runs over tournament outcomes regardless of mode — a code-mode agent and an RL-mode policy get scored by the same six validators against the same opponent league.

What's not in scope (yet)

  • Vision — image classification, segmentation, generative tasks. Tabular and strategy domains are the focus through 2026.
  • NLP — text classification, token-level tasks, RAG. Same reason.
  • Multi-table joins — we accept a single CSV / parquet per run; multi-table reasoning is a planned tabular extension.
  • Time-series forecasting as a first-class domain — you can do it today by featurizing into a tabular spec, but the orchestrator doesn't yet know about temporal cross-validation as a primitive.
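The time-series workaround can be sketched as two steps: turn the series into lagged feature rows that fit a single-table spec, and build time-ordered validation splits yourself. The lag count, fold count, and the expanding-window (walk-forward) scheme are assumptions for illustration; neither helper is an SDK feature.

```python
from typing import Iterator, List, Sequence, Tuple


def lag_features(series: Sequence[float], n_lags: int) -> List[dict]:
    """One row per step t >= n_lags: values at t-1..t-n_lags as features, y_t as target."""
    rows = []
    for t in range(n_lags, len(series)):
        row = {f"lag_{k}": series[t - k] for k in range(1, n_lags + 1)}
        row["target"] = series[t]
        rows.append(row)
    return rows


def expanding_window_splits(n_rows: int, n_folds: int,
                            test_size: int) -> Iterator[Tuple[range, range]]:
    """Walk-forward splits: each fold trains only on rows before its test block."""
    first_test = n_rows - n_folds * test_size
    if first_test <= 0:
        raise ValueError("not enough rows for the requested folds")
    for i in range(n_folds):
        start = first_test + i * test_size
        yield range(0, start), range(start, start + test_size)


rows = lag_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], n_lags=2)
print(rows[0])  # {'lag_1': 2.0, 'lag_2': 1.0, 'target': 3.0}
for train_idx, test_idx in expanding_window_splits(len(rows), n_folds=2, test_size=2):
    print(list(train_idx), list(test_idx))
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

The rows can be written out as one CSV for a tabular run; the walk-forward evaluation stays on your side, since the orchestrator doesn't treat temporal cross-validation as a primitive.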

Get early access

Email sales with the domain you want to try, the shape of your dataset, and the validation surface your team currently relies on. We're prioritising partners for whom the validation record itself is part of the value: the same ideal customer profile (ICP) as the tabular product.
