Getting started

Five minutes from signup to first run. No ML expertise required for this walkthrough — the platform handles the loop; you bring a template + a budget.

1. Sign up

Go to gnosyslabs.com/signup, enter your email + organization + password. You'll land on a one-time page that displays your first API key. Copy it now — for security we never display it again. Lost keys can't be recovered, only revoked and re-issued.

The free tier gives you 5 runs per month with the hyperparameter-sweep strategist. All six honest-eval validators, MCGrad calibration, the model card endpoint, and the predict endpoint are included on every plan, even free. LLM-driven runs (where an LLM proposes pipelines each round) require a Starter plan or above because they consume LLM tokens. See Plans + limits.

2. Install the SDK

pip install gnosyslabs

Requires Python 3.10 or newer. The package is a thin client over the public REST API. It doesn't install the engine, and its only dependencies are httpx and pydantic, so the whole install stays under 1 MB.

Install name vs import name. You pip install gnosyslabs but import gnosys, the same split as pip install scikit-learn → import sklearn. If you previously ran pip install gnosys (an unrelated PyPI package), run pip uninstall gnosys first; otherwise the two packages can collide on the gnosys import name.

Set your API key as an environment variable so you don't have to pass it explicitly:

export GNOSYS_API_KEY=gn_live_...
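A missing or half-pasted key otherwise surfaces only as an AuthenticationError on the first request. If you want scripts to fail fast, a preflight check like the sketch below works. It is an illustration, not an SDK feature: require_api_key is a hypothetical helper, and checking for a gn_ prefix is an assumption based on the gn_live_ example above.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GNOSYS_API_KEY, or raise early if it is missing or looks wrong."""
    env = os.environ if env is None else env
    key = env.get("GNOSYS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GNOSYS_API_KEY is not set; export it before creating a client")
    # Assumption: Gnosys keys start with "gn_" (the docs show gn_live_...).
    if not key.startswith("gn_"):
        raise RuntimeError("GNOSYS_API_KEY doesn't look like a Gnosys key (expected a gn_ prefix)")
    return key
```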

3. Submit your first run

from gnosys import GnosysClient

client = GnosysClient()  # picks up $GNOSYS_API_KEY

# Submit a hyperparameter sweep on a built-in synthetic binary
# classification dataset. Round 0 tries 5 coarse points; rounds 1+
# refine around the best. The validation layer runs honest-eval +
# multi-cal + dist-shift on every spec.
run = client.runs.create(
    domain="tabular",
    strategist={
        "kind": "hp_sweep",
        "key": "C",
        "values": [0.001, 0.01, 0.1, 1.0, 10.0],
    },
    spec_template={
        "spec_id": "_t",
        "name": "first run",
        "hypothesis": "regularisation strength on synthetic binary",
        "task": "classification",
        "dataset_id": "synthetic_binary",
        "model_family": "logistic",
        "hyperparameters": {"C": 1.0},
    },
    max_iterations=4,
    no_progress_window=2,
)
print(f"submitted: {run.run_id}")

runs.create returns as soon as the API answers 202 Accepted; the platform then runs the loop in the background. Poll for completion:

run = client.runs.wait(run.run_id, timeout=600.0)
print(f"final status: {run.status}")
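runs.wait does this polling for you. If you'd rather drive it yourself (to log progress, say), the loop looks roughly like the sketch below. It is not the SDK's implementation: get_status is a hypothetical callable standing in for however you fetch the run's current status, and the status names in TERMINAL are assumptions.

```python
import time
from typing import Callable

# Assumed terminal states; check the Python SDK reference for the real values.
TERMINAL = {"completed", "failed", "cancelled"}


def poll_until_done(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() with capped exponential backoff until a terminal state."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"run still {status!r} after {timeout}s")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # back off, but cap the wait
```

Backing off keeps a long run from costing you a request every second, which matters given the rate limit described under Common errors.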

4. Read the findings

Every spec produces a set of validation findings. The most useful filter is "show me everything that BLOCKED a spec":

blockers = client.findings.list(run_id=run.run_id, severity="blocker")
for f in blockers:
    print(f"{f.validator:<32} {f.detail[:120]}")

Typical first-run output (the over-regularised C=0.001 model hits the multi-calibration BLOCKER because it can't extract enough signal to make calibrated predictions):

multi_calibration                ece=0.1947 mce=0.2741 worst_slice=None ...
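The ece and mce numbers in that detail string are calibration errors: roughly, how far predicted probabilities sit from observed positive rates, averaged over probability bins (ECE) or taken at the worst bin (MCE). The sketch below uses the common equal-width-bin definition; treat it as an illustration only, since the validator's exact binning and slicing aren't specified here.

```python
from typing import Sequence, Tuple


def calibration_errors(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> Tuple[float, float]:
    """Expected (ECE) and maximum (MCE) calibration error with equal-width bins."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = mce = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)  # average predicted probability
        pos_rate = sum(y for _, y in b) / len(b)  # observed fraction of positives
        gap = abs(mean_p - pos_rate)
        ece += len(b) / n * gap  # weight each bin by its share of samples
        mce = max(mce, gap)
    return ece, mce
```

An over-regularised model tends to push every prediction toward the same value, so the bins it does populate can sit far from their observed positive rates, which is what a high ECE and MCE flag.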

findings.list also filters by validator and by spec, and accepts a result limit:

client.findings.list(run_id=run.run_id, validator="honest_eval.shuffled_label")
client.findings.list(spec_id="hp-0-C=1", limit=200)
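To see which validators block most often in a run, tally the blocker findings by validator. So that the sketch runs without an API key, it uses a hypothetical Finding dataclass exposing only the validator and detail attributes used in the loop above; the SDK's real finding objects may carry more.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Finding:
    # Hypothetical stand-in for the SDK's finding objects.
    validator: str
    detail: str = ""


def blockers_by_validator(findings: Iterable[Finding]) -> List[Tuple[str, int]]:
    """Count findings per validator, most frequent first."""
    return Counter(f.validator for f in findings).most_common()


# With a live client you would pass the SDK's results instead, e.g.:
#   blockers = client.findings.list(run_id=run.run_id, severity="blocker")
#   for name, n in blockers_by_validator(blockers):
#       print(f"{name:<32} {n}")
```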

5. Inspect from the dashboard

Every run is queryable from gnosyslabs.com/dashboard too. The dashboard shows per-iteration validated/rejected counts, the best record, and the full validation breakdown — same data, browser-friendly form.

What just happened

In up to four iterations (max_iterations=4; fewer if the no_progress_window early stop triggers), the platform:

Proposed — the HP sweep strategist emitted 5 coarse logistic-regression specs in round 0, then, in each later round, 2-3 refined neighbours around the value with the best AUC.
Executed — every spec trained on a stratified 60/20/20 train/test/secondary split.
Validated — all six validators (multi-calibration, distribution-shift, shuffled-label, randomized-feature, secondary-holdout, permutation-FWER) ran on every result. The canonical deception test (shuffled-label) re-trained each spec under permuted training labels; specs that retained performance there were rejected as leaks.
Promoted — specs that passed validation were promoted against AUC and ECE thresholds (a strict gate and a relaxed gate), and MCGrad multi-calibrated the ensemble of kept survivors.
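The Executed step's stratified 60/20/20 split keeps each class's share roughly equal across the train, test, and secondary partitions, so the secondary holdout isn't accidentally skewed toward one class. The platform's own splitter isn't documented here; this pure-Python sketch only illustrates the idea, and the seed and per-class rounding are our choices.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int],
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return index lists (train, test, secondary), stratified by label."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    test: List[int] = []
    secondary: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(len(idx) * fractions[0])
        n_test = round(len(idx) * fractions[1])
        train += idx[:n_train]
        test += idx[n_train:n_train + n_test]
        secondary += idx[n_train + n_test:]  # remainder absorbs rounding
    return train, test, secondary
```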

Read Honest evaluation for the per-validator summary or The validation layer for the full theory behind each check.

Next steps

Read Tabular quickstart for a more elaborate worked example with the LLM strategist.
Ready to move past synthetic data? Bring your own classification CSV; see Bring your own data for the full upload → run → predict flow.
Compare Gnosys against AutoML baselines: the OpenML benchmark (Gnosys vs FLAML / AutoGluon) page ships an example script that runs the comparison on your laptop.
Browse the Python SDK reference for every method.
Use the stigmera diagnose findings query patterns to find correlated blockers across runs.

Common errors

AuthenticationError: invalid or revoked API key — your key was revoked, or you copy-pasted only part of it. Issue a fresh one at /dashboard/api-keys.

HTTPException 402: monthly run quota exhausted — you hit the free-tier 5-run cap. Upgrade at gnosyslabs.com/pricing.

HTTPException 429: rate limit — you sent too many run-create requests too quickly. The response carries a Retry-After header; wait that long, then retry.
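If a script creates runs in a loop, it's worth honoring Retry-After automatically. This page doesn't say which exception type carries the header, so the sketch defines a hypothetical RateLimited exception holding the Retry-After value in seconds; adapt the except clause to whatever your client actually raises.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 error. retry_after is the Retry-After
    header in seconds (the header may also be an HTTP date; not handled here)."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("429: rate limit")
        self.retry_after = retry_after


def with_retry_after(
    call: Callable[[], T],
    max_attempts: int = 5,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), sleeping for Retry-After and retrying on RateLimited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the 429 to the caller
            # Prefer the server's Retry-After; fall back to a fixed wait if absent.
            sleep(err.retry_after if err.retry_after is not None else default_wait)
    raise AssertionError("unreachable")
```

Usage would look like with_retry_after(lambda: client.runs.create(...)), with the except clause pointed at the SDK's real 429 error.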
