Getting started
Five minutes from signup to first run. No ML expertise required for this walkthrough — the platform handles the loop; you bring a template + a budget.
1. Sign up
Go to gnosyslabs.com/signup and enter your email, organization, and
password. You'll land on a one-time page that displays your first API
key. Copy it now: for security we never display it again. Lost keys
can't be recovered, only revoked and re-issued.
The free tier gives you 5 runs per month with the hyperparameter-sweep strategist. All six honest-eval validators, MCGrad calibration, the model card endpoint, and the predict endpoint are included on every plan, even free. LLM-driven runs (where an LLM proposes pipelines each round) require a Starter plan or above because they consume LLM tokens. See Plans + limits.
2. Install the SDK
pip install gnosyslabs
Requires Python 3.10 or newer. The package is a thin client over the
public REST API — it doesn't install the engine, so it stays under
1 MB of dependencies (httpx, pydantic, that's it).
Install name vs import name. You pip install gnosyslabs but import gnosys, the same split as pip install scikit-learn → import sklearn. If you previously ran pip install gnosys (an unrelated PyPI package), run pip uninstall gnosys first.
Set your API key as an environment variable so you don't have to pass it explicitly:
export GNOSYS_API_KEY=gn_live_...
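If the variable is missing, the problem may only show up at the first request. A minimal sketch of a fail-fast check you could run at startup; require_api_key is a hypothetical helper written for this guide, not part of the SDK, and it only checks that the variable is present and non-empty.

```python
import os


def require_api_key(env=os.environ, name="GNOSYS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the gnosys SDK.
    """
    key = env.get(name, "").strip()  # tolerate stray whitespace from copy-paste
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

Calling require_api_key() once before constructing the client turns a missing key into an immediate, readable error.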
3. Submit your first run
from gnosys import GnosysClient
client = GnosysClient() # picks up $GNOSYS_API_KEY
# Submit a hyperparameter sweep on a built-in synthetic binary
# classification dataset. Round 0 tries 5 coarse points; rounds 1+
# refine around the best. The validation layer runs honest-eval +
# multi-cal + dist-shift on every spec.
run = client.runs.create(
    domain="tabular",
    strategist={
        "kind": "hp_sweep",
        "key": "C",
        "values": [0.001, 0.01, 0.1, 1.0, 10.0],
    },
    spec_template={
        "spec_id": "_t",
        "name": "first run",
        "hypothesis": "regularisation strength on synthetic binary",
        "task": "classification",
        "dataset_id": "synthetic_binary",
        "model_family": "logistic",
        "hyperparameters": {"C": 1.0},
    },
    max_iterations=4,
    no_progress_window=2,
)
print(f"submitted: {run.run_id}")
runs.create returns a 202 immediately; the platform launches the
loop in the background. Poll for completion:
run = client.runs.wait(run.run_id, timeout=600.0)
print(f"final status: {run.status}")
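client.runs.wait hides the polling loop. If you want to see the shape of such a loop, for example to log between polls, here is a minimal sketch. It is not the SDK's implementation: fetch_status, the terminal status names, and the polling interval are assumptions made for illustration.

```python
import time

# Assumed terminal states; the real status values are not listed in this guide.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for(fetch_status, timeout=600.0, interval=2.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout}s")
        sleep(interval)
```

In real use, fetch_status would be a small function that looks up the run and returns its status string.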
4. Read the findings
Every spec produces a set of validation findings. The most useful filter is "show me everything that BLOCKED a spec":
blockers = client.findings.list(run_id=run.run_id, severity="blocker")
for f in blockers:
    print(f"{f.validator:<32} {f.detail[:120]}")
Typical first-run output (the over-regularised C=0.001 model hits the multi-calibration BLOCKER because it can't extract enough signal to make calibrated predictions):
multi_calibration ece=0.1947 mce=0.2741 worst_slice=None ...
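The ece and mce fields are calibration errors. The platform's exact binning and slicing are not documented in this guide, so the following is only the textbook equal-width-bin definition for binary probabilities, as a sketch. Within each bin it compares the mean predicted probability with the observed positive rate; ECE is the sample-weighted average of those gaps and MCE is the largest gap.

```python
def calibration_errors(probs, labels, n_bins=10):
    """Return (ece, mce) for positive-class probabilities and 0/1 labels.

    Textbook definition with equal-width bins; an illustration only,
    not the platform's validator.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        gap = abs(mean_p - pos_rate)
        ece += len(members) / n * gap  # weight each bin by its share of samples
        mce = max(mce, gap)
    return ece, mce
```

A model that predicts 0.9 for four examples of which three are positive has a gap of 0.15 in that bin, so both numbers come out at about 0.15.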
findings.list accepts every other filter you'd expect:
client.findings.list(run_id=run.run_id, validator="honest_eval.shuffled_label")
client.findings.list(spec_id="hp-0-C=1", limit=200)
client.findings.list(run_id=run.run_id, severity="blocker")
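Once you have a list of findings, a quick tally by validator shows which check blocks most often. The sketch below uses stand-in records that carry the same validator attribute as the loop above; the sample values are made up, and it assumes findings.list returns an iterable of such objects.

```python
from collections import Counter
from types import SimpleNamespace


def tally_by_validator(findings):
    """Count findings per validator name."""
    return Counter(f.validator for f in findings)


# Stand-in records for illustration; real findings come from client.findings.list.
sample = [
    SimpleNamespace(validator="multi_calibration"),
    SimpleNamespace(validator="honest_eval.shuffled_label"),
    SimpleNamespace(validator="multi_calibration"),
]
counts = tally_by_validator(sample)
for validator, count in counts.most_common():
    print(f"{validator:<32} {count}")
```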
5. Inspect from the dashboard
Every run is queryable from
gnosyslabs.com/dashboard too.
The dashboard shows per-iteration validated/rejected counts, the
best record, and the full validation breakdown — same data,
browser-friendly form.
What just happened
In four iterations the platform:
- Proposed: the HP sweep strategist emitted 5 coarse logistic-regression specs (round 0), then 2-3 refined neighbours in each subsequent round around the best AUC.
- Executed: every spec trained on a stratified 60/20/20 train/test/secondary split.
- Validated: all six validators (multi-calibration, distribution-shift, shuffled-label, randomized-feature, secondary-holdout, permutation-FWER) ran on every result. The canonical deception test (shuffled-label) re-trained each spec under permuted training labels; specs that retained performance there were rejected as leaks.
- Promoted: passing specs were promoted based on auc + ece thresholds (strict and relaxed gates), with the ensemble multi-calibrated by MCGrad on the kept survivors.
Read Honest evaluation for the per-validator summary or The validation layer for the full theory behind each check.
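The Proposed step above is a coarse-to-fine search. The platform's actual refinement rule is not spelled out in this guide; one plausible sketch, assuming a log-spaced grid of positive values like the one in step 3, is to propose the geometric midpoints between the best value and its immediate neighbours. The function name is hypothetical.

```python
import math


def refine_around_best(values, best):
    """Propose geometric midpoints between `best` and its grid neighbours.

    Illustration only; assumes positive values and that `best` is in `values`.
    """
    grid = sorted(values)
    i = grid.index(best)
    proposals = []
    if i > 0:
        proposals.append(math.sqrt(grid[i - 1] * best))  # midpoint towards smaller C
    if i < len(grid) - 1:
        proposals.append(math.sqrt(best * grid[i + 1]))  # midpoint towards larger C
    return proposals
```

With the grid from step 3 and a best value of 1.0, this proposes roughly 0.316 and 3.162. A best value at the edge of the grid yields a single proposal.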
Next steps
- Read Tabular quickstart for a more elaborate worked example with the LLM strategist.
- Move past synthetic data and bring your own classification CSV: see Bring your own data for the full upload → run → predict flow.
- Compare us against AutoML baselines: OpenML benchmark — Gnosys vs FLAML / AutoGluon ships an example script that does the comparison on your laptop.
- Browse the Python SDK reference for every method.
- Set up the stigmera diagnose findings query patterns to find correlated blockers across runs.
Common errors
AuthenticationError: invalid or revoked API key — your key was
revoked, or you copy-pasted only part of it. Issue a fresh one at
/dashboard/api-keys.
HTTPException 402: monthly run quota exhausted — you hit the
free-tier 5-run cap. Upgrade at
gnosyslabs.com/pricing.
HTTPException 429: rate limit — you sent too many run-create
requests too quickly. The response includes a Retry-After header;
wait for the time it indicates, then retry.
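A sketch of honouring Retry-After on a 429. It is written against a generic send callable that returns (status_code, headers, body), so it assumes nothing about the SDK's internals; that callable, the attempt cap, and the fallback delay are all illustrative. It handles only the delay-in-seconds form of Retry-After, not the HTTP-date form.

```python
import time


def send_with_retry(send, max_attempts=5, default_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429, waiting Retry-After seconds between tries.

    `send` is a hypothetical callable returning (status_code, headers, body),
    where headers is a plain dict keyed by "Retry-After".
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        try:
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay  # e.g. HTTP-date form, not parsed in this sketch
        sleep(max(delay, 0.0))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Capping the number of attempts keeps a persistent rate limit from turning into an endless loop.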