Welcome to Gnosys Labs
Gnosys Labs is the autonomous ML platform that doesn't fool itself. Upload a dataset; the platform runs the propose→execute→validate→promote loop and returns a calibrated ensemble plus an audit-ready model card. Every promoted spec has cleared six adversarial validators, none of them an LLM.
The shipped product today covers tabular classification and regression. Agent-code and RL competitions (the strategy domain) are in research preview; see What's coming.
What's special
LLMs already generate vast amounts of code. The hard part isn't generation — it's validation. Once a system both generates and evaluates, it starts fooling itself: finding patterns that don't exist, overfitting aggressively, mistaking selection bias for meaningful signal.
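The selection effect described above can be illustrated with a minimal sketch in plain Python. It uses no platform code, and every name in it is hypothetical. Many signal-free "models" are scored on a small validation set, the best one is kept, and that winner is then scored on fresh data. The winner looks good only because it was selected.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def best_of_random_models(n_models=500, n_val=50, n_fresh=2000, seed=0):
    """Pick the best of many signal-free 'models' on a small validation set.

    Returns (validation accuracy of the winner, its accuracy on fresh data).
    """
    rng = random.Random(seed)
    val_labels = [rng.randint(0, 1) for _ in range(n_val)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]

    best_val, best_fresh = -1.0, 0.0
    for _ in range(n_models):
        # Each "model" is a coin flip per row: it contains no signal at all.
        val_preds = [rng.randint(0, 1) for _ in range(n_val)]
        fresh_preds = [rng.randint(0, 1) for _ in range(n_fresh)]
        val_acc = accuracy(val_preds, val_labels)
        if val_acc > best_val:
            # The generator "promotes" whatever its own evaluator liked best.
            best_val = val_acc
            best_fresh = accuracy(fresh_preds, fresh_labels)
    return best_val, best_fresh


best_val, best_fresh = best_of_random_models()
print(f"winner on the validation set it was selected on: {best_val:.2f}")
print(f"same winner on fresh data:                       {best_fresh:.2f}")
```

The first number sits well above chance and the second returns to roughly 0.5. An evaluator that is also the selector cannot tell the two apart without an independent check.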
Gnosys is purpose-built around a validation layer that catches a generator deceiving its own evaluator. The six validators that run on every promoted spec:
| Validator | What it catches |
|---|---|
| `honest_eval.shuffled_label` | Retrain with permuted `y_train`. A pipeline that keeps scoring is leaking. |
| `honest_eval.randomized_feature` | Retrain with permuted columns. Catches column-identity exploits. |
| `honest_eval.secondary_holdout` | Score on a cohort the strategist never saw. |
| `honest_eval.permutation_fwer` | Empirical p, BH-FDR corrected across the batch. |
| `dist_shift` | Train→holdout gap attributed to label / covariate / concept. |
| `multi_calibration` | Subgroup-conditional ECE, calibrated by MCGrad. |
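
To make the shuffled-label idea concrete, here is a minimal sketch in plain Python. It is not the platform's implementation. The toy pipelines, the `fit_predict(X_train, y_train, X_hold)` calling convention, and all other names are assumptions made for illustration. An honest pipeline should fall to chance once `y_train` is permuted. A pipeline that reaches the holdout labels by some other route keeps scoring.

```python
import random


def one_nn_fit_predict(X_train, y_train, X_hold):
    """Honest toy pipeline: 1-nearest-neighbour on a single numeric feature."""
    preds = []
    for row in X_hold:
        nearest = min(range(len(X_train)),
                      key=lambda i: abs(X_train[i]["x"] - row["x"]))
        preds.append(y_train[nearest])
    return preds


def make_leaky_fit_predict(label_by_id):
    """Leaky toy pipeline: reads holdout labels from a table built pre-split."""
    def fit_predict(X_train, y_train, X_hold):
        return [label_by_id[row["id"]] for row in X_hold]  # ignores y_train
    return fit_predict


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def shuffled_label_score(fit_predict, X_train, y_train, X_hold, y_hold,
                         n_shuffles=5, seed=0):
    """Mean holdout accuracy after retraining on permuted training labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        y_perm = list(y_train)
        rng.shuffle(y_perm)  # break any real link between X_train and y_train
        scores.append(accuracy(fit_predict(X_train, y_perm, X_hold), y_hold))
    return sum(scores) / len(scores)


def make_dataset(n=300, seed=1):
    """Balanced binary labels; feature x is the label plus Gaussian noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        rows.append({"id": i, "x": y + rng.gauss(0, 0.3)})
        labels.append(y)
    return rows, labels


rows, labels = make_dataset()
X_train, y_train = rows[:200], labels[:200]
X_hold, y_hold = rows[200:], labels[200:]
label_by_id = {row["id"]: y for row, y in zip(rows, labels)}  # the leak

honest = shuffled_label_score(one_nn_fit_predict,
                              X_train, y_train, X_hold, y_hold)
leaky = shuffled_label_score(make_leaky_fit_predict(label_by_id),
                             X_train, y_train, X_hold, y_hold)
print(f"honest pipeline, shuffled labels: {honest:.2f}")
print(f"leaky pipeline, shuffled labels:  {leaky:.2f}")
```

In this toy setup the honest pipeline lands near 0.5 and the leaky one stays at 1.0. How far above chance a shuffled score may sit before a spec is rejected is a threshold choice this sketch does not address.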
See Honest evaluation for the dedicated page, or The validation layer for the full theory.
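The "empirical p, BH-FDR corrected" step in the table can be sketched as follows. This is a minimal illustration of the two standard ingredients, a permutation p-value and the Benjamini-Hochberg adjustment, under the assumption that each spec in a batch has one observed score and a list of scores from label-permuted retrains. The function names and the example numbers are made up for illustration and are not platform output.

```python
def empirical_p_value(observed, null_scores):
    """One-sided permutation p-value with the usual +1 correction."""
    at_least_as_good = sum(s >= observed for s in null_scores)
    return (1 + at_least_as_good) / (1 + len(null_scores))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up batch of three specs: (observed score, scores under permuted labels).
batch = [
    (0.90, [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.50, 0.52]),
    (0.55, [0.50, 0.56, 0.48, 0.57, 0.49, 0.53, 0.58, 0.50, 0.52]),
    (0.51, [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.50, 0.52]),
]
raw = [empirical_p_value(obs, null) for obs, null in batch]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.2f}, BH-adjusted = {q:.2f}")
```

With only nine permutations the smallest attainable p-value is 0.1, so a real run would need many more permuted retrains before any spec could clear a conventional threshold.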
Proof
Spaceship Titanic, via the LLM agent strategist with the full honest-eval suite: AUC 0.8955, validated by all six honest-eval checks. End-to-end on the platform, no human in the loop. See Tabular quickstart for the exact run config.
Head-to-head against AutoML on random-acts-of-pizza (same 75/25 split, same wall-clock budget): Gnosys 0.674 vs FLAML 0.605, AutoGluon 0.641, auto-sklearn 0.617. The OpenML benchmark reproduces this on your laptop.
Where to go next
- Getting started — sign up, get an API key, install the SDK, run your first experiment in under five minutes.
- Tabular quickstart — concrete worked example with the LLM agent strategist and all six validators firing.
- Honest evaluation — the dedicated page for the six-validator suite and the model-card artefact.
- Python SDK reference — every public class and method.
- REST API reference — the full OpenAPI schema.
- Plans + limits — pricing tiers and usage caps.
- What's coming — strategy domain (agent-code, RL) preview.
Status
The public API surface lives at https://gnosyslabs.com/v1/*. The schema is at /v1/openapi.json.
Real-time service status is at /v1/status.
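A minimal sketch of reading the schema with the Python standard library. It assumes the relative paths above resolve against https://gnosyslabs.com, that the schema can be fetched without an API key, and that the response body is JSON. None of these is confirmed here, and the helper names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: /v1/... paths are relative to this host.
BASE_URL = "https://gnosyslabs.com/v1/"


def endpoint(path):
    """Build an absolute URL under the /v1/ prefix."""
    return urljoin(BASE_URL, path)


def fetch_openapi_schema(timeout=10.0):
    """Download and parse the OpenAPI schema (assumes no auth is required)."""
    with urlopen(endpoint("openapi.json"), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network request, so it is left commented out):
# schema = fetch_openapi_schema()
# print(sorted(schema.get("paths", {})))
```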