Welcome to Gnosys Labs

Gnosys Labs is the autonomous ML platform that doesn't fool itself. Upload a dataset; the platform runs the propose→execute→validate→promote loop and returns a calibrated ensemble plus an audit-ready model card. Every promoted spec has cleared six adversarial validators, none of them an LLM.

The shipped product today is tabular classification and regression. Agent-code and RL competitions (strategy domain) are in research preview; see What's coming.

What's special

LLMs already generate vast amounts of code. The hard part isn't generation — it's validation. Once a system both generates and evaluates, it starts fooling itself: finding patterns that don't exist, overfitting aggressively, mistaking selection bias for meaningful signal.

Gnosys is purpose-built around a validation layer that catches a generator deceiving its own evaluator. The six validators that run on every promoted spec:

Validator	What it catches
`honest_eval.shuffled_label`	Retrain with permuted `y_train`. A pipeline that still scores above chance is leaking.
`honest_eval.randomized_feature`	Retrain with permuted columns. Catches column-identity exploits.
`honest_eval.secondary_holdout`	Score on a cohort the strategist never saw.
`honest_eval.permutation_fwer`	Empirical permutation p-value, BH-FDR corrected across the batch.
`dist_shift`	Train→holdout gap attributed to label, covariate, or concept shift.
`multi_calibration`	Subgroup-conditional expected calibration error (ECE), calibrated by MCGrad.
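
To make the first row of the table concrete, here is a minimal sketch of a shuffled-label check in plain Python. It is not the platform's implementation: the helper names (`shuffled_label_score`, `one_nn_fit_predict`, `leaky_fit_predict`), the toy data, and the choice of accuracy as the score are all assumptions made for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_nn_fit_predict(X_train, y_train, X_test):
    """Honest toy pipeline: 1-nearest-neighbour on a single numeric feature."""
    preds = []
    for x in X_test:
        nearest = min(range(len(X_train)), key=lambda i: abs(X_train[i] - x))
        preds.append(y_train[nearest])
    return preds


def shuffled_label_score(fit_predict, X_train, y_train, X_test, y_test,
                         n_rounds=5, seed=0):
    """Mean holdout accuracy after retraining on permuted training labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = list(y_train)
        rng.shuffle(y_perm)  # break the link between features and labels
        preds = fit_predict(X_train, y_perm, X_test)
        scores.append(accuracy(y_test, preds))  # score against the TRUE labels
    return sum(scores) / n_rounds


# Toy data (hypothetical): the label is 1 exactly when the feature is positive.
data_rng = random.Random(42)
X_train = [data_rng.uniform(-1, 1) for _ in range(200)]
X_test = [data_rng.uniform(-1, 1) for _ in range(200)]
y_train = [int(x > 0) for x in X_train]
y_test = [int(x > 0) for x in X_test]


def leaky_fit_predict(X_train, y_train, X_test):
    """Hypothetical leak: ignores training labels, returns memorised holdout labels."""
    return list(y_test)


honest_real = accuracy(y_test, one_nn_fit_predict(X_train, y_train, X_test))
honest_shuffled = shuffled_label_score(
    one_nn_fit_predict, X_train, y_train, X_test, y_test)
leaky_shuffled = shuffled_label_score(
    leaky_fit_predict, X_train, y_train, X_test, y_test)

print("honest, real labels:    ", honest_real)
print("honest, shuffled labels:", honest_shuffled)
print("leaky, shuffled labels: ", leaky_shuffled)
```

The honest pipeline should fall to roughly chance once its training labels are permuted, while the leaky one keeps its score because its signal never came from `y_train` in the first place.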

Read Honest evaluation for the dedicated page, or The validation layer for the full theory.
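
The permutation row in the table above pairs an empirical p-value with a Benjamini-Hochberg correction across a batch of candidates. The sketch below shows the standard form of both steps in plain Python. The function names, the spec names, and the simulated null scores are assumptions for illustration; the platform's own validator may differ in detail.

```python
import random


def empirical_p_value(observed, null_scores):
    """One-sided permutation p-value; the +1 terms keep p strictly above 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical batch: observed holdout scores for three candidate specs.
observed = {"spec_a": 0.62, "spec_b": 0.53, "spec_c": 0.50}

# Simulated null distributions standing in for scores from label permutations.
rng = random.Random(0)
null_scores = {
    name: [rng.gauss(0.5, 0.02) for _ in range(199)] for name in observed
}

names = list(observed)
p_values = {n: empirical_p_value(observed[n], null_scores[n]) for n in names}
q_values = dict(zip(names, benjamini_hochberg([p_values[n] for n in names])))

for n in names:
    print(n, "p =", p_values[n], "BH-adjusted =", q_values[n])
```

With 199 permutations the smallest attainable p-value is 1/200, so the number of permutations bounds how strong the evidence for any single spec can be.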

Proof

Spaceship Titanic, via the LLM agent strategist with the full honest-eval suite: AUC 0.8955, validated by all six honest-eval checks. End-to-end on the platform, no human in the loop. See Tabular quickstart for the exact run config.

Head-to-head against AutoML on random-acts-of-pizza (same 75/25 split, same wall-clock budget): Gnosys 0.674 vs FLAML 0.605, AutoGluon 0.641, auto-sklearn 0.617. The OpenML benchmark reproduces this on your laptop.
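
A comparison like this only holds if every system sees identical rows. One way to pin a 75/25 split is to shuffle row indices with a fixed seed, as in the sketch below. `fixed_split` and the seed value are illustrative assumptions, not the benchmark's actual harness.

```python
import random


def fixed_split(n_rows, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) from a seeded shuffle of row indices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    # The first n_test shuffled indices form the holdout; the rest are training rows.
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = fixed_split(1000, test_fraction=0.25, seed=0)
print(len(train_idx), "training rows,", len(test_idx), "holdout rows")
```

Passing the same index lists to each system under test keeps the comparison on equal footing, whatever library each one uses internally.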

Where to go next

Getting started — sign up, get an API key, install the SDK, run your first experiment in under five minutes.
Tabular quickstart — concrete worked example with the LLM agent strategist and all six validators firing.
Honest evaluation — the dedicated page for the six-validator suite and the model-card artefact.
Python SDK reference — every public class and method.
REST API reference — the full OpenAPI schema.
Plans + limits — pricing tiers and usage caps.
What's coming — strategy domain (agent-code, RL) preview.

Status

Public API surface lives at https://gnosyslabs.com/v1/*. The schema is at /v1/openapi.json. Real-time service status is at /v1/status.
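
As a rough sketch, the endpoints above can be reached with nothing but the Python standard library. This assumes the relative paths resolve against the same host as the base URL. Whether the endpoints require an API key, and what format `/v1/status` returns, is not stated here, so the helper returns the raw body rather than parsing it. `endpoint` and `fetch` are hypothetical helper names, not part of the SDK.

```python
import urllib.request

# Base URL taken from the Status section; everything else is illustrative.
BASE_URL = "https://gnosyslabs.com/v1"


def endpoint(path):
    """Join a relative path such as 'openapi.json' onto the /v1 base URL."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def fetch(path, timeout=10.0):
    """GET an endpoint and return (HTTP status code, raw body bytes).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    that want to inspect failures should catch that exception.
    """
    request = urllib.request.Request(endpoint(path), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

Calling `fetch("openapi.json")` would request the schema and `fetch("status")` the service status; nothing is sent over the network until one of those calls is made.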
