Welcome to Gnosys Labs

Gnosys Labs is the autonomous ML platform that doesn't fool itself. Upload a dataset and the platform runs the propose→execute→validate→promote loop, returning a calibrated ensemble plus an audit-ready model card. Every promoted spec has cleared six adversarial validators, none of them an LLM.

The shipped product today covers tabular classification and regression. Agent-code and RL competitions (strategy domain) are in research preview; see What's coming.

What's special

LLMs already generate vast amounts of code. The hard part isn't generation — it's validation. Once a system both generates and evaluates, it starts fooling itself: finding patterns that don't exist, overfitting aggressively, mistaking selection bias for meaningful signal.

Gnosys is purpose-built around a validation layer that catches a generator deceiving its own evaluator. The six validators that run on every promoted spec:

| Validator | What it catches |
| --- | --- |
| honest_eval.shuffled_label | Retrain with permuted y_train. A pipeline that keeps scoring is leaking. |
| honest_eval.randomized_feature | Retrain with permuted columns. Catches column-identity exploits. |
| honest_eval.secondary_holdout | Score on a cohort the strategist never saw. |
| honest_eval.permutation_fwer | Empirical p, BH-FDR corrected across the batch. |
| dist_shift | Train→holdout gap attributed to label / covariate / concept. |
| multi_calibration | Subgroup-conditional ECE, calibrated by MCGrad. |
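
To make the shuffled_label row concrete, here is a minimal sketch of the general technique. It is not the platform's implementation. The synthetic data, the 1-nearest-neighbour model, and every function name are assumptions chosen for illustration. An honest pipeline falls to roughly chance once y_train is permuted. A deliberately leaky one, which mixes holdout rows into its fit set, keeps scoring.

```python
import random

def make_blobs(n_per_class, rng):
    """Two well-separated 2-D Gaussian classes (synthetic, illustration only)."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y

def predict_1nn(X_fit, y_fit, x):
    """Return the label of the fitted row closest to x (squared Euclidean)."""
    nearest = min(
        range(len(X_fit)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_fit[i], x)),
    )
    return y_fit[nearest]

def honest_pipeline(X_train, y_train, X_hold, y_hold):
    """Fit on training rows only, then report holdout accuracy."""
    hits = sum(predict_1nn(X_train, y_train, x) == t for x, t in zip(X_hold, y_hold))
    return hits / len(y_hold)

def leaky_pipeline(X_train, y_train, X_hold, y_hold):
    """Deliberately broken: holdout rows and labels end up in the fit set."""
    return honest_pipeline(X_train + X_hold, y_train + y_hold, X_hold, y_hold)

def shuffled_label_score(pipeline, X_train, y_train, X_hold, y_hold, rng):
    """Re-run the pipeline with permuted training labels and true holdout labels."""
    y_perm = list(y_train)
    rng.shuffle(y_perm)
    return pipeline(X_train, y_perm, X_hold, y_hold)

rng = random.Random(0)
X_train, y_train = make_blobs(150, rng)
X_hold, y_hold = make_blobs(75, rng)

honest_real = honest_pipeline(X_train, y_train, X_hold, y_hold)
honest_shuffled = shuffled_label_score(honest_pipeline, X_train, y_train, X_hold, y_hold, rng)
leaky_shuffled = shuffled_label_score(leaky_pipeline, X_train, y_train, X_hold, y_hold, rng)

print("honest, real labels:    ", honest_real)
print("honest, shuffled labels:", honest_shuffled)
print("leaky, shuffled labels: ", leaky_shuffled)
```

The signal to look for is the gap between the last two numbers. A pipeline whose shuffled-label score stays far above chance is drawing on something other than the training labels.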

Read Honest evaluation for the dedicated permalink or The validation layer for the full theory.
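
For readers who want the arithmetic behind the permutation_fwer row, the sketch below computes an empirical permutation p-value and applies the Benjamini-Hochberg step-up adjustment across a batch. The table names BH-FDR, so that procedure is used here. The function names and the example numbers are hypothetical, and the platform's own implementation may differ in detail.

```python
def permutation_p_value(observed, null_scores):
    """Empirical p with the usual +1 correction:
    (1 + count of null scores >= observed) / (1 + number of permutations)."""
    exceed = sum(score >= observed for score in null_scores)
    return (1 + exceed) / (1 + len(null_scores))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative batch: one observed score and a small null sample per spec.
batch = {
    "spec_a": (0.90, [0.50, 0.60, 0.95, 0.40]),
    "spec_b": (0.70, [0.50, 0.60, 0.65, 0.40]),
}
raw = {name: permutation_p_value(obs, null) for name, (obs, null) in batch.items()}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(raw)
print(adj)
```

A spec would then be kept only if its adjusted p-value falls below the chosen threshold. In practice the null sample would hold hundreds or thousands of permuted-label scores, not four.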

Proof

Spaceship Titanic, via the LLM agent strategist with the full honest-eval suite: AUC 0.8955, cleared by all six validators listed above. End-to-end on the platform, no human in the loop. See Tabular quickstart for the exact run config.

Head-to-head against AutoML on random-acts-of-pizza (same 75/25 split, same wall-clock budget): Gnosys 0.674 vs FLAML 0.605, AutoGluon 0.641, auto-sklearn 0.617. The OpenML benchmark reproduces this on your laptop.

Status

Public API surface lives at https://gnosyslabs.com/v1/*. The schema is at /v1/openapi.json. Real-time service status is at /v1/status.
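
As a minimal sketch of reaching the two endpoints named above, the helper below uses only the Python standard library. It assumes both endpoints return JSON and need no authentication, neither of which is stated here. The function names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://gnosyslabs.com/v1/"

def endpoint_url(path: str) -> str:
    """Join a relative path such as 'status' onto the public API base URL."""
    return urljoin(BASE_URL, path.lstrip("/"))

def fetch_json(path: str, timeout: float = 10.0):
    """GET an endpoint and parse the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(path), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json("openapi.json") would retrieve the schema and fetch_json("status") the service status, provided the assumptions above hold.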

