Dataset candidate dashboard

This listing is governance-focused: it shows identifiers and review metadata only. Aggregate counts do not validate the underlying chemistry or any confidential content.

Review before ML use

Dataset candidates reference reviewed records; approval workflows and leakage checks must complete before training or benchmarking.

1. Training dataset candidates

Curated knowledge claims nominated for ML model training — identifiers, record type, review metadata, and curation status.

Nominate training candidate
Nominate a knowledge claim as a training candidate by specifying the record type, record ID, dataset type, source, citation IDs, and quality flags.
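
As a rough illustration of the fields listed above, a training nomination could be assembled as follows. This is a minimal sketch only: the class name, the snake_case field names, and the example values are hypothetical and are not taken from the MolTrace API.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class TrainingNomination:
    """Hypothetical container for the fields the nomination form asks for."""
    record_type: str
    record_id: str
    dataset_type: str
    source: str
    citation_ids: list = field(default_factory=list)
    quality_flags: list = field(default_factory=list)

    def to_payload(self) -> dict:
        # Reject blank required fields before anything is submitted.
        for name in ("record_type", "record_id", "dataset_type", "source"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        return asdict(self)


# Example values are placeholders, not real record identifiers.
nomination = TrainingNomination(
    record_type="knowledge_claim",
    record_id="claim-001",
    dataset_type="training",
    source="curated",
    citation_ids=["cit-1"],
)
payload = nomination.to_payload()
```

Citation IDs and quality flags default to empty lists here, on the assumption that a claim can be nominated before either is attached.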

2. Benchmark dataset candidates

Knowledge claims nominated for ML benchmark evaluation — includes leakage risk label and split recommendation. Citation IDs are not modeled on benchmark candidates and display as blank.

Nominate benchmark candidate
Nominate a knowledge claim as a benchmark evaluation candidate by specifying the record type, record ID, benchmark type, and leakage risk classification.
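
A benchmark nomination carries a leakage risk classification, so a sketch of it can validate that label up front. The four label values below are the ones shown on this dashboard; the class name, the field names, and the default of "unknown" are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

# Label values as displayed in the leakage risk section of this page.
LEAKAGE_RISK_LABELS = ("low", "medium", "high", "unknown")


@dataclass
class BenchmarkNomination:
    """Hypothetical container for the benchmark nomination fields."""
    record_type: str
    record_id: str
    benchmark_type: str
    leakage_risk_label: str = "unknown"  # assumed default, not confirmed

    def to_payload(self) -> dict:
        # Refuse labels the dashboard would not be able to aggregate.
        if self.leakage_risk_label not in LEAKAGE_RISK_LABELS:
            raise ValueError(
                f"leakage_risk_label must be one of {LEAKAGE_RISK_LABELS}"
            )
        return asdict(self)
```

No citation field is included, matching the note above that citation IDs are not modeled on benchmark candidates.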

3. Dataset versions

Versioned snapshots of training and benchmark splits — each version locks candidate IDs into train, validation, test, and holdout partitions for reproducible model training.

POST dataset version
Populate split_json with the keys train, validation, test, and holdout, each holding a comma-separated list of candidate IDs. source_record_ids_json is the deduplicated union of the IDs across all four splits.
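
The relationship between split_json and source_record_ids_json can be sketched as below. This assumes both fields are sent as JSON-encoded strings, which the page does not confirm; the function names are hypothetical, and the overlap check is an extra illustration of the kind of leakage check mentioned earlier, not a documented feature.

```python
import json
from collections import Counter

SPLIT_KEYS = ("train", "validation", "test", "holdout")


def build_version_payload(train, validation, test, holdout):
    """Build split_json and source_record_ids_json from four lists of IDs."""
    splits = dict(zip(SPLIT_KEYS, (train, validation, test, holdout)))
    # Each split field is a comma-separated string of candidate IDs.
    split_json = {name: ",".join(ids) for name, ids in splits.items()}
    # Deduplicated union of all split IDs, keeping first-seen order.
    union = list(dict.fromkeys(i for ids in splits.values() for i in ids))
    return {
        "split_json": json.dumps(split_json),
        "source_record_ids_json": json.dumps(union),
    }


def ids_in_multiple_splits(splits):
    """Return the sorted IDs that appear in more than one split."""
    seen = Counter(i for ids in splits.values() for i in set(ids))
    return sorted(i for i, n in seen.items() if n > 1)


payload = build_version_payload(["c1", "c2"], ["c3"], ["c4", "c2"], ["c5"])
overlap = ids_in_multiple_splits(
    {"train": ["c1", "c2"], "validation": ["c3"], "test": ["c4", "c2"], "holdout": ["c5"]}
)
```

In this example c2 sits in both train and test, so it appears once in the union and is reported by the overlap check.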

4. Leakage risk warnings

Aggregated from benchmark leakage_risk_label and dataset version leakage_warnings_json (summaries only).

leakage_risk_label · low: 0 benchmark rows
leakage_risk_label · medium: 0 benchmark rows
leakage_risk_label · high: 0 benchmark rows
leakage_risk_label · unknown: 0 benchmark rows

No leakage_warnings_json entries on loaded dataset versions.
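
The per-label counts above can be reproduced with a small aggregation. The sketch assumes each benchmark row is a dict with a leakage_risk_label key, and that missing or unrecognized labels are grouped under "unknown"; neither assumption is confirmed by this page, and the function name is hypothetical.

```python
from collections import Counter

LABELS = ("low", "medium", "high", "unknown")


def count_leakage_labels(benchmark_rows):
    """Count benchmark rows per leakage_risk_label, always reporting all four labels."""
    counts = Counter()
    for row in benchmark_rows:
        label = row.get("leakage_risk_label")
        # Assumption: anything outside the known labels is counted as "unknown".
        counts[label if label in LABELS else "unknown"] += 1
    return {label: counts[label] for label in LABELS}
```

With no benchmark rows loaded, every label reports 0, which matches the empty state shown here.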

5. Quality flags

Counts from training and benchmark quality_flags_json (flag strings only).

No quality flags on loaded candidates.
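
The flag counts could be derived as in the sketch below. It assumes quality_flags_json holds a JSON array of flag strings, either already parsed or still encoded as a string; the function name and the example flag values are hypothetical.

```python
import json
from collections import Counter


def count_quality_flags(candidates):
    """Tally flag strings across training and benchmark candidates."""
    counts = Counter()
    for candidate in candidates:
        raw = candidate.get("quality_flags_json") or "[]"
        # Accept either a JSON-encoded string or an already-parsed list.
        flags = json.loads(raw) if isinstance(raw, str) else raw
        # Only flag strings are counted; other values are ignored.
        counts.update(f for f in flags if isinstance(f, str))
    return dict(counts)
```

An empty result corresponds to the "No quality flags on loaded candidates" state.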