MolTrace · Knowledge · Dataset Candidates
Dataset candidate dashboard
Governance-focused listing: identifiers and review metadata only. Do not treat aggregates as validation of underlying chemistry or confidential content.
Review before ML use
Dataset candidates reference reviewed records; approval workflows and leakage checks must complete before training or benchmarking.
1. Training dataset candidates
Curated knowledge claims nominated for ML model training — identifiers, record type, review metadata, and curation status.
2. Benchmark dataset candidates
Knowledge claims nominated for ML benchmark evaluation — includes leakage risk label and split recommendation. Citation IDs are not modeled on benchmark candidates and display as blank.
3. Dataset versions
Versioned snapshots of training and benchmark splits — each version locks candidate IDs into train, validation, test, and holdout partitions for reproducible model training.
Each version stores split_json with keys train, validation, test, and holdout (comma-separated candidate IDs per field). source_record_ids_json is the deduplicated union of the split IDs.
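The relationship between split_json and source_record_ids_json can be illustrated with a minimal sketch. It assumes split_json is a JSON object whose four values are comma-separated strings, as described above. The function names (parse_split, source_record_ids, overlapping_ids) and the candidate IDs are hypothetical and not part of MolTrace. The first-seen ordering of the union is also an assumption.

```python
import json

SPLIT_KEYS = ("train", "validation", "test", "holdout")


def parse_split(split_json: str) -> dict[str, list[str]]:
    """Parse split_json into a list of candidate IDs per partition."""
    raw = json.loads(split_json)
    return {
        # Missing or null fields are treated as empty partitions (assumption).
        key: [cid.strip() for cid in (raw.get(key) or "").split(",") if cid.strip()]
        for key in SPLIT_KEYS
    }


def source_record_ids(splits: dict[str, list[str]]) -> list[str]:
    """Deduplicated union of all split IDs, in first-seen order."""
    seen: dict[str, None] = {}
    for key in SPLIT_KEYS:
        for cid in splits[key]:
            seen.setdefault(cid, None)
    return list(seen)


def overlapping_ids(splits: dict[str, list[str]]) -> dict[str, list[str]]:
    """IDs that appear in more than one partition, a basic leakage signal."""
    owners: dict[str, list[str]] = {}
    for key in SPLIT_KEYS:
        for cid in set(splits[key]):
            owners.setdefault(cid, []).append(key)
    return {cid: keys for cid, keys in owners.items() if len(keys) > 1}


# Hypothetical example payload; the IDs are made up for illustration.
example = '{"train": "c1, c2", "validation": "c3", "test": "c2,c4", "holdout": ""}'
splits = parse_split(example)
union = source_record_ids(splits)
overlaps = overlapping_ids(splits)
```

An ID that appears in more than one partition is worth flagging before training or benchmarking, since the same record would then influence both fitting and evaluation.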
4. Leakage risk warnings
Aggregated from benchmark leakage_risk_label and dataset version leakage_warnings_json (summaries only).
Benchmark rows counted per leakage_risk_label: low, medium, high, unknown.
No leakage_warnings_json entries on loaded dataset versions.
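As a rough sketch of the per-label aggregation described in this section, benchmark rows can be tallied by leakage_risk_label. The row shape (a dict with a leakage_risk_label key), the function name count_leakage_labels, and the choice to count missing or unrecognized labels as unknown are assumptions for illustration, not confirmed dashboard behavior.

```python
from collections import Counter

KNOWN_LABELS = ("low", "medium", "high", "unknown")


def count_leakage_labels(rows: list[dict]) -> dict[str, int]:
    """Count benchmark rows per leakage_risk_label."""
    # Start every known label at zero so empty buckets still appear.
    counts = Counter({label: 0 for label in KNOWN_LABELS})
    for row in rows:
        label = (row.get("leakage_risk_label") or "unknown").strip().lower()
        if label not in KNOWN_LABELS:
            # Assumption: unrecognized labels fall into the unknown bucket.
            label = "unknown"
        counts[label] += 1
    return dict(counts)


# Hypothetical rows for illustration only.
rows = [
    {"leakage_risk_label": "low"},
    {"leakage_risk_label": "High"},
    {},
    {"leakage_risk_label": None},
]
label_counts = count_leakage_labels(rows)
```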
5. Quality flags
Counts from training and benchmark quality_flags_json (flag strings only).
No quality flags on loaded candidates.
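The flag counts could be derived along these lines. This sketch assumes quality_flags_json holds a JSON array of flag strings on each training or benchmark candidate. The function name count_quality_flags and the flag names in the example are hypothetical.

```python
import json
from collections import Counter


def count_quality_flags(candidates: list[dict]) -> Counter:
    """Count how often each flag string appears across candidates."""
    counts: Counter = Counter()
    for candidate in candidates:
        # Missing or null quality_flags_json is treated as "no flags" (assumption).
        raw = candidate.get("quality_flags_json") or "[]"
        flags = json.loads(raw)
        # Only the flag strings are counted; any non-string entries are ignored.
        counts.update(flag for flag in flags if isinstance(flag, str))
    return counts


# Hypothetical candidates and flag names for illustration only.
candidates = [
    {"quality_flags_json": '["missing_units", "low_confidence"]'},
    {"quality_flags_json": '["missing_units"]'},
    {"quality_flags_json": None},
]
flag_counts = count_quality_flags(candidates)
```

An empty Counter corresponds to the "No quality flags on loaded candidates" state shown above.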