Module · SpectraCheck · Layers 22 → 39

Spectroscopy with an audit trail — from raw FID to regulatory-ready.

SpectraCheck is the spectroscopy intelligence engine inside MolTrace. NMR, LC-MS, HRMS, and MS/MS arrive as one evidence stack — typed, traceable, multi-modal by default. Every numerical claim links back to the spectrum it came from.

The pipeline, end-to-end

Six stages. Recipe-hash-linked. Audit-grade.

Every stage emits a typed Pydantic model with stable JSON keys. Every emission is pinned to the immutable raw archive by SHA-256 + recipe hash. You can replay any report from any prior date and get bit-identical output.
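As a rough illustration of how an emission could be pinned to the raw archive, the sketch below hashes the archive bytes with SHA-256 and derives a recipe hash from a canonical JSON dump of the processing parameters. The `StageEmission` class, its field names, the `emit` helper, and the canonical-JSON scheme are assumptions made for illustration (stdlib dataclasses stand in for the Pydantic models mentioned above); they are not SpectraCheck's actual schema.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def recipe_hash(recipe: dict) -> str:
    """Hash a recipe via canonical JSON (sorted keys, fixed separators),
    so key order never changes the digest. Assumed scheme, not the product's."""
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


@dataclass(frozen=True)
class StageEmission:
    """Hypothetical emission envelope: stage output plus its provenance pins."""
    stage: str
    raw_sha256: str
    recipe_hash: str
    payload_json: str  # stage output serialised with sorted keys


def emit(stage: str, raw: bytes, recipe: dict, payload: dict) -> StageEmission:
    return StageEmission(
        stage=stage,
        raw_sha256=sha256_hex(raw),
        recipe_hash=recipe_hash(recipe),
        payload_json=json.dumps(payload, sort_keys=True),
    )


raw_fid = b"\x00\x01\x02\x03"  # stand-in for the archive bytes
recipe_a = {"apodization": "exp", "lb_hz": 0.3, "zero_fill": 2}
recipe_b = {"zero_fill": 2, "lb_hz": 0.3, "apodization": "exp"}  # same recipe, other key order

first = emit("process", raw_fid, recipe_a, {"n_points": 4})
replay = emit("process", raw_fid, recipe_b, {"n_points": 4})
print(first == replay)  # identical inputs give an identical emission
```

Because both hashes depend only on the bytes and the parameter values, a replay with the same archive and recipe reproduces the same record, which is the property the bit-identical claim relies on.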

  1. 01

    Ingest

    Raw FID archive (Bruker / Agilent-Varian) lands in the immutable vault. SHA-256 hashed. Vendor metadata extracted: SFO1/BF1 (Bruker) or sfrq/reffrq (Varian).

    Outputs

    Archive · vendor · field_mhz · acquisition params

  2. 02

    Process

    Fourier transform + apodization + phasing + baseline correction. Every parameter recipe-hash-linked to the unchanged raw archive. Recipe is reproducible byte-for-byte.

    Outputs

    Processed spectrum (ppm + intensity) · processing_metadata

  3. 03

    Detect

    Legacy detector (default) or GSD-Prompt-3 (experimental opt-in). Both surface the same envelope: peaks, environments, category_counts, environment_counts.

    Outputs

    Peaks · environments · category mix · per-peak fit metrics

  4. 04

    Categorize

    Each peak is auto-classified as compound / solvent / impurity / artifact / ¹³C satellite. Curated impurity-shift tables supply the match labels (residual CHCl₃, methanol, acetic acid, BHT, …).

    Outputs

    Category · category_reason · impurity_match / solvent_hit

  5. 05

    Score

    39-layer evidence stack scores candidate SMILES across NMR, HRMS, MS/MS, predicted shifts, fragmentation trees, and reaction history. Cross-modal contradictions surface as warnings.

    Outputs

    Ranked candidates · DP4 confidence · multi-layer evidence

  6. 06

    Report

    Regulatory-ready structure-elucidation report composer. Every numerical claim is hyperlinked to its source. Human signoff required before release.

    Outputs

    Audit-grade report · ALCOA+ ledger entry
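For the Ingest stage, a minimal sketch of deriving `field_mhz` from the vendor parameters named above. The flat parameter dictionaries, the `VENDOR_FREQ_KEYS` table, and the preference order (SFO1 before BF1, sfrq before reffrq) are assumptions; the page lists the parameters but does not say which one feeds `field_mhz`.

```python
from typing import Mapping, Union

# Assumed lookup order per vendor (hypothetical, for illustration only).
VENDOR_FREQ_KEYS = {
    "bruker": ("SFO1", "BF1"),
    "varian": ("sfrq", "reffrq"),
}


def field_mhz(vendor: str, params: Mapping[str, Union[str, float]]) -> float:
    """Return the spectrometer frequency in MHz from vendor acquisition parameters."""
    keys = VENDOR_FREQ_KEYS.get(vendor.lower())
    if keys is None:
        raise ValueError(f"unsupported vendor: {vendor!r}")
    for key in keys:
        if key in params:
            # Vendor files often store numbers as strings, so coerce explicitly.
            return float(params[key])
    raise KeyError(f"none of {keys} found in acquisition parameters")


bruker_mhz = field_mhz("Bruker", {"SFO1": 400.13, "BF1": 400.13})
varian_mhz = field_mhz("varian", {"reffrq": "399.78"})
print(bruker_mhz, varian_mhz)
```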

At a glance

  Raw FID  ──►  Process FID  ──►  Detect peaks  ──►  Categorise
     │              │                  │                   │
     │              │                  │                   ▼
     │              │                  │            Cross-modal score
     │              │                  │             (HRMS · MS/MS · 2D NMR)
     │              │                  │                   │
     ▼              ▼                  ▼                   ▼
  SHA-256       recipe_hash        per-peak QC      candidate ranking + DP4
  immutable     reproducible       (χ²ᵣ, RMSE,      with audit-grade trail
  vault         processing         FWHM, S/N)        and human signoff gate
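To make the "Process FID" step concrete, here is a dependency-free sketch of exponential apodization followed by a discrete Fourier transform on a synthetic single-resonance FID. It is deliberately naive: a real pipeline would use an FFT library, and phasing and baseline correction are omitted. All function names and parameter values are illustrative assumptions.

```python
import cmath
import math


def synth_fid(freq_hz, t2_s, n, dwell_s):
    """Synthetic complex FID: one decaying resonance at freq_hz."""
    return [
        cmath.exp(2j * math.pi * freq_hz * k * dwell_s) * math.exp(-k * dwell_s / t2_s)
        for k in range(n)
    ]


def apodize_exp(fid, lb_hz, dwell_s):
    """Exponential line broadening: multiply each point by exp(-pi * lb * t)."""
    return [s * math.exp(-math.pi * lb_hz * k * dwell_s) for k, s in enumerate(fid)]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (fine for a tiny demo)."""
    n = len(signal)
    return [
        sum(s * cmath.exp(-2j * math.pi * m * k / n) for k, s in enumerate(signal))
        for m in range(n)
    ]


n, dwell = 64, 1.0 / 64.0  # 64 Hz spectral width, so 1 Hz per bin
fid = synth_fid(freq_hz=10.0, t2_s=0.5, n=n, dwell_s=dwell)
apodized = apodize_exp(fid, lb_hz=1.0, dwell_s=dwell)
spectrum = dft(apodized)

# The resonance sits at 10 Hz, which maps to bin 10 at 1 Hz per bin.
peak_bin = max(range(n), key=lambda m: abs(spectrum[m]))
print(peak_bin)
```

Recording `lb_hz`, the transform size, and similar parameters in the recipe is what would let the same raw FID be reprocessed identically later.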

Modalities supported

One evidence stack. Four modalities. No silos.

A pharmaceutical R&D group operates across NMR + LC-MS + HRMS + MS/MS simultaneously. SpectraCheck fuses these as one evidence stack — not separate apps — and uses cross-modal contradictions (HRMS exact mass disagreeing with NMR-implied formula) as first-class warnings.
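A cross-modal contradiction check of this kind can be sketched as follows: compute the monoisotopic [M+H]⁺ m/z for the NMR-implied formula and compare it with the HRMS observation in ppm. The function names, the 5 ppm tolerance, the restriction to C/H/N/O, and the caffeine example are assumptions chosen for illustration, not SpectraCheck's actual logic.

```python
import re
from typing import Optional

# Monoisotopic masses (u) for a few common elements; extend as needed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}
PROTON_MASS = 1.007276466  # mass of H+ in u


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def check_formula_vs_hrms(formula: str, observed_mz: float,
                          tol_ppm: float = 5.0) -> Optional[str]:
    """Return a warning string if the [M+H]+ m/z disagrees with the formula."""
    theoretical = monoisotopic_mass(formula) + PROTON_MASS
    error_ppm = (observed_mz - theoretical) / theoretical * 1e6
    if abs(error_ppm) > tol_ppm:
        return (f"HRMS m/z {observed_mz:.4f} is {error_ppm:+.1f} ppm from "
                f"[M+H]+ of {formula} ({theoretical:.4f})")
    return None


ok = check_formula_vs_hrms("C8H10N4O2", 195.0877)       # caffeine, consistent
warning = check_formula_vs_hrms("C8H12N4O2", 195.0877)  # two extra H, contradiction
print(ok, warning)
```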

1D NMR (¹H, ¹³C)

Layers 22 · 24

Accepts

Raw FID · JCAMP-DX · CSV · vendor exports

  • Bruker + Agilent-Varian FID parsing via nmrglue
  • Solvent-aware shift windows + residual-peak masking
  • Voigt / Lorentzian fitting with per-peak QC residuals
  • Multiplet clustering for environment counting

2D NMR (COSY · HSQC · HMQC · HMBC)

Layers 23 · 25

Accepts

Processed 2D NMR · vendor archives

  • Guarded behind ENABLE_2D_NMR feature flag
  • Cross-peak detection + connectivity assignment
  • Symmetrisation + denoise pipelines
  • Evidence consumed by candidate-scoring layers

HRMS · MS/MS

Layers 29 · 30 · 31 · 32

Accepts

mzML · mzXML · processed peak lists

  • HRMS exact-mass candidate + bounded formula search
  • MS1 adduct + isotope pattern inference
  • MS/MS fragmentation tree + diagnostic neutral losses
  • Processed MS/MS annotation (precursor, neutral loss)

LC-MS features

Layers 35 · 36 · 37

Accepts

mzML · mzXML · processed peak tables

  • Feature detection + EIC / XIC + peak purity
  • Feature grouping + blank subtraction + RT alignment
  • Cross-modal corroboration with NMR-implied formula
  • Contradictions surface as first-class warnings

Auto-classification

Every peak gets a category.

Peaks aren't just numbers. SpectraCheck applies a 5-category taxonomy — backed by curated impurity-shift tables and solvent residual-peak windows — so reviewers can scan a peak list and immediately see what's signal, what's noise, and what needs follow-up.

Categories are exposed as structured fields (peak.category, peak.solvent_hit, peak.impurity_match), not buried in a confidence number. Each impurity match cites the reference shift, the observed shift, and the delta.

  • Compound

    Target analyte signals — passed to candidate scoring.

  • Solvent

    Residual solvent signals: CHCl₃ at 7.26 ppm (in CDCl₃), DMSO-d₅ at 2.50 ppm (in DMSO-d₆), CHD₂OD at 3.31 ppm (in CD₃OD). Auto-masked from candidate scoring.

  • Impurity

    Curated shift-table match — methanol, ethyl acetate, acetic acid, DMF, dichloromethane, BHT, …

  • Artifact

    Phasing distortions, sidebands, t₁ noise. Excluded from candidate scoring; flagged for reviewer.

  • ¹³C satellite

    Weak doublets flanking strong ¹H peaks, caused by one-bond ¹H–¹³C coupling in the roughly 1.1% of molecules that carry natural-abundance ¹³C at that position. Down-weighted.
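A minimal sketch of how table-driven categorisation could work, covering only the solvent, impurity, and compound cases (artifact and ¹³C-satellite detection need line-shape information and are left out). The solvent positions and the acetic acid shift come from this page; the dichloromethane entry, the 0.02 ppm tolerance, and all names are illustrative assumptions, and a real impurity table would be solvent-specific.

```python
from dataclasses import dataclass
from typing import Optional

# Residual solvent positions quoted on this page (ppm).
SOLVENT_RESIDUALS_PPM = {"CDCl3": 7.26, "DMSO-d6": 2.50, "CD3OD": 3.31}
# Tiny illustrative impurity table for CDCl3; a curated table would be far larger.
IMPURITY_SHIFTS_PPM = {"acetic acid (CH3)": 2.10, "dichloromethane": 5.30}


@dataclass(frozen=True)
class Classification:
    category: str
    category_reason: str
    reference_ppm: Optional[float] = None
    delta_ppm: Optional[float] = None


def classify(observed_ppm: float, solvent: str, tol_ppm: float = 0.02) -> Classification:
    """Assign solvent / impurity / compound, citing reference shift and delta."""
    ref = SOLVENT_RESIDUALS_PPM.get(solvent)
    if ref is not None and abs(observed_ppm - ref) <= tol_ppm:
        return Classification("solvent", f"residual peak of {solvent}",
                              ref, round(observed_ppm - ref, 4))
    for name, ref in IMPURITY_SHIFTS_PPM.items():
        if abs(observed_ppm - ref) <= tol_ppm:
            return Classification("impurity", f"shift-table match: {name}",
                                  ref, round(observed_ppm - ref, 4))
    return Classification("compound", "no solvent or impurity match")


solvent_peak = classify(7.26, "CDCl3")
impurity_peak = classify(2.101, "CDCl3")
compound_peak = classify(3.65, "CDCl3")
print(solvent_peak.category, impurity_peak.category, compound_peak.category)
```

Returning the reference shift and delta alongside the category keeps the reason for each label inspectable, rather than folding it into a single score.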

Two detectors, one envelope

Legacy by default. GSD by opt-in. Both behind the same gate.

We ship a stable legacy detector, wired into the full evidence pipeline, as the default. The experimental GSD-Prompt-3 backend (MestReNova-style peak detection with auto-classification) ships as opt-in until it clears a published validation gate. Both return the same { peaks, environments, category_counts, environment_counts } envelope so consumer code is detector-agnostic.

Capability | Legacy (default) | GSD (experimental)
Peak detection on NMRShiftDB2 corpus | Median Δ = 14 | Median Δ = 19 (Δ = 5 compound-only)
Solvent auto-detect (structured field) | Inferred via impurity-match table | 94.4% on the 20-fixture corpus
Per-peak fit QC (χ²ᵣ, RMSE, FWHM, S/N, baseline σ) | Now exposed (Phase 24, normalised pending) | Native, normalised to baseline σ
Multiplicity + J-values + integration | Native, multiplet notation per peak | Not exposed
Candidate structure matching · DP4 ranking | Full evidence pipeline | Peak detection only, no candidate matching
Category classification (5-set) | Open string (detection + chemical-region) | Closed enum, native
Promotion status | Default backend | Experimental · opt-in · gated

Full A/B methodology + per-fixture numbers published in Field notes and the technical white paper §3.1.
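The shared detector envelope could be assembled along these lines. The four top-level keys are the ones named on this page; the `Peak` fields, the `environment_id` notion, and the reading of `environment_counts` as "distinct environments per category" are assumptions, since the page does not define them.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Peak:
    ppm: float
    category: str
    environment_id: int  # hypothetical: index of the multiplet cluster


def build_envelope(peaks: List[Peak]) -> Dict[str, object]:
    """Build the detector-agnostic envelope from a list of peaks."""
    envs_by_category: Dict[str, set] = {}
    for p in peaks:
        envs_by_category.setdefault(p.category, set()).add(p.environment_id)
    return {
        "peaks": [asdict(p) for p in peaks],
        "environments": sorted({p.environment_id for p in peaks}),
        "category_counts": dict(Counter(p.category for p in peaks)),
        # Assumed meaning: number of distinct environments per category.
        "environment_counts": {c: len(e) for c, e in envs_by_category.items()},
    }


peaks = [
    Peak(7.26, "solvent", 0),
    Peak(2.10, "impurity", 1),
    Peak(3.65, "compound", 2),
    Peak(3.67, "compound", 2),  # second line of the same multiplet
    Peak(1.20, "compound", 3),
]
envelope = build_envelope(peaks)
print(envelope["category_counts"], envelope["environment_counts"])
```

Consumer code that reads only these four keys does not need to know which backend produced them.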

Measured, not claimed

The numbers we publish.

Every claim below is reproducible from the publicly described corpus. The regression gate runs in CI; any drift larger than 50% on any single fixture fails the build with the fixture_id called out by name.
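A per-fixture drift gate of this kind might look like the sketch below. The definition of drift (relative change of a per-fixture metric against a stored baseline), the zero-baseline fallback, the treatment of missing fixtures, and the fixture ids are all assumptions; only the 50% threshold and the report-by-fixture_id behaviour come from the text.

```python
from typing import Dict, List, Tuple


def drifted_fixtures(baseline: Dict[str, float], current: Dict[str, float],
                     max_rel_drift: float = 0.5) -> List[Tuple[str, float]]:
    """Return (fixture_id, drift) for every fixture exceeding the drift limit."""
    failures = []
    for fixture_id, base in sorted(baseline.items()):
        cur = current.get(fixture_id)
        if cur is None:
            # A fixture that disappeared is treated as a failure (assumption).
            failures.append((fixture_id, float("inf")))
            continue
        denom = abs(base) if base != 0 else 1.0  # absolute change if baseline is 0
        drift = abs(cur - base) / denom
        if drift > max_rel_drift:
            failures.append((fixture_id, drift))
    return failures


baseline = {"fx_001": 12.0, "fx_002": 20.0, "fx_003": 8.0}  # hypothetical ids
current = {"fx_001": 13.0, "fx_002": 31.0, "fx_003": 8.0}

failures = drifted_fixtures(baseline, current)
for fixture_id, drift in failures:
    print(f"FAIL {fixture_id}: drift {drift:.0%}")
exit_code = 1 if failures else 0  # a CI wrapper would pass this to sys.exit
```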

94.4%

Solvent auto-detect

NMRShiftDB2 20-fixture corpus. Correct on 17 of the 18 fixtures that have a reference solvent (17/18 ≈ 94.4%). Strict gate target: 95%.

Δ ≤ 2

Compound-count median

Median absolute peak-count delta vs expert-curated references on the HMDB-style multiplet-line corpus. Strict gate cleared.

20

Fixture regression gate

Every detector change runs against a curated corpus before merge. CI fails by fixture_id when any single fixture drifts > 50%.

39

Evidence layers

Built additively Weeks 22 → 39. Every layer's output is a typed Pydantic model with stable JSON keys.

The closing loop

Spectroscopy isn't the end. It's the start of a loop.

Most analytical platforms stop at "the spectrum has been processed." Ours doesn't. A real worked example with an acetic acid impurity:

  1. Spectroscopy detects

    Impurity peak at 2.10 ppm matched to acetic acid CH₃ within 0.001 ppm. Confidence 93%.

  2. Regulatory routes

    Acetic acid is an ICH Q3C Class 3 residual solvent: no action is needed at or below the 5000 ppm concentration limit. An action item is raised only if the observed concentration exceeds that limit.

  3. Reaction optimization constrains

    Next experiment's reaction recipe receives the impurity limit as a Bayesian prior. Solvent + workup updated automatically.

  4. Loop closes

    Re-acquired spectrum confirms impurity below threshold. Audit ledger records every step from FID hash to recipe update.
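The regulatory routing in step 2 reduces to a threshold comparison, sketched here. Note that "ppm" in this context is a concentration (mass fraction), not a chemical shift. The limit table holds only the acetic acid entry quoted above; the outcome labels and the `route` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidualSolventLimit:
    ich_class: int
    limit_ppm: float  # concentration limit, parts per million by mass


# Only the entry cited in the worked example; a real table would cover ICH Q3C fully.
LIMITS = {"acetic acid": ResidualSolventLimit(ich_class=3, limit_ppm=5000.0)}


def route(impurity: str, concentration_ppm: float) -> str:
    """Decide whether an impurity observation needs follow-up."""
    limit = LIMITS.get(impurity)
    if limit is None:
        return "manual_review"  # unknown impurity: leave it to a human
    if concentration_ppm > limit.limit_ppm:
        return "action_item"
    return "no_action"


before = route("acetic acid", 6200.0)   # first acquisition, over the limit
after = route("acetic acid", 900.0)     # re-acquired spectrum, under the limit
unknown = route("unlisted solvent", 50.0)
print(before, after, unknown)
```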

Audit & compliance

Designed against the regulations you'll be audited on.

SpectraCheck isn't compliant by accident. From day one, the audit ledger, the immutable raw vault, the recipe-hash provenance, and the human-signoff release gate were designed against ICH Q2(R2), the ALCOA+ data-integrity principles, the FDA's January 2025 AI framework, and the EMA reflection paper.

  • Immutable raw vault

    Every FID is SHA-256 hashed, vault path policy enforced, and never overwritten.

  • Recipe-hash provenance

    Every processing run links a recipe hash to the unchanged raw archive. Bit-identical replay forever.

  • Human signoff queue

    No regulatory document is released without an explicit qualified-human attribution.

  • ALCOA+ audit ledger

    Attributable · Legible · Contemporaneous · Original · Accurate · Complete · Consistent · Enduring · Available.

  • Cross-modal contradiction warnings

    HRMS exact mass disagreeing with NMR-implied formula raises a first-class warning before signoff.

  • Tenant isolation by default

    SOC 2 Type II controls, GDPR-compliant data residency, role-scoped audit-event ledger.

See it on your own spectra.

Open SpectraCheck on the platform or schedule a 30-minute walkthrough on a real analyte from your workflow.