About MolTrace

Drug discovery deserves AI built like a peer reviewer.

MolTrace Technologies, Inc. is a venture-backed scientific intelligence company building the audit-ready evidence engine for pharmaceutical R&D. Every numerical claim we surface — peak, score, candidate, compliance verdict — is reachable, reproducible, and human-signed-off.

Why we exist

The cognitive overhead is the cost.

Routine structure elucidation in industry still consumes 6–48 hours per non-trivial small molecule, even with experienced analysts and modern NMR. More than 70% of that time goes to peak picking, integration adjustment, candidate ranking, and assembling the result into a reviewable narrative. The acquisition takes minutes; the rest is friction.

Independent reproductions of published NMR-derived structures fail 10–30% of the time, largely because the chain of custody from FID to regulatory submission is invisible by default: phase corrections are done by eye, integration regions are adjusted post hoc, and peak lists are re-edited in spreadsheets.

Meanwhile, the FDA's January 2025 AI framework and the EMA's reflection paper put new credibility burdens on every AI-derived claim in a regulatory submission. R&D groups are being asked to do four things at once: adopt more AI, prove it is reproducible, document every parameter that drove an assignment, and keep the human chain of decisions visible to inspectors.

The status-quo toolchain — spreadsheets, ad-hoc desktop processing, email-attached PDFs — doesn't satisfy any of those four constraints. MolTrace does.

The four commitments

What we believe about AI in regulated science.

Every architectural decision in the platform derives from one of these four. If a feature can't be justified against them, we don't ship it.

01

Evidence-first

Every claim shown in the UI links back to its underlying data: the source spectrum file, the picked peaks, the SMILES candidate, the literature citation that justifies the chemical-shift window, and the human reviewer who released the final report. Nowhere in the system does a confidence number appear without an audit trail.
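
As a rough illustration of what such a link check could look like, the sketch below models a claim and lists whichever audit-trail links are still empty. The class names, field names, and the `missing_links` helper are hypothetical, chosen for this example only; they are not MolTrace's actual schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class EvidenceTrail:
    """Hypothetical pointers from a displayed claim back to its sources."""
    spectrum_file: Optional[str] = None
    peak_ids: tuple[str, ...] = ()
    candidate_smiles: Optional[str] = None
    citation: Optional[str] = None
    reviewer: Optional[str] = None  # filled in when a human releases the report


@dataclass(frozen=True)
class Claim:
    label: str
    confidence: float
    trail: EvidenceTrail


def missing_links(claim: Claim) -> list[str]:
    """Return the names of audit-trail fields that are still empty."""
    return [f.name for f in fields(claim.trail) if not getattr(claim.trail, f.name)]


# Illustrative values only.
unsigned = Claim(
    label="C-7 assignment",
    confidence=0.87,
    trail=EvidenceTrail("run42.fid", ("p1", "p2"), "CCO", "ref-1"),
)
print(missing_links(unsigned))  # the reviewer link has not been set yet
```

Under this assumption, a UI layer would decline to render the confidence value until `missing_links` returns an empty list.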

02

Human-in-the-loop, never autonomous

No regulatory document is released without an explicit human signoff. AI accelerates evidence assembly; humans make the call. This is consistent with both the FDA AI credibility framework (Stage 4 — human oversight gates) and the EMA reflection paper on AI in the medicinal-product lifecycle.

03

Open science under the hood

Where a community-maintained, peer-reviewed library exists, we use it: RDKit for cheminformatics, nmrglue for vendor FID parsing, mzML for MS interoperability, Pydantic for typed contracts, FastAPI for routing, Next.js for the UI. Proprietary code is confined to the evidence-orchestration and confidence-aggregation layers, where the additive value lives.

04

Multi-modal by default

A pharmaceutical R&D group operates across NMR + LC-MS + HRMS + MS/MS + reaction history simultaneously. MolTrace fuses these as one evidence stack — not as separate apps — and uses cross-modal contradictions (e.g. HRMS exact mass disagreeing with NMR-implied formula) as first-class warnings.
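
To make the HRMS-versus-NMR example concrete, here is a minimal sketch of one way such a contradiction could be detected: compute the monoisotopic mass of the NMR-implied formula and compare it with the measured neutral mass in ppm. The function names, the small element table, and the 5 ppm tolerance are assumptions for illustration, not the platform's actual logic or thresholds.

```python
import re
from typing import Optional

# Monoisotopic masses (u) for a few common elements, rounded to 7 decimals.
MONOISOTOPIC = {"C": 12.0, "H": 1.0078250, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720712}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C8H10N4O2'."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if not pairs or "".join(el + n for el, n in pairs) != formula:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return sum(MONOISOTOPIC[el] * int(n or 1) for el, n in pairs)


def mass_contradiction(nmr_formula: str, hrms_neutral_mass: float,
                       tol_ppm: float = 5.0) -> Optional[str]:
    """Return a warning string if the two modalities disagree, else None."""
    theoretical = monoisotopic_mass(nmr_formula)
    ppm = (hrms_neutral_mass - theoretical) / theoretical * 1e6
    if abs(ppm) <= tol_ppm:
        return None
    return (f"HRMS mass {hrms_neutral_mass:.4f} disagrees with "
            f"{nmr_formula} ({theoretical:.4f}) by {ppm:+.1f} ppm")


# Caffeine formula from NMR, but a mass closer to theophylline from HRMS.
print(mass_contradiction("C8H10N4O2", 180.0647))
```

The sketch assumes the HRMS value has already been converted to a neutral mass; adduct handling is deliberately left out.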

The numbers we publish

We'd rather show you the work.

Marketing pages everywhere claim 95%+ accuracy. We publish the corpus, the gate, and the regression test that catches us when we drift.

39

Evidence layers

Each layer is additive, typed, and never overwrites a prior layer's contract. Built incrementally from Week 22 through Week 39.

94.4%

Solvent auto-detect

Measured on a 20-fixture NMRShiftDB2 validation corpus. The strict promotion gate target is 95%; the framework continues to validate the algorithm on HMDB-style references.
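
A promotion gate of this kind can be sketched in a few lines. The version below assumes the gate is a plain accuracy threshold over per-fixture pass/fail results; the function name, status labels, and fixture ids are hypothetical, and the real gate may use a different statistic.

```python
def gate_decision(results: dict[str, bool], target: float = 0.95) -> tuple[float, str]:
    """Compute accuracy over fixtures and decide whether a detector may be promoted."""
    if not results:
        raise ValueError("no fixture results supplied")
    accuracy = sum(results.values()) / len(results)
    status = "default-eligible" if accuracy >= target else "experimental"
    return accuracy, status


# Hypothetical 20-fixture run with two misses: stays experimental.
run = {f"fx-{i:03d}": i not in (4, 11) for i in range(20)}
print(gate_decision(run))
```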

20

Fixture regression gate

Frontend-produced A/B JSON is wired into a backend CI test. Detector drift above 50% on any fixture fails CI, with the failure reported by fixture_id.
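
One way to picture this gate: load the baseline JSON, compare each fixture against the current detector output, and collect the ids that drift past the limit. The sketch assumes drift means relative change in peak count and that the JSON maps `fixture_id` to a `peak_count`; both the metric and the JSON shape are illustrative guesses, not the actual CI contract.

```python
import json

# Hypothetical baseline as the frontend might export it.
BASELINE_JSON = '{"fx-001": {"peak_count": 12}, "fx-002": {"peak_count": 8}}'


def relative_drift(expected: int, observed: int) -> float:
    """Relative change in peak count; infinite if peaks appear where none were expected."""
    if expected == 0:
        return 0.0 if observed == 0 else float("inf")
    return abs(observed - expected) / expected


def drifted_fixtures(baseline: dict, current: dict, max_drift: float = 0.5) -> list[str]:
    """Return fixture ids whose drift exceeds max_drift (a missing fixture counts as 0 peaks)."""
    failures = []
    for fixture_id, ref in sorted(baseline.items()):
        observed = current.get(fixture_id, {}).get("peak_count", 0)
        if relative_drift(ref["peak_count"], observed) > max_drift:
            failures.append(fixture_id)
    return failures


baseline = json.loads(BASELINE_JSON)
current = {"fx-001": {"peak_count": 13}, "fx-002": {"peak_count": 3}}
print(drifted_fixtures(baseline, current))
```

In a CI test, asserting that the returned list is empty would make the failure message name the offending fixtures directly.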

6–48 hrs

Today's bottleneck

Routine 1D NMR structure elucidation per non-trivial small molecule. 70%+ of that is cognitive overhead — peak picking, integration, candidate ranking, report-writing.

What we won't ship

The things we say no to are as important as the things we say yes to.

  • No default-on AI without a strict gate. New detection backends ship as experimental, opt-in. They only become the default when they clear a published statistical promotion gate, measured against expert-curated references.
  • No confidence number without an audit trail. If we can't tell you why a score is 0.87 — which layer, which reference, which reviewer signed off — we won't show the number.
  • No autonomous regulatory release. Every dossier, every report, every signoff requires a qualified human in the loop. AI triages; humans decide. Liability stays where regulators expect it.
  • No raw-data overwrites. The immutable FID vault is the first storage layer. Every processing run is recipe-hash-linked to the unchanged raw archive — forever.
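
The raw-data rule in the last bullet can be sketched as content hashing: the raw FID bytes are hashed once, and every processing recipe is hashed from a canonical serialization so that the pair of digests identifies a run. The function names, record layout, and choice of SHA-256 are assumptions for illustration rather than a description of the actual vault.

```python
import hashlib
import json


def recipe_hash(recipe: dict) -> str:
    """Hash a processing recipe via canonical JSON so key order does not matter."""
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def processing_record(raw_fid: bytes, recipe: dict) -> dict:
    """Link one processing run to the unchanged raw archive by digest."""
    return {
        "raw_sha256": hashlib.sha256(raw_fid).hexdigest(),
        "recipe_sha256": recipe_hash(recipe),
    }


raw = b"\x00\x01\x02\x03"  # stand-in for raw FID bytes
run_a = processing_record(raw, {"phase0": 12.5, "line_broadening_hz": 0.3})
run_b = processing_record(raw, {"phase0": 14.0, "line_broadening_hz": 0.3})
# Same raw digest, different recipe digests: two runs over one untouched archive.
print(run_a["raw_sha256"] == run_b["raw_sha256"],
      run_a["recipe_sha256"] != run_b["recipe_sha256"])
```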

The product loop

Three pillars, one closed evidence loop.

Most analytical platforms stop at "the spectrum has been processed." Ours doesn't. Spectroscopy evidence flows directly into regulatory action items, which become constraints on the next round of reaction optimization.

The closing loop

  Raw FID  ─►  Processed spectrum  ─►  Peaks + categories  ─►  Multi-modal evidence
                                                                            │
                                                                            ▼
        ┌─►  Next experiment (ReactionIQ)  ◄─  Regulatory action items  ◄─┘
        │                                            (Regulatory Hub)
        │
        └─  with impurity / solvent / nitrosamine constraints fed back as priors

Under the hood

We stand on peer-reviewed shoulders.

Where a community-maintained library exists, we use it. Proprietary code is confined to the evidence-orchestration and confidence-aggregation layers — that's where the additive value lives.

This isn't a cost-savings choice. It's a regulatory one: open dependencies are inspectable, reproducible, and survive vendor turnover.

RDKit

Cheminformatics — SMILES canonicalisation, descriptors, substructure matching

nmrglue

Vendor-agnostic FID parsing (Bruker / Agilent-Varian)

mzML / mzXML

Open mass-spectrometry data interchange

lmfit

Voigt / Lorentzian fitting with per-peak QC residuals

Pydantic

Typed API contracts — the FE↔BE binding contract

FastAPI

Backend routing layer (Python 3.13)

Next.js 16 / React 19

Application UI (Vercel deployment)

Plotly

Spectrum rendering with static-plot anti-shake

Recent ships

A weekly cadence, in public.

Every existing endpoint and regression test stays green as new evidence layers land. These are the most recent capabilities to clear the gate.

  1. May 28, 2026

    Per-peak QC fit metrics on legacy peaks

    Reduced χ², RMSE, FWHM, S/N, and baseline σ now exposed per peak on the same regulatory-tier surface GSD already provides.

  2. May 27, 2026

    HMDB-style validation framework

    Multiplet-line-granularity references that match how peak-pickers actually count. Strict gate cleared on this corpus.

  3. May 27, 2026

    20–35% backend perf wins on dense ¹³C

    98,304-point FIDs that took 5.5 minutes now finish in 3.6 minutes.

  4. May 27, 2026

    Legacy / GSD response parity envelope

    Both detectors now expose { peaks, environments, environment_count, category_counts }. The FE renders both through one detector-agnostic panel.

  5. May 27, 2026

    Multiplet clustering at the detection layer

    Resolves the corpus-vs-detector granularity mismatch documented in §3.1 of the technical paper.
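
The response parity envelope from item 4 lends itself to a small sketch. The builder below produces the four keys named there from a list of peaks; the `Peak` fields, the idea that `environment_count` is the number of distinct environments, and the per-category tally are assumptions made for the example, not the published contract.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class Peak:
    ppm: float
    category: str        # e.g. "compound", "solvent", "impurity" (hypothetical labels)
    environment_id: int  # hypothetical grouping of peaks into one chemical environment


def build_envelope(peaks: list[Peak]) -> dict:
    """Assemble a detector-agnostic response with the four envelope keys."""
    environments = sorted({p.environment_id for p in peaks})
    return {
        "peaks": [asdict(p) for p in peaks],
        "environments": environments,
        "environment_count": len(environments),
        "category_counts": dict(Counter(p.category for p in peaks)),
    }


peaks = [Peak(7.26, "solvent", 0), Peak(3.71, "compound", 1), Peak(3.69, "compound", 1)]
envelope = build_envelope(peaks)
print(envelope["environment_count"], envelope["category_counts"])
```

Because both detectors would return the same keys under this assumption, a single rendering panel only needs to know the envelope, not which detector produced it.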

Where we work

Co-located with the science.

Three hubs, each picked for proximity to the pharma R&D + regulatory community in its region.

Boston, MA

Headquarters · Americas

Proximate to the Cambridge / Kendall Square pharma corridor and the FDA Boston District office.

London, UK

EMEA · Regulatory liaison

Co-located with the MHRA's UK regulatory ecosystem and the broader EMA reflection-paper community.

Singapore

APAC

Adjacent to the APAC pharma-manufacturing corridor and the regional CRO base.

Compliance, in plain language

What each badge actually means for you.

SOC 2 Type II

Independent attestation of security, availability, and confidentiality controls.

GDPR-ready

Tenant data residency + processing notices aligned to the EU framework.

ICH Q2(R2) aligned

The audit ledger, immutable raw vault, and recipe-hash provenance together map onto the ALCOA+ data-integrity principles.

GxP-validation ready

Releases gated by a human signoff queue with reviewer attribution per artefact.

FDA AI framework (Jan 2025)

Risk-based credibility framework with explicit traceability, model documentation, and human oversight.

EMA reflection paper

AI-derived evidence in submissions is reproducible, version-controlled, and subordinate to expert review.

Want to dig deeper?

Read the methodology, browse the modules, or talk to a human about how MolTrace would fit your evidence chain.