Methodology · v1.0

How the audit works.

Taktly’s audit is a hybrid of deterministic pattern detection and a calibrated language model for section scoring and voice. The deterministic layer is where hard refusals live. The language model is where section nuance lives. The deterministic layer wins every conflict.

Deterministic detectors

Pattern-matched. No LLM judgment. Cannot be softened.

This layer catches the patterns senior reviewers catch on their first read. The detectors fire on regular expressions and linguistic patterns encoded from senior-practitioner experience. When a hard-refusal pattern is present, the audit reports it; the LLM cannot down-rank it or paraphrase it away.

operator-error-as-terminal-cause
training-only CAPA in regulated environments
“continue to monitor” effectiveness checks
definitive-cause-without-evidence
correlation-as-causation
5-Whys looping
unbounded problem scope
no-measurable-impact problem statements
missing impact assessment in Tier 3+ investigations
missing prior-occurrence search in recurring events
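
As a rough illustration of what "pattern-matched, no LLM judgment" can look like, here is a minimal sketch of two of the detectors above. The regular expressions, the `Flag` type, and the `run_detectors` function are hypothetical stand-ins; Taktly's actual rules are not published on this page and are presumably broader.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    detector: str  # detector name, taken from the list above
    excerpt: str   # the text span that triggered the flag


# Hypothetical patterns for illustration only; not Taktly's real rules.
DETECTORS = {
    "continue-to-monitor-effectiveness-check": re.compile(
        r"\bcontinue\s+to\s+monitor\b", re.IGNORECASE
    ),
    "operator-error-as-terminal-cause": re.compile(
        r"\broot\s+cause\b[^.]*\b(?:operator|human)\s+error\b", re.IGNORECASE
    ),
}


def run_detectors(text: str) -> list:
    """Return every deterministic flag found in the text.

    No scoring and no model call: a pattern either matches or it does not.
    """
    flags = []
    for name, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            flags.append(Flag(name, match.group(0)))
    return flags
```

Because the function is a pure pattern match, the same document always produces the same flags, which is what lets this layer win every conflict with the language model.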

Language-model section scoring

Calibrated industry-tuned persona, bounded by the deterministic layer.

Each section is scored 0–100 against explicit anchors. The persona is industry-specific (fill-finish, vaccines, medical device, food, lab, etc.) so the voice sounds like someone who has personally worked the user’s domain. The proposed overall score is bounded by a hard cap derived from the deterministic detector count and the FDA-pattern likelihood — section scoring cannot inflate the score past those bounds.

FDA Warning Letter pattern library

Hand-curated from real public FDA actions.

Each pattern in the library is paraphrased (not verbatim) from a real FDA Warning Letter or 483 and tied to a specific public action. When a user’s document matches a pattern, Taktly surfaces the match with the company, year, and citation. The library currently contains 6 patterns and grows weekly. Target: 50 patterns by Q3 2026.

Execution engine

What runs inside every Taktly project.

Once the audit hands off to a project, these are the layers a Master Black Belt would build for themselves. Each one is graded against a named regulation or a published statistical standard. None of them are sprinkled on top — they all read from one Project Context Graph.

Capability math (Measure)

AIAG SPC manual · Six Sigma 1.5σ shift convention

Cp, Cpk, Pp, Ppk, long-term σ-level, DPMO, defect rate, normality flags. Computed on every uploaded measurement dataset. The σ-level uses the standard 1.5σ shift; Cpk verdict bands follow AIAG (< 1.0 not capable, 1.0 to < 1.33 marginal, 1.33 to < 2.0 capable, ≥ 2.0 world-class).
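
The core capability arithmetic can be sketched in a few lines. This is a minimal example using the textbook definitions, not Taktly's implementation: the function names are hypothetical, the band boundaries assume each lower bound is inclusive, and the DPMO conversion assumes a normal distribution.

```python
from statistics import NormalDist


def capability_indices(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a process mean, a sigma estimate, and spec limits.

    Passing the within-subgroup sigma gives Cp/Cpk; passing the overall
    sigma gives Pp/Ppk. The formulas are otherwise identical.
    """
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk


def cpk_verdict(cpk):
    """Map Cpk to the verdict bands quoted above (lower bounds assumed inclusive)."""
    if cpk < 1.0:
        return "not capable"
    if cpk < 1.33:
        return "marginal"
    if cpk < 2.0:
        return "capable"
    return "world-class"


def sigma_level_from_dpmo(dpmo):
    """Convert DPMO to a sigma level using the 1.5-sigma shift convention.

    Requires 0 < dpmo < 1,000,000; assumes normally distributed output.
    """
    yield_fraction = 1 - dpmo / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + 1.5
```

For example, a process centered at 10.0 with sigma 1.0 and spec limits 7.0 to 13.0 has Cp = Cpk = 1.0, which lands in the marginal band.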

Statistical power & sample size (Improve + Control)

Cohen 1988 · z-test approximations

One-sample mean, two-sample mean, and two-proportion sample-size calculations at 80% / 90% / 95% power, one- or two-sided. Verification windows surface a verdict — 'observed n vs. required n at chosen power' — and flag undersized windows with a recommended extension.
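
A minimal sketch of the z-test approximations for two of these calculations follows. The function names are hypothetical, and the significance level (alpha = 0.05) is an assumption, since the text above specifies power levels but not alpha.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power, two_sided):
    """Sum of the critical z-values for the significance level and the power."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return z_alpha + z(power)


def n_one_sample_mean(delta, sigma, power=0.80, alpha=0.05, two_sided=True):
    """Required n to detect a mean shift of `delta` given standard deviation `sigma`."""
    return math.ceil((_z_sum(alpha, power, two_sided) * sigma / delta) ** 2)


def n_two_proportions(p1, p2, power=0.80, alpha=0.05, two_sided=True):
    """Required n per group to distinguish proportions p1 and p2 (unpooled variance)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_sum(alpha, power, two_sided) ** 2 * variance / (p1 - p2) ** 2)


def window_verdict(observed_n, required_n):
    """Compare observed n with required n; report the shortfall if undersized."""
    if observed_n >= required_n:
        return ("adequate", 0)
    return ("undersized", required_n - observed_n)
```

Under these assumptions, detecting a half-sigma shift at 80% power needs 32 samples, so a verification window with 20 observations would be flagged as undersized by 12.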

FMEA / RPN on every action (Improve)

AIAG-VDA FMEA · ISO 9001 / 21 CFR 820 / IATF 16949 / AS9100

Severity × Occurrence × Detection scored 1-10 per axis with full guidance text. RPN bands: ≤ 50 low, 51-119 medium, 120-199 high (mitigation textarea required), ≥ 200 critical (pilot mandatory). Containment / corrective / preventive separation enforced per the three-bucket QMS standard.
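
The RPN arithmetic and banding can be sketched as below. The function names are hypothetical, and the sketch assumes an RPN of exactly 50 counts as low.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: the product of three 1-10 scores."""
    for name, score in (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    ):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return severity * occurrence * detection


def rpn_band(value):
    """Map an RPN to the bands described above (50 itself is treated as low)."""
    if value >= 200:
        return "critical"  # pilot mandatory
    if value >= 120:
        return "high"      # mitigation text required
    if value > 50:
        return "medium"
    return "low"
```

An action scored severity 3, occurrence 4, detection 10 has an RPN of 120 and therefore falls in the high band.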

Devil's Advocate review (Analyze)

Popperian falsifiability · adversarial AI pattern

On-demand Red-Team pass on every confirmed root cause. Returns 2-4 specific objections (severity: fatal / substantive / minor) plus the single piece of evidence that would settle each. Pairs with a required falsifiability prompt — 'what evidence would change your mind?' — captured at the moment of marking rca_confirmed = yes.

Financial savings model (Define + closeout)

Defect cost-of-quality methodology

(Baseline rate − target rate) × annual volume × cost per event = annualized hard savings, reported with low / point / high sensitivity bands (conservative half-capture default). Optional soft savings are computed as hours × hourly rate. Results are recomputed and cached server-side so dashboards stay consistent across the product.
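
A minimal sketch of the savings arithmetic, assuming hard savings scale with the reduction in defect rate (baseline minus target). The capture fractions for the low and high bands (0.25 and 0.75) are illustrative placeholders; only the 0.5 point value is suggested by the "half-capture default" wording, and even that reading is an assumption.

```python
def hard_savings(baseline_rate, target_rate, annual_volume, cost_per_event,
                 capture=(0.25, 0.50, 0.75)):
    """Annualized hard savings with low / point / high sensitivity bands.

    `capture` holds the assumed fraction of the theoretical saving that is
    actually realized in each band (hypothetical values, see lead-in).
    """
    full = (baseline_rate - target_rate) * annual_volume * cost_per_event
    low, point, high = (full * fraction for fraction in capture)
    return {"low": low, "point": point, "high": high}


def soft_savings(hours, hourly_rate):
    """Optional soft savings: hours freed up multiplied by the hourly rate."""
    return hours * hourly_rate
```

For example, cutting a 4% defect rate to 1% on 100,000 units at 50 per event gives a theoretical saving of 150,000 per year, or 75,000 at half capture.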

Taktly Watch (post-closure)

Linear-regression drift detection · process-stability heuristics

Reads post-closure verification samples on a cadence and returns holding / drifting / regressed with retention % and trend slope. The Sustainment dashboard aggregates verdicts across every closed project so a CI Director sees the full portfolio at a glance.
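
One way to picture the holding / drifting / regressed verdict is an ordinary least-squares slope over the post-closure samples combined with a retention percentage. Everything in this sketch is an assumption: the retention definition, the thresholds, and the function names are hypothetical, and the metric is assumed to be a defect rate where lower is better.

```python
from statistics import mean


def trend_slope(samples):
    """Least-squares slope of the samples against their index (needs >= 2 samples)."""
    n = len(samples)
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(samples))
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    return sxy / sxx


def watch_verdict(baseline, achieved, samples,
                  slope_tol=0.001, hold_pct=90.0, regress_pct=50.0):
    """Return (status, retention %, slope) for post-closure samples.

    Retention is the share of the original improvement (baseline minus the
    rate achieved at closure) that the post-closure mean still preserves.
    Requires baseline != achieved. Thresholds are illustrative placeholders.
    """
    retention = 100 * (baseline - mean(samples)) / (baseline - achieved)
    slope = trend_slope(samples)
    if retention < regress_pct:
        status = "regressed"
    elif retention < hold_pct or slope > slope_tol:
        status = "drifting"
    else:
        status = "holding"
    return status, retention, slope
```

A project that closed at a 2% defect rate from a 10% baseline and still samples near 2% would be reported as holding; the same project with samples climbing steadily from 2% to 8% would be reported as drifting.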

Project Context Graph + coherence

Pure-function rule engine · 17 deterministic checks

One normalized memory layer every AI route and check reads from. Coherence checkers fire on every save: problem-statement funnel discipline (v3 ≠ v2, v4 ≠ v3), confirmed-cause-without-evidence, human-error in root cause, orphan actions, missing preventive, undersized effectiveness windows, synthesis staleness. Critical signals block tollgate advance the same way the rigor floor does.
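
To make "pure-function rule engine" concrete, here is a minimal sketch of three of the checks named above running over an immutable context object. The data model (`RootCause`, `Action`, `ProjectContext`) and the signal names are hypothetical; the real Project Context Graph is not described in enough detail here to reproduce.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RootCause:
    id: str
    confirmed: bool
    evidence_ids: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Action:
    id: str
    root_cause_id: Optional[str]
    kind: str  # "containment", "corrective", or "preventive"


@dataclass(frozen=True)
class ProjectContext:
    root_causes: Tuple[RootCause, ...]
    actions: Tuple[Action, ...]


def check_coherence(ctx):
    """Return a list of (signal, subject_id) pairs. Pure: no I/O, no mutation."""
    signals = []
    known_causes = {rc.id for rc in ctx.root_causes}
    for rc in ctx.root_causes:
        if rc.confirmed and not rc.evidence_ids:
            signals.append(("confirmed-cause-without-evidence", rc.id))
    for action in ctx.actions:
        if action.root_cause_id not in known_causes:
            signals.append(("orphan-action", action.id))
    if not any(action.kind == "preventive" for action in ctx.actions):
        signals.append(("missing-preventive", None))
    return signals
```

Because the checker only reads its input and returns a new list, it can run on every save and always give the same answer for the same project state.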

Calibration anchors

How sections are scored.

Section scoring uses these published anchors. Most real CI work scores 40–70 honestly. A 90+ is rare; a 0–19 indicates a hard refusal or a missing section.

Band	What it means
90–100	Specific numbers, cited evidence, zero hedging. Genuinely excellent. Rare.
75–89	Well-defined and specific. Minor hedges or one missing supporting element.
60–74	Structurally complete but key quantification is missing or hedged.
40–59	Right shape but content is generic or unmeasurable. A senior reviewer will push back.
20–39	Vague enough that a reviewer cannot tell what the team is committing to.
0–19	Missing, fabricated, or contains a hard refusal.

Overall score caps

Why a document with 7 hard refusals cannot score 58.

Overall score is bounded by four ceilings. The lowest wins:

The LLM’s proposed overall score
The mean of section scores
The hard-refusal cap (5+ refusals → max 35; 7+ → max 25)
The FDA 483-likelihood cap (≥ 80% → max 30; ≥ 60% → max 45)
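
The "lowest wins" rule can be sketched as a single `min` over the four ceilings. The function names are hypothetical, and the sketch assumes that a document below the lowest listed threshold gets no cap from that rule (a ceiling of 100).

```python
from statistics import mean


def hard_refusal_cap(refusal_count):
    """Ceiling from the number of hard refusals (thresholds from the list above)."""
    if refusal_count >= 7:
        return 25
    if refusal_count >= 5:
        return 35
    return 100  # assumption: no cap below 5 refusals


def fda_likelihood_cap(likelihood):
    """Ceiling from the FDA 483-likelihood, given as a fraction between 0 and 1."""
    if likelihood >= 0.80:
        return 30
    if likelihood >= 0.60:
        return 45
    return 100  # assumption: no cap below 60%


def overall_score(llm_score, section_scores, refusal_count, fda_likelihood):
    """Overall score is the lowest of the four ceilings."""
    return min(
        llm_score,
        mean(section_scores),
        hard_refusal_cap(refusal_count),
        fda_likelihood_cap(fda_likelihood),
    )
```

With an LLM-proposed score of 58, section scores averaging 59, and 7 hard refusals, the result is 25 regardless of what the model proposed.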

Change tracking & validation

Public changelog. Per-flag feedback. Published accuracy.

Changelog. Every detector rule and FDA pattern update is logged with its date at gettaktly.com/changelog.
Per-flag feedback. Every flag in the report carries a thumbs-up / thumbs-down so users can tell us which catches were real and which were wrong.
Accuracy data. We publish aggregate per-detector accuracy when we have ≥ 500 validated flags per detector type. Target: first numbers published Q4 2026.
