Medical Device AI: FDA SaMD Compliance with Quality Scoring
Software-as-a-medical-device teams need more than a general safety banner around AI. They need a way to reject weak responses, prevent the route from drifting into unsupported treatment language, and stop release-candidate outputs from reaching users without review. Keeptrusts gives you those pieces through quality-scorer, healthcare-compliance, human-oversight, rbac, pii-detector, and audit-logger. The result is not automatic FDA compliance, but it is a real governance boundary that makes SaMD-adjacent AI workflows more measurable and easier to defend.
Use this page when
- You are using AI in complaint handling, labeling support, clinical evaluation drafting, or regulated device documentation workflows.
- You need a quality gate before AI output is accepted into a high-consequence process.
- You want a route design that stays aligned with current Keeptrusts policy behavior.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, Quality and regulatory reviewers
The problem
Medical device teams often adopt AI first in workflows that feel indirect: CAPA summaries, complaint intake triage, technical file drafting, post-market surveillance notes, or internal review assistance. But those are exactly the workflows where bad output becomes institutional quickly. A weak AI summary can influence which complaint gets escalated. An overconfident draft can sneak into design history or labeling discussions. A polished answer can look more trustworthy than it is.
That is why SaMD-adjacent AI governance needs a real output gate. Data protection still matters, especially if complaints or support narratives contain patient-linked text, device identifiers, or clinician details. But the more distinctive risk is output quality. If the route cannot reject responses that are too short, off-topic, or too weakly grounded for the workflow, then the organization is still depending on user caution instead of technical enforcement.
There is also an implementation detail that device teams should keep straight. In healthcare-compliance, fda_class primarily controls which built-in disclaimer is used when you do not provide your own disclaimer text. It does not add special automatic blocking rules by itself. The real output gate comes from the block list you configure and from quality-scorer, which actually evaluates the model response after generation.
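The precedence described above can be pictured with a minimal sketch. It is an illustration only: the function name, the placeholder strings, and the set of accepted class values are assumptions, and the real built-in disclaimer wording is not reproduced here.

```python
from typing import Optional

# Placeholder strings only (hypothetical): the gateway's real built-in
# disclaimer text and its accepted fda_class values are not shown here.
BUILTIN_DISCLAIMERS = {
    "I": "<built-in disclaimer for class I>",
    "II": "<built-in disclaimer for class II>",
    "III": "<built-in disclaimer for class III>",
}


def select_disclaimer(fda_class: str, custom_text: Optional[str] = None) -> str:
    """Return the custom disclaimer if one is set, else the class default."""
    if custom_text:
        return custom_text
    return BUILTIN_DISCLAIMERS[fda_class]
```

Nothing in this selection step blocks a response, which is why the class setting cannot stand in for a block list or a quality gate.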
human-oversight is similarly important to describe accurately. In the current runtime, it is a focused escalation switch. When configured with action: escalate, the gateway returns null assistant content and records an escalation event. It does not perform reviewer assignment or complex queue logic in the policy block. For device workflows, that simplicity is often good news because it creates a clear review stop that downstream systems can consume consistently.
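A downstream consumer has to treat null assistant content as "held for review" rather than as an error or an empty answer. The sketch below shows one way to do that. It assumes an OpenAI-style chat-completion response shape (`choices[0].message.content`); that shape, the function names, and the status labels are assumptions for illustration, not documented gateway behavior.

```python
from typing import Optional


def extract_assistant_text(response: dict) -> Optional[str]:
    """Return assistant text, or None when no content was delivered."""
    # Assumed shape: {"choices": [{"message": {"content": ...}}]}
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")


def handle_response(response: dict) -> dict:
    """Map a response to a status the calling application can act on."""
    text = extract_assistant_text(response)
    if text is None:
        # Null content is treated as an escalation: show nothing to the
        # user and wait for the reviewer-facing system to pick it up.
        return {"status": "pending_review", "text": None}
    return {"status": "delivered", "text": text}
```

Keeping this check in one place means every client of the route handles an escalation the same way.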
The solution
The most defensible SaMD pattern is to stack three output controls.
Start with healthcare-compliance to block literal phrases that should never appear in the workflow and to inject a clear medical disclaimer when the output reads like treatment or diagnosis advice. This keeps an internal device assistant from sliding into patient-care language it should not produce.
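Literal phrase blocking is easy to reason about offline. The sketch below assumes case-insensitive substring matching; whether the gateway matches this way, or respects word boundaries, is an assumption to verify against the policy reference. The function name is hypothetical, and the phrases are the ones used in the configuration on this page.

```python
from typing import List

# Phrases taken from the example configuration on this page.
BLOCKED_PHRASES = ["prescribe", "diagnose you with", "stop taking"]


def find_blocked_phrases(text: str, phrases: List[str]) -> List[str]:
    """Return every blocked phrase that occurs in the text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in phrases if phrase.lower() in lowered]
```

Under substring matching, a short entry such as `prescribe` also catches "prescribed" and "prescriber". That is usually the intent for a block list, but it is worth testing against real complaint narratives before rollout.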
Then use quality-scorer as the actual acceptance gate. For SaMD-adjacent routes, that usually means requiring a minimum level of substance before the response is even eligible for review. Keeptrusts supports minimum-length checks, benchmark toggles, thresholds, assertions, and failure actions. In practice, that lets you reject thin outputs before someone mistakes them for usable analysis.
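To make the acceptance gate concrete, here is a minimal sketch of the checks it implies. The default values mirror the example configuration on this page. The function and class names, the naive sentence splitter, and the idea that scores arrive as a precomputed dictionary are all assumptions for illustration, not the scorer's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GateConfig:
    # Defaults copied from the example configuration on this page.
    min_output_chars: int = 120
    min_sentences: int = 3
    min_aggregate: float = 0.8
    min_faithfulness: float = 0.85
    min_relevancy: float = 0.8


def count_sentences(text: str) -> int:
    """Naive count: split on runs of '.', '!' or '?' and drop empty parts."""
    return len([part for part in re.split(r"[.!?]+", text) if part.strip()])


def gate_failures(text: str, scores: Dict[str, float],
                  cfg: GateConfig = GateConfig()) -> List[str]:
    """Return the names of the checks that failed; an empty list means pass."""
    failures = []
    if len(text) < cfg.min_output_chars:
        failures.append("min_output_chars")
    if count_sentences(text) < cfg.min_sentences:
        failures.append("min_sentences")
    floors = (("aggregate", cfg.min_aggregate),
              ("faithfulness", cfg.min_faithfulness),
              ("relevancy", cfg.min_relevancy))
    for name, floor in floors:
        # A missing score is treated as 0.0, so it fails closed.
        if scores.get(name, 0.0) < floor:
            failures.append(f"min_{name}")
    return failures
```

Returning the list of failed checks, rather than a single boolean, keeps the reason for a rejection visible to whoever reviews the event later.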
Finally, place human-oversight after the quality gate when the route should not deliver content directly. The pattern works well for complaint narratives, risk summaries, or regulatory support workflows where a human reviewer must always own the decision to accept or use the output.
The core healthcare references are still the right foundation for this pattern: Healthcare (HIPAA), Healthcare (EU GDPR), HIPAA PHI Detector, Healthcare Compliance, and Secure Healthcare AI. For medical device teams, the important extension is to treat output quality and reviewability as first-class controls, not just downstream QA tasks.
Implementation
This route redacts sensitive identifiers in complaint text, applies healthcare-specific output controls, scores the response, and escalates the result instead of returning it directly.
The chain also starts with rbac and ends with audit-logger, so role checks run before any content is processed and an audit-logger step closes the chain.
pack:
  name: samd-quality-gate
  version: 1.0.0
  enabled: true
policies:
  chain:
    - rbac
    - pii-detector
    - healthcare-compliance
    - quality-scorer
    - human-oversight
    - audit-logger
  policy:
    rbac:
      deny_if_missing:
        - X-Org-ID
        - X-User-ID
        - X-User-Role
      roles:
        quality-engineer:
          allowed_tools:
            - summarize
            - report_*
            - search
        regulatory-reviewer:
          allowed_tools:
            - "*"
      data_access:
        quality-engineer:
          max_sensitivity: confidential
        regulatory-reviewer:
          max_sensitivity: restricted
    pii-detector:
      action: redact
      healthcare_mode: true
      detect_patterns:
        - 'UDI-[A-Z0-9-]{6,32}'
      redaction:
        marker_format: label
        include_metadata: true
        custom_markers:
          generic_id: "[DEVICE-ID-REDACTED]"
    healthcare-compliance:
      blocked_patterns:
        - prescribe
        - diagnose you with
        - stop taking
      fda_class: III
    quality-scorer:
      min_output_chars: 120
      min_sentences: 3
      benchmarks:
        ragas_faithfulness: true
        ragas_relevancy: true
      thresholds:
        min_aggregate: 0.8
        min_faithfulness: 0.85
        min_relevancy: 0.8
      failure_action:
        action: fallback
        fallback_message: Review required because the device-related response did not meet the configured quality threshold.
    human-oversight:
      action: escalate
    audit-logger: {}
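The custom entry under detect_patterns is an ordinary regular expression, so it can be checked offline before it goes into a route. The sketch below applies the same pattern with Python's `re` module and substitutes the marker from the configuration. The function name is hypothetical, and pairing this pattern with the `generic_id` marker is an assumption about how the detector labels custom matches.

```python
import re

# Pattern and marker copied from the example configuration.
UDI_PATTERN = re.compile(r"UDI-[A-Z0-9-]{6,32}")
MARKER = "[DEVICE-ID-REDACTED]"


def redact_device_ids(text: str) -> str:
    """Replace every match of the custom device-ID pattern with the marker."""
    return UDI_PATTERN.sub(MARKER, text)
```

Two properties are worth noticing: the pattern only matches uppercase identifiers that carry a literal `UDI-` prefix, and it needs at least six characters after the prefix. Identifiers written in other formats in complaint text would pass through unredacted unless additional patterns are added.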
This route is intentionally conservative. quality-scorer filters out weak output before it ever becomes candidate material for a reviewer. human-oversight then ensures the route still does not deliver assistant text directly. That combination is useful when the workflow is high consequence enough that "show the user the draft and let them decide" is not an acceptable control.
It is also worth noting what is not happening here. The gateway is not certifying the device workflow as FDA-compliant on its own, and fda_class: III is not a magic safety switch. The real value comes from explicit blocking phrases, measurable quality thresholds, and a deterministic review stop that the team can monitor and export.
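One way to reason about the route is to list its possible outcomes and confirm that none of them delivers assistant text directly. The sketch below is a simplified model under stated assumptions: the outcome names and the function are hypothetical, and it does not claim to describe how the gateway sequences a fallback with a later escalation step.

```python
from enum import Enum


class Outcome(Enum):
    BLOCKED = "blocked"      # a blocked phrase was found in the output
    FALLBACK = "fallback"    # the quality gate failed; fallback message used
    ESCALATED = "escalated"  # output passed the gate and is held for review


def route_outcome(has_blocked_phrase: bool, passes_quality_gate: bool) -> Outcome:
    """Simplified model of the three output controls applied in order."""
    if has_blocked_phrase:
        return Outcome.BLOCKED
    if not passes_quality_gate:
        return Outcome.FALLBACK
    return Outcome.ESCALATED
```

The model has no "delivered" outcome, which is the property the conservative design is meant to guarantee.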
The fastest validation loop is:
kt policy lint --file ./samd-quality-gate.yaml
kt gateway run --policy-config ./samd-quality-gate.yaml --port 41002
kt events tail --policy quality-scorer
kt events tail --policy human-oversight
If the route is working as intended, you should see:
- Low-quality outputs fail the configured gate.
- Passing outputs still escalate rather than reaching the user directly.
- The event stream shows a reviewable sequence of policy outcomes.
Results and impact
The biggest benefit is that device teams stop treating quality review as an informal human habit. The route itself now enforces that weak outputs do not pass and that even acceptable outputs stop for review. That is a much stronger pattern for regulated workflows than relying on reviewers to notice subtle model weaknesses after the text is already visible.
This also helps align engineering and regulatory teams. Engineers can point to route configuration and event behavior. Regulatory and quality reviewers can inspect the same evidence instead of relying on assumptions about what the product interface probably did. That shared reference matters in high-consequence workflows where the argument is often about process discipline as much as technical capability.
There is also a lifecycle advantage. As more SaMD-adjacent workflows adopt AI, the organization can reuse the same quality-gated, escalation-first route rather than inventing a new review pattern for complaint intake, CAPA assistance, and labeling support separately.
Key takeaways
- For SaMD-adjacent AI, output quality is a first-class control, not a nice-to-have.
- healthcare-compliance handles literal medical-advice phrase blocking and disclaimers, while quality-scorer handles actual response evaluation.
- fda_class selects default disclaimer behavior; it does not replace a real block list or quality gate.
- human-oversight is a simple but effective review stop when used with action: escalate.
- Event visibility matters because high-consequence routes need evidence, not just intent.