
Medical Records Summarization: Safe AI Processing with PHI Redaction

Medical-record summarization is one of the most practical healthcare AI use cases and one of the easiest places to create a compliance incident. Chart notes, referral packets, discharge summaries, and longitudinal records contain exactly the data that general-purpose LLM integrations should not handle casually. Keeptrusts makes the workflow manageable by redacting PHI-like content before the upstream call, constraining provider selection, and keeping a reviewable event trail, so summarization becomes a governed process rather than a copy-and-paste habit.

Use this page when

  • You are deploying AI to summarize clinical notes, discharge packets, or referral histories.
  • You want patient data minimized before any external model call.
  • You need a route that can be standardized across care teams and document workflows.

Primary audience

  • Primary: Technical Engineers
  • Secondary: Technical Leaders, Clinical operations teams

The problem

Summarization looks harmless because the task is clerical. Teams are not asking the model to diagnose the patient. They are asking it to compress information. That makes adoption fast and governance sloppy.

The issue is that medical-record summarization puts the richest patient context directly into the prompt. Records contain names, medical record numbers, dates of service, contact details, insurance references, provider names, facility names, and narrative details about diagnoses and treatment. If a workflow sends that content directly to a provider without a strict control boundary, the organization has already accepted a risk that most compliance teams would never approve if it were made explicit.
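To make the risk concrete, the following is a minimal Python sketch of request-side minimization using a few regular expressions for identifier types named above. The patterns and the `redact` helper are hypothetical illustrations, not how hipaa-phi-detector works. They also show why simple patterns are not enough: the patient name passes through untouched.

```python
import re

# Hypothetical patterns for a few identifier types mentioned above.
# Illustrative only: a real PHI detector covers many more categories.
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = (
    "Jane Doe seen 03/14/2024, MRN: 00482913. "
    "Call 555-201-7788 or email j.doe@example.org with results."
)
print(redact(note))
# Jane Doe seen [DATE], [MRN]. Call [PHONE] or email [EMAIL] with results.
```

The surviving name is the point: narrative identifiers need a dedicated detector, which is why the route relies on policy-level PHI detection rather than ad hoc patterns in each application.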

There is also a hidden output-side risk. Summaries can drift into interpretation. A model asked to summarize a discharge note may infer what treatment should happen next, restate dosing changes as recommendations, or phrase a problem list like direct clinical guidance. That drift is a problem even when the input was de-identified, because the risk lies in what the summary tells a reader to do.

The healthcare docs in this repo already describe the relevant control surface. Healthcare (HIPAA) defines the PHI and audit expectations. Healthcare (EU GDPR) adds data minimization and jurisdictional concerns. HIPAA PHI Detector documents the text-focused PHI control, including buffered response redaction when the policy is in the chain. Healthcare Compliance governs medical-looking output. Secure Healthcare AI combines them into an operational pattern.

What record-summarization teams need is a narrower version of that pattern tuned for document-heavy routes.

The solution

The safest summarization workflow follows a simple rule: raw records should enter a governed route, not a direct model SDK call.

Use hipaa-phi-detector on the request path so PHI-like text is detected before the upstream call. For most summarization workflows, action: redact is more practical than action: block, because the goal is usually de-identified summarization rather than total rejection. Add pii-detector in healthcare mode to broaden the minimization pass. Then place data-routing-policy ahead of the provider boundary so only approved zero-retention targets can receive the de-identified request.
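The provider constraint can be pictured as a filter over candidate targets. This sketch assumes each target carries the same fields as the `data_policy` block in the route configuration (zero data retention, training opt-out, retention days). The `Target` class, the `compliant_targets` function, and the `general-endpoint` target are hypothetical, and the exception stands in for `on_no_compliant_provider: block`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical view of a provider target's data_policy fields."""
    id: str
    zero_data_retention: bool
    training_opt_out: bool
    retention_days: int


class NoCompliantProvider(Exception):
    """Stands in for on_no_compliant_provider: block."""


def compliant_targets(targets, require_zdr=True, require_no_training=True,
                      max_retention_days=0):
    """Keep only targets that satisfy every data-policy requirement."""
    allowed = [
        t for t in targets
        if (t.zero_data_retention or not require_zdr)
        and (t.training_opt_out or not require_no_training)
        and t.retention_days <= max_retention_days
    ]
    if not allowed:
        # Fail closed instead of falling back to a non-compliant endpoint.
        raise NoCompliantProvider("no target satisfies the data policy")
    return allowed


targets = [
    Target("openai-zdr", zero_data_retention=True, training_opt_out=True,
           retention_days=0),
    Target("general-endpoint", zero_data_retention=False, training_opt_out=False,
           retention_days=30),
]
print([t.id for t in compliant_targets(targets)])  # ['openai-zdr']
```

Raising rather than returning an empty list keeps a misconfigured route from quietly sending de-identified records to an endpoint that retains or trains on them.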

On the output side, use healthcare-compliance so the summary does not quietly turn into treatment guidance without a disclaimer or block. Keep audit-logger in the chain because document workflows are often the first thing a compliance reviewer asks to inspect. Summarization is operationally common and therefore a likely audit target.
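An output-side check in the spirit of healthcare-compliance might look like the sketch below, reusing the `blocked_patterns` and `required_disclaimers` values from the route configuration in the Implementation section. Whole-word, case-insensitive matching is an assumption made here so that factual history such as "was prescribed" is not blocked; the policy's actual matching rules may differ. `check_summary` and `SummaryBlocked` are hypothetical names.

```python
import re

# Values copied from the healthcare-compliance settings in the route config.
BLOCKED_PATTERNS = ["prescribe", "stop taking"]
DISCLAIMER = ("This summary is informational only and must be reviewed "
              "by a licensed clinician.")


class SummaryBlocked(Exception):
    """Raised when a summary contains directive phrasing."""


def check_summary(summary: str) -> str:
    """Block directive phrasing, then ensure the disclaimer is present."""
    for phrase in BLOCKED_PATTERNS:
        # Assumption: whole-word, case-insensitive matching.
        if re.search(rf"\b{re.escape(phrase)}\b", summary, re.IGNORECASE):
            raise SummaryBlocked(f"summary contains {phrase!r}")
    if DISCLAIMER not in summary:
        summary = f"{summary}\n\n{DISCLAIMER}"
    return summary


print(check_summary("Patient was prescribed metformin; discharged on day 3."))
```

A summary such as "Stop taking lisinopril until follow-up." raises `SummaryBlocked` instead of reaching the reader.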

This design gives you a meaningful compromise between utility and control. Teams still get AI-assisted summaries. But the gateway, not the application, decides what patient-linked text can proceed and where it can go.

Implementation

This example is a practical route for summarizing chart excerpts and referral packets while minimizing PHI before the upstream call.

pack:
  name: medical-record-summarization
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-zdr
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0

policies:
  chain:
    - data-routing-policy
    - hipaa-phi-detector
    - pii-detector
    - healthcare-compliance
    - audit-logger

  policy:
    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      on_no_compliant_provider: block

    hipaa-phi-detector:
      action: redact
      mode: hipaa_18
      safe_harbor_method: true

    pii-detector:
      action: redact
      healthcare_mode: true

    healthcare-compliance:
      blocked_patterns:
        - prescribe
        - stop taking
      required_disclaimers:
        - This summary is informational only and must be reviewed by a licensed clinician.
      fda_class: II

    audit-logger:
      immutable: true
      retention_days: 2555
      hipaa_audit_controls: true

One reason this route is effective is that it matches how real summarization pipelines work. Teams usually ingest text, segment it, send it to a model, and assemble the results. The governed route fits naturally into that pipeline because it controls the upstream call boundary rather than forcing teams to redesign the whole summarization job.
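The ingest, segment, send, and assemble steps can be sketched as follows. `summarize_chunk` stands in for whatever call the application makes to the governed route; it is a hypothetical parameter, and `fake_summarize` in the usage lines simply keeps each chunk's first sentence. The point is that only the per-chunk call changes when a pipeline moves behind the gateway.

```python
from typing import Callable


def segment(text: str, max_chars: int = 1200) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk rather
    than being split mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_record(text: str, summarize_chunk: Callable[[str], str],
                     max_chars: int = 1200) -> str:
    """Summarize each chunk independently and assemble the results in order."""
    return "\n".join(summarize_chunk(chunk) for chunk in segment(text, max_chars))


# Hypothetical stand-in for the governed call: keep each chunk's first sentence.
def fake_summarize(chunk: str) -> str:
    return chunk.split(".")[0] + "."


record = ("Admitted with chest pain. Troponin negative.\n\n"
          "Discharged day 2. Follow up with cardiology.")
print(summarize_record(record, fake_summarize, max_chars=40))
```

Keeping segmentation and assembly in the application, and only the model call behind the gateway, is what lets existing jobs adopt the route without being rewritten.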

It also aligns with documented runtime behavior: the hipaa-phi-detector policy is text-focused and participates in response redaction when present in the chain. That matters for summarization because a model can echo or restate identifiers even after the request path was minimized, so keeping the detector active reduces output-side risk as well.
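Streaming is where response redaction gets subtle: an identifier can be split across two chunks, so a filter applied to each chunk alone never sees it whole. The sketch below compares per-chunk and buffered redaction with a single hypothetical MRN pattern. It illustrates the general reason for buffering, not hipaa-phi-detector's internals.

```python
import re

# Hypothetical MRN pattern: "MRN", optional ":" or "#", then 6-10 digits.
MRN = re.compile(r"\bMRN[:#]?\s*\d{6,10}\b")


def redact_per_chunk(chunks: list[str]) -> str:
    """Redact each streamed chunk on its own (misses split identifiers)."""
    return "".join(MRN.sub("[MRN]", chunk) for chunk in chunks)


def redact_buffered(chunks: list[str]) -> str:
    """Buffer the full response, then redact once."""
    return MRN.sub("[MRN]", "".join(chunks))


stream = ["Summary for MRN: 0048", "2913: stable at discharge."]
print(redact_per_chunk(stream))  # Summary for MRN: 00482913: stable at discharge.
print(redact_buffered(stream))   # Summary for [MRN]: stable at discharge.
```

The trade-off is latency: a buffered redactor cannot release text until it has seen enough of the response to evaluate it.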

The operational validation is short and repeatable:

kt policy lint --file ./medical-record-summarization.yaml
kt gateway run --policy-config ./medical-record-summarization.yaml --port 41002
kt events tail --policy hipaa-phi-detector

Use synthetic sample records during rollout and inspect the event outcomes before expanding to broader clinical teams.
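One way to make that rollout check repeatable is to plant known identifiers in synthetic records and count how many survive redaction. In this sketch the name pools, `synthetic_record`, `find_leaks`, and the deliberately weak `mrn_only` redactor are all hypothetical; in practice the redactor argument would wrap a call through the governed route.

```python
import random
import re
from typing import Callable

# Obviously fake name pools for synthetic test records (hypothetical).
FIRST = ["Ana", "Ben", "Chloe", "Dev"]
LAST = ["Testperson", "Sample", "Placeholder"]


def synthetic_record(rng: random.Random) -> tuple[str, list[str]]:
    """Build one synthetic note and return it with the identifiers planted in it."""
    name = f"{rng.choice(FIRST)} {rng.choice(LAST)}"
    mrn = str(rng.randrange(10_000_000, 100_000_000))  # always 8 digits
    phone = f"555-{rng.randrange(100, 1000)}-{rng.randrange(1000, 10000)}"
    text = f"{name} (MRN {mrn}) was discharged; contact {phone}."
    return text, [name, mrn, phone]


def find_leaks(redactor: Callable[[str], str], n: int = 20,
               seed: int = 7) -> list[str]:
    """Run n synthetic records through redactor; return planted values that survive."""
    rng = random.Random(seed)
    leaks: list[str] = []
    for _ in range(n):
        text, planted = synthetic_record(rng)
        redacted = redactor(text)
        leaks.extend(value for value in planted if value in redacted)
    return leaks


def mrn_only(text: str) -> str:
    """Deliberately incomplete redactor that only masks MRNs."""
    return re.sub(r"\bMRN \d{8}\b", "[MRN]", text)


leaks = find_leaks(mrn_only)
print(f"{len(leaks)} planted identifiers survived redaction")
```

With the MRN-only redactor, every planted name and phone number survives, which is exactly the kind of gap the harness should surface before the route reaches broader clinical teams.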

Results and impact

The immediate gain is safer adoption. Clinical operations teams often want summarization first because it saves obvious time. With a governed route, they can use that productivity benefit without normalizing raw-record uploads to unmanaged model endpoints.

There is also a strong governance benefit. Once record summarization uses a standard route, future document workflows can inherit the same boundary. Referral triage, discharge-note compression, handoff preparation, and case-history summarization do not each need their own PHI redaction implementation. They need the same governed pipeline.

This also makes audit review easier. Instead of asking whether each workflow used approved prompt handling, reviewers can sample the same event system and the same route configuration. That shortens the path from compliance question to technical evidence.
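The route configuration asks audit-logger for `immutable: true`, but this page does not describe how immutability is enforced. A common technique for tamper-evident logs is hash chaining, where each entry's hash covers the previous entry's hash, so editing any earlier event breaks verification. The sketch below uses `hashlib`; `append_event`, `verify`, and the event fields are hypothetical, not the Keeptrusts event schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event body."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event whose hash chains to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_event(log, {"policy": "hipaa-phi-detector", "action": "redact", "count": 3})
append_event(log, {"policy": "healthcare-compliance", "action": "disclaimer_added"})
print(verify(log))  # True
log[0]["event"]["count"] = 0  # tamper with an earlier event
print(verify(log))  # False
```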

Most importantly, it changes team behavior. People stop thinking of summarization as “just a quick AI helper” and start treating it like a governed healthcare workflow. That shift matters because routine tasks are where AI governance usually fails first.

Key takeaways

  • Medical-record summarization should use a governed route, not direct model calls from document workflows.
  • hipaa-phi-detector and pii-detector reduce input risk, while healthcare-compliance governs summaries that drift toward advice.
  • data-routing-policy ensures de-identified traffic still uses only approved provider targets.
  • Keep audit-logger in the route so summarization remains reviewable.
  • Use the established healthcare docs as the implementation baseline: Healthcare (HIPAA), Healthcare (EU GDPR), HIPAA PHI Detector, Healthcare Compliance, and Secure Healthcare AI.

Next steps