HIPAA PHI Detector

The hipaa-phi-detector policy detects HIPAA-regulated Protected Health Information (PHI) in AI-generated responses and enforces de-identification controls aligned with the HIPAA Privacy Rule's Safe Harbor method (45 CFR § 164.514(b)). It scans responses for all 18 Safe Harbor identifier categories, from patient names and Social Security numbers to biometric identifiers and full-face photographs, and either redacts the detected PHI in place or blocks the entire response, depending on the configured action. The policy is a baseline control for any AI deployment that processes, generates, or transmits health information on behalf of a HIPAA-covered entity or business associate.

Use this page when

You are deploying AI in a HIPAA-covered entity or business associate and must enforce de-identification controls.
You need Safe Harbor method compliance (45 CFR § 164.514(b)) covering all 18 identifier categories.
You want to redact or block PHI in AI-generated responses before they reach end users.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Configuration

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

Fields

Field	Type	Description	Default
`mode`	`"hipaa_18"`	Detection mode covering all 18 HIPAA Safe Harbor identifier categories as defined in 45 CFR § 164.514(b)(2). This is the standard and currently only supported mode — it detects names, geographic data smaller than a state, dates (except year), phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers and serial numbers, device identifiers and serial numbers, web URLs, IP addresses, biometric identifiers, full-face photographs and comparable images, and any other unique identifying number, characteristic, or code.	`"hipaa_18"`
`action`	`"redact"` | `"block"`	Action taken when PHI is detected. `"redact"` replaces detected PHI with placeholder tokens (e.g., `[REDACTED-NAME]`, `[REDACTED-SSN]`) while preserving the rest of the response. `"block"` rejects the entire response and returns a policy-violation error, preventing any part of the PHI-containing response from reaching the end user. Use `"redact"` when the non-PHI content is valuable and safe to deliver; use `"block"` when any PHI leakage is unacceptable.	`"redact"`
`safe_harbor_method`	`boolean`	Enables the HIPAA Safe Harbor de-identification method as defined in 45 CFR § 164.514(b). When `true`, the detector applies the full 18-category identifier scan — the standard compliance approach accepted by HHS as sufficient for de-identification without requiring a qualified statistical expert. When `false`, the detector still scans for PHI but does not certify Safe Harbor compliance, which may be appropriate for internal-only workflows where formal de-identification certification is not required.	`true`
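
The field rules in the table can be sketched as a small validator. This is an illustration only, not the Keeptrusts implementation: `PhiDetectorConfig` and `parse_config` are hypothetical names, and the sketch assumes the settings have already been loaded from YAML into a plain dictionary.

```python
from dataclasses import dataclass

# Allowed values taken from the Fields table above.
ALLOWED_MODES = {"hipaa_18"}
ALLOWED_ACTIONS = {"redact", "block"}
KNOWN_FIELDS = {"mode", "action", "safe_harbor_method"}


@dataclass(frozen=True)
class PhiDetectorConfig:
    """Hypothetical in-memory form of the hipaa-phi-detector settings."""
    mode: str = "hipaa_18"
    action: str = "redact"
    safe_harbor_method: bool = True


def parse_config(raw: dict) -> PhiDetectorConfig:
    """Apply the documented defaults, then reject unsupported values."""
    unknown = set(raw) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    cfg = PhiDetectorConfig(**raw)
    if cfg.mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {cfg.mode!r}")
    if cfg.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {cfg.action!r}")
    if not isinstance(cfg.safe_harbor_method, bool):
        raise ValueError("safe_harbor_method must be a boolean")
    return cfg
```

Calling `parse_config({"action": "block"})` keeps the documented defaults for the other two fields, while a misspelled key or an unsupported value raises `ValueError` instead of being silently ignored.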

The 18 HIPAA Safe Harbor Identifiers

The hipaa_18 mode detects and acts on all 18 identifier categories specified in the HIPAA Safe Harbor de-identification standard:

#	Identifier Category	Examples
1	Names	Full names, first/last names, maiden names
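2	Geographic data (smaller than state)	Street addresses, city names, counties, ZIP codes (the first three digits may be kept only when the area sharing those digits has more than 20,000 residents; otherwise they are replaced with 000)
3	Dates (except year) related to an individual	Birth dates, admission dates, discharge dates, date of death, ages over 89 (which may be reported only as a single "90 or older" category)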
4	Phone numbers	All telephone numbers including mobile, home, work
5	Fax numbers	All facsimile numbers
6	Email addresses	Personal and work email addresses
7	Social Security numbers	Full or partial SSNs
8	Medical record numbers	MRN identifiers from EHR/EMR systems
9	Health plan beneficiary numbers	Insurance member IDs, plan numbers
10	Account numbers	Financial account numbers associated with healthcare billing
11	Certificate/license numbers	Professional license numbers, DEA numbers, driver's license numbers
12	Vehicle identifiers	VINs, license plate numbers
13	Device identifiers	UDI (Unique Device Identifiers), serial numbers for medical devices
14	Web URLs	Personal websites, patient portal URLs
15	IP addresses	IPv4 and IPv6 addresses
16	Biometric identifiers	Fingerprints, voiceprints, retinal scans
17	Full-face photographs	Photographs and comparable images where the individual is identifiable
18	Any other unique identifying number, characteristic, or code	Custom internal identifiers, study enrollment numbers, or any other characteristic or code not covered above that could identify an individual
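
Two rows in the table involve generalizing a value rather than removing it outright: ZIP codes (category 2) and ages over 89 (category 3). A minimal sketch of both rules follows. The function names are hypothetical, and the population lookup is an assumption: the caller is expected to supply real census figures per three-digit prefix, and the numbers in the usage note are made up for illustration.

```python
from typing import Mapping


def generalize_zip(zip_code: str, prefix_population: Mapping[str, int]) -> str:
    """Keep the first three ZIP digits only when the area sharing them has
    more than 20,000 residents; otherwise return "000"."""
    prefix = zip_code.strip()[:3]
    is_valid = len(prefix) == 3 and prefix.isdigit()
    # Unknown prefixes are treated as small areas, the conservative choice.
    if is_valid and prefix_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"


def generalize_age(age: int) -> str:
    """Report ages over 89 as a single "90+" bucket."""
    return "90+" if age > 89 else str(age)
```

With an illustrative lookup such as `{"100": 1_500_000, "999": 12_000}`, `generalize_zip("10027-1234", ...)` keeps `"100"` while `generalize_zip("99950", ...)` collapses to `"000"`. Defaulting unknown prefixes to `"000"` is a design choice that favors over-generalizing when population data is missing.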

Use Cases

Hospital EHR AI Assistant

A hospital deploys an AI assistant for physicians to query patient records and get clinical summaries. The assistant must never expose raw PHI in its responses, even when querying PHI-containing source data.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

Telemedicine Chatbot

A telemedicine platform uses an AI chatbot for patient intake and triage. The chatbot interacts directly with patients who may volunteer PHI in their questions, and the AI responses must not reflect that PHI back or expose other patients' information.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: block
    safe_harbor_method: true

Clinical Research Data Sharing

A research institution uses AI to generate summaries of clinical trial data for publication and sharing with collaborators. All outputs must be fully de-identified under Safe Harbor to comply with the Common Rule and HIPAA research provisions.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

Insurance Claims Processing

A health insurance company uses AI to process and summarize claims documentation. Member information must be redacted from AI-generated summaries shared with third-party processors who are not covered under the same Business Associate Agreement (BAA).

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

How It Works

The hipaa-phi-detector policy operates as a response-phase filter in the Keeptrusts gateway pipeline:

Identifier scanning: After the upstream LLM generates a response, the policy scans the full response text for all 18 HIPAA Safe Harbor identifier categories. Each category uses specialized detection patterns — regex for structured identifiers like SSNs and phone numbers, named-entity recognition heuristics for names and geographic data, and format matching for dates, URLs, and IP addresses.
Context-aware detection: The detector considers surrounding context to reduce false positives. For example, a 9-digit number is flagged as a potential SSN only when it appears in a pattern consistent with XXX-XX-XXXX or in proximity to terms like "social security," "SSN," or "tax ID." Similarly, dates are flagged only when they appear to relate to an individual (near terms like "born," "admitted," "discharged") rather than general calendar references.
Action execution:
- Redact mode: Each detected PHI instance is replaced with a category-specific placeholder token (e.g., [REDACTED-NAME], [REDACTED-SSN], [REDACTED-DOB], [REDACTED-MRN]). The rest of the response is preserved and delivered to the end user. This allows the non-PHI clinical content to remain useful.
- Block mode: If any PHI is detected, the entire response is rejected. The gateway returns a policy-violation response indicating PHI was detected. No part of the original response reaches the end user.
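Safe Harbor certification: When safe_harbor_method is true, the policy applies the full 18-category identifier scan described in 45 CFR § 164.514(b)(2). A response that passes through the detector with safe_harbor_method: true and action: redact is intended to satisfy the identifier-removal condition of Safe Harbor, so Expert Determination (the alternative de-identification method under 45 CFR § 164.514(b)(1)) is not required. Safe Harbor also requires that the covered entity have no actual knowledge that the remaining information could identify an individual (45 CFR § 164.514(b)(2)(ii)), which automated scanning cannot establish on its own.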
Audit trail: Every detection event — including the category of PHI detected, the action taken, and whether the response was redacted or blocked — is recorded as a decision event sent to the Keeptrusts API. This audit trail supports HIPAA's accounting-of-disclosures requirement (45 CFR § 164.528) and breach notification investigations.
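
The scanning, context check, and action steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the gateway's actual detector: it covers only SSNs and email addresses, the 40-character context window is an arbitrary choice, the `[REDACTED-EMAIL]` token is assumed by analogy with the tokens named above, and `Finding`, `scan`, and `apply_action` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative patterns for two of the 18 categories.
SSN_FORMATTED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_BARE = re.compile(r"\b\d{9}\b")
SSN_CONTEXT = re.compile(r"social security|\bssn\b|tax id", re.IGNORECASE)
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CONTEXT_WINDOW = 40  # characters inspected on each side of a bare number


@dataclass(frozen=True)
class Finding:
    category: str
    start: int
    end: int


def scan(text: str) -> list[Finding]:
    """Return PHI findings sorted by position in the text."""
    findings = [Finding("SSN", m.start(), m.end()) for m in SSN_FORMATTED.finditer(text)]
    findings += [Finding("EMAIL", m.start(), m.end()) for m in EMAIL.finditer(text)]
    # A bare 9-digit number counts as an SSN only near a context term.
    for m in SSN_BARE.finditer(text):
        window = text[max(0, m.start() - CONTEXT_WINDOW): m.end() + CONTEXT_WINDOW]
        if SSN_CONTEXT.search(window):
            findings.append(Finding("SSN", m.start(), m.end()))
    return sorted(findings, key=lambda f: f.start)


def apply_action(text: str, action: str) -> tuple[str | None, list[Finding]]:
    """Return (deliverable text or None if blocked, findings for the audit event)."""
    findings = scan(text)
    if not findings:
        return text, findings
    if action == "block":
        return None, findings
    # Replace from the end so earlier offsets stay valid.
    for f in reversed(findings):
        text = text[:f.start] + f"[REDACTED-{f.category}]" + text[f.end:]
    return text, findings
```

For example, `apply_action("SSN 123-45-6789, email a@b.org", "redact")` yields `"SSN [REDACTED-SSN], email [REDACTED-EMAIL]"`, while `"Invoice 123456789 was paid"` passes through unchanged because no SSN-related term appears near the number. The returned findings carry the category and offsets that an audit event would need.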

Combining With Other Policies

The hipaa-phi-detector is most effective as part of a layered healthcare and privacy compliance stack:

healthcare-compliance: Controls the medical content of AI responses (blocking diagnoses, prescriptions, treatment recommendations) while hipaa-phi-detector controls PHI exposure. Together they cover both clinical safety and data privacy.
pii-detector: Catches broader personally identifiable information that may not be PHI under HIPAA but still requires protection, such as financial account numbers in non-healthcare contexts.
audit-logger: Ensures all PHI detection and redaction events are logged to a compliance-grade audit trail that meets HIPAA's six-year retention requirement for accounting of disclosures.
safety-filter: Adds a general content safety layer on top of PHI-specific controls.
financial-compliance: For healthcare organizations that also handle billing and insurance, adds financial compliance controls alongside PHI protection.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector
    - healthcare-compliance
    - audit-logger

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

  healthcare-compliance:
    blocked_patterns:
      - "you (have|are suffering from)"
      - "take \\d+ ?mg of"
    required_disclaimers:
      - "This information is not a substitute for professional medical advice."
    fda_class: "II"

  audit-logger:
    immutable: true
    retention_days: 2190
    hipaa_audit_controls: true
    log_all_access: true
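
The retention_days value of 2190 is 6 × 365, which does not count leap days. The short sketch below shows how far a fixed day count can fall from six calendar years. It is an arithmetic illustration only: `retention_shortfall_days` is a hypothetical helper, it assumes the start date is not February 29, and whether the difference matters depends on how your compliance team measures the six-year window.

```python
from datetime import date, timedelta


def retention_shortfall_days(start: date, retention_days: int = 2190, years: int = 6) -> int:
    """Days by which `retention_days` falls short of `years` calendar years.

    Assumes `start` is not February 29, so the same month and day exist
    in the target year.
    """
    target = start.replace(year=start.year + years)
    expires = start + timedelta(days=retention_days)
    return (target - expires).days


print(retention_shortfall_days(date(2024, 1, 1)))  # 2 (leap days in 2024 and 2028)
print(retention_shortfall_days(date(2025, 3, 1)))  # 1 (leap day in 2028)
```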

Best Practices

Default to redact over block for clinical workflows: In physician-facing tools, the non-PHI clinical content in a response is often valuable. Redacting the PHI and preserving the clinical reasoning gives practitioners useful information while maintaining compliance. Reserve block for patient-facing or external-sharing scenarios where any PHI leakage is unacceptable.
Always enable safe_harbor_method for external data sharing: If AI-generated outputs will be shared outside your organization, with researchers, or with non-BAA third parties, Safe Harbor compliance is the clearest legal protection. Disable it only for internal-only workflows where formal de-identification certification is unnecessary.
Pair with healthcare-compliance for complete coverage: PHI detection and medical content safety are separate concerns. The hipaa-phi-detector prevents data exposure; the healthcare-compliance policy prevents unsafe medical advice. A compliant healthcare AI deployment requires both.
Audit and review redaction logs: Regularly review the audit trail to verify that the detector is catching PHI correctly and not producing excessive false positives. False positives (redacting non-PHI content) reduce the utility of AI responses; false negatives (missing real PHI) create compliance risk. Both require calibration.
Consider block mode for high-risk scenarios: For AI systems that interact directly with patients, generate externally shared reports, or operate in breach-sensitive environments, block mode provides the strongest protection. A blocked response can be reviewed by a human before any content is released.
Remember the 18th category: The 18th Safe Harbor identifier — "any other unique identifying number, characteristic, or code" — is a catch-all. Custom internal identifiers, study enrollment numbers, or proprietary patient codes all fall under this category. If your organization uses custom identifiers, verify that the detector recognizes them or add supplementary patterns.
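
The calibration mentioned under "Audit and review redaction logs" can be quantified by comparing detector output against a hand-labeled sample. The sketch below assumes each PHI instance is represented as a `(start, end)` character span and that a detection counts only on an exact span match; both are simplifying assumptions, and `redaction_metrics` is a hypothetical helper, not part of the kt CLI.

```python
from __future__ import annotations


def redaction_metrics(
    expected: set[tuple[int, int]],
    detected: set[tuple[int, int]],
) -> dict[str, float]:
    """Compare labeled PHI spans with detected spans for one response."""
    true_pos = len(expected & detected)
    false_pos = len(detected - expected)  # non-PHI content that was redacted
    false_neg = len(expected - detected)  # real PHI that was missed
    # With nothing detected or nothing expected, the rate is trivially perfect.
    precision = true_pos / len(detected) if detected else 1.0
    recall = true_pos / len(expected) if expected else 1.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }
```

Low precision indicates over-redaction that reduces the usefulness of responses; low recall indicates missed PHI and is the figure that matters most for compliance risk.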

For AI systems

Canonical terms: Keeptrusts, hipaa-phi-detector, mode, hipaa_18, action, safe_harbor_method, PHI, HIPAA, Safe Harbor, 18 identifiers, redact, block
Config/command names: hipaa-phi-detector policy, mode: hipaa_18, action (redact/block), safe_harbor_method
Best next pages: Healthcare Compliance, PII Detector, Human Oversight

For engineers

Prerequisites: Understanding of HIPAA Safe Harbor de-identification requirements. Determine whether your use case needs redact (preserve non-PHI content) or block (reject entire response).
Validation: Test with responses containing each of the 18 identifier categories (names, dates, SSNs, MRNs, etc.) and verify redaction markers or blocking. Run kt policy test with PHI-containing test cases.
Key commands: kt policy lint, kt policy test, kt events tail

For leaders

Governance: Protecting PHI is a regulatory obligation for covered entities and business associates, and PHI detection in AI responses is one control that supports it. Safe Harbor compliance provides a defensible de-identification standard accepted by HHS.
Cost: Detection runs locally with no external calls. Non-compliance costs include OCR civil monetary penalties (capped per calendar year at roughly $2M per violation category, adjusted annually for inflation), breach notification costs, and reputational damage.
Rollout: Deploy with action: redact and safe_harbor_method: true as the baseline. Use action: block for high-risk environments where any PHI leakage is unacceptable. Pair with healthcare-compliance for full medical AI governance.

Next steps

Healthcare Compliance — Medical content controls and FDA classification
PII Detector — General PII detection with healthcare mode
Human Oversight — Escalate clinical content for review
RBAC — HIPAA minimum-necessary access controls
