HIPAA PHI Detector

The hipaa-phi-detector policy detects HIPAA-regulated Protected Health Information (PHI) in AI-generated responses and enforces de-identification controls aligned with the HIPAA Privacy Rule's Safe Harbor method (45 CFR § 164.514(b)(2)). It scans responses for all 18 Safe Harbor identifier categories, from patient names and Social Security numbers to biometric identifiers and full-face photographs, and either redacts the detected PHI in place or blocks the entire response, depending on the configured action. It is a foundational control for any AI deployment that processes, generates, or transmits health information on behalf of HIPAA-covered entities or their business associates.

Use this page when

  • You are deploying AI in a HIPAA-covered entity or business associate and must enforce de-identification controls.
  • You need Safe Harbor method compliance (45 CFR § 164.514(b)) covering all 18 identifier categories.
  • You want to redact or block PHI in AI-generated responses before they reach end users.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Configuration

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

Fields

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| mode | "hipaa_18" | Detection mode covering all 18 HIPAA Safe Harbor identifier categories as defined in 45 CFR § 164.514(b)(2). This is the standard and currently the only supported mode. It detects names, geographic data smaller than a state, dates (except year), phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers and serial numbers, device identifiers and serial numbers, web URLs, IP addresses, biometric identifiers, full-face photographs and comparable images, and any other unique identifying number, characteristic, or code. | "hipaa_18" |
| action | "redact" or "block" | Action taken when PHI is detected. "redact" replaces detected PHI with placeholder tokens (e.g., [REDACTED-NAME], [REDACTED-SSN]) while preserving the rest of the response. "block" rejects the entire response and returns a policy-violation error, preventing any part of the PHI-containing response from reaching the end user. Use "redact" when the non-PHI content is valuable and safe to deliver; use "block" when any PHI leakage is unacceptable. | "redact" |
| safe_harbor_method | boolean | Enables the HIPAA Safe Harbor de-identification method as defined in 45 CFR § 164.514(b)(2). When true, the detector applies the full 18-category identifier scan, the standard compliance approach accepted by HHS as sufficient for de-identification without requiring a qualified statistical expert. When false, the detector still scans for PHI but does not certify Safe Harbor compliance, which may be appropriate for internal-only workflows where formal de-identification certification is not required. | true |
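
As a minimal sketch, the constraints in the fields table can be checked before a configuration is deployed. The function name validate_phi_config and the dictionary layout are assumptions made for illustration and are not part of the Keeptrusts tooling; only the field names, allowed values, and defaults come from the table.

```python
# Hypothetical pre-deployment check; this is not a Keeptrusts API.
ALLOWED_MODES = {"hipaa_18"}
ALLOWED_ACTIONS = {"redact", "block"}
DEFAULTS = {"mode": "hipaa_18", "action": "redact", "safe_harbor_method": True}


def validate_phi_config(raw: dict) -> dict:
    """Return the policy config with documented defaults applied.

    Raises ValueError when a field is unknown or holds an unsupported value.
    """
    unknown = set(raw) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    cfg = {**DEFAULTS, **raw}
    if cfg["mode"] not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {cfg['mode']!r}")
    if cfg["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {cfg['action']!r}")
    if not isinstance(cfg["safe_harbor_method"], bool):
        raise ValueError("safe_harbor_method must be a boolean")
    return cfg
```

Calling validate_phi_config({"action": "block"}) returns the block configuration with the other two fields at their defaults, while a value such as "mask" is rejected early instead of at gateway start-up.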

The 18 HIPAA Safe Harbor Identifiers

The hipaa_18 mode detects and acts on all 18 identifier categories specified in the HIPAA Safe Harbor de-identification standard:

| # | Identifier Category | Examples |
| --- | --- | --- |
| 1 | Names | Full names, first/last names, maiden names |
| 2 | Geographic data (smaller than state) | Street addresses, city names, ZIP codes (first 3 digits retained only if population > 20,000) |
| 3 | Dates (except year) related to an individual | Birth dates, admission dates, discharge dates, date of death, ages over 89 |
| 4 | Phone numbers | All telephone numbers including mobile, home, work |
| 5 | Fax numbers | All facsimile numbers |
| 6 | Email addresses | Personal and work email addresses |
| 7 | Social Security numbers | Full or partial SSNs |
| 8 | Medical record numbers | MRN identifiers from EHR/EMR systems |
| 9 | Health plan beneficiary numbers | Insurance member IDs, plan numbers |
| 10 | Account numbers | Financial account numbers associated with healthcare billing |
| 11 | Certificate/license numbers | Professional license numbers, DEA numbers, driver's license numbers |
| 12 | Vehicle identifiers | VINs, license plate numbers |
| 13 | Device identifiers | UDI (Unique Device Identifiers), serial numbers for medical devices |
| 14 | Web URLs | Personal websites, patient portal URLs |
| 15 | IP addresses | IPv4 and IPv6 addresses |
| 16 | Biometric identifiers | Fingerprints, voiceprints, retinal scans |
| 17 | Full-face photographs | Photographs and comparable images where the individual is identifiable |
| 18 | Any other unique identifying number | Any characteristic or code not covered above that could identify an individual |
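
Rows 2 and 3 carry the only Safe Harbor rules that generalize a value rather than remove it. The sketch below illustrates those rules as stated in the regulation; it is not a description of how the detector redacts, since the policy replaces detected values with placeholder tokens. The function names are hypothetical, and the set of low-population ZIP prefixes must be supplied by the caller from current Census data (the value used in the usage note is illustrative only).

```python
from datetime import date


def generalize_zip(zip_code: str, low_population_prefixes: set[str]) -> str:
    """Keep the first three ZIP digits, or return 000 for sparse areas.

    low_population_prefixes holds the 3-digit prefixes whose combined
    area has 20,000 or fewer people; the caller must provide it.
    """
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        raise ValueError(f"not a ZIP code: {zip_code!r}")
    return "000" if prefix in low_population_prefixes else prefix


def generalize_age(age: int) -> str:
    """Collapse ages over 89 into a single '90+' bucket."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return "90+" if age > 89 else str(age)


def generalize_date(iso_date: str) -> str:
    """Reduce an ISO 8601 date (YYYY-MM-DD) to its year."""
    return str(date.fromisoformat(iso_date).year)
```

For example, generalize_zip("90210-1234", {"036"}) keeps "902", generalize_age(93) yields "90+", and generalize_date("1948-07-04") yields "1948".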

Use Cases

Hospital EHR AI Assistant

A hospital deploys an AI assistant for physicians to query patient records and get clinical summaries. The assistant must never expose raw PHI in its responses, even when querying PHI-containing source data.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

Telemedicine Chatbot

A telemedicine platform uses an AI chatbot for patient intake and triage. The chatbot interacts directly with patients who may volunteer PHI in their questions, and the AI responses must not reflect that PHI back or expose other patients' information.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: block
    safe_harbor_method: true

Clinical Research Data Sharing

A research institution uses AI to generate summaries of clinical trial data for publication and sharing with collaborators. All outputs must be fully de-identified under Safe Harbor to comply with the Common Rule and HIPAA research provisions.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

Insurance Claims Processing

A health insurance company uses AI to process and summarize claims documentation. Member information must be redacted from AI-generated summaries shared with third-party processors who are not covered under the same Business Associate Agreement (BAA).

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

How It Works

The hipaa-phi-detector policy operates as a response-phase filter in the Keeptrusts gateway pipeline:

  1. Identifier scanning: After the upstream LLM generates a response, the policy scans the full response text for all 18 HIPAA Safe Harbor identifier categories. Each category uses specialized detection patterns — regex for structured identifiers like SSNs and phone numbers, named-entity recognition heuristics for names and geographic data, and format matching for dates, URLs, and IP addresses.

  2. Context-aware detection: The detector considers surrounding context to reduce false positives. For example, a 9-digit number is flagged as a potential SSN only when it appears in a pattern consistent with XXX-XX-XXXX or in proximity to terms like "social security," "SSN," or "tax ID." Similarly, dates are flagged only when they appear to relate to an individual (near terms like "born," "admitted," "discharged") rather than general calendar references.

  3. Action execution:

    • Redact mode: Each detected PHI instance is replaced with a category-specific placeholder token (e.g., [REDACTED-NAME], [REDACTED-SSN], [REDACTED-DOB], [REDACTED-MRN]). The rest of the response is preserved and delivered to the end user. This allows the non-PHI clinical content to remain useful.
    • Block mode: If any PHI is detected, the entire response is rejected. The gateway returns a policy-violation response indicating PHI was detected. No part of the original response reaches the end user.

  4. Safe Harbor certification: When safe_harbor_method is true, the policy applies the full identifier list of 45 CFR § 164.514(b)(2). A response that passes through the detector with safe_harbor_method: true and action: redact is intended to satisfy the Safe Harbor identifier-removal requirement, so Expert Determination (the alternative de-identification method under 45 CFR § 164.514(b)(1)) is not required. Safe Harbor also requires that the covered entity have no actual knowledge that the remaining information could identify an individual (45 CFR § 164.514(b)(2)(ii)).

  5. Audit trail: Every detection event — including the category of PHI detected, the action taken, and whether the response was redacted or blocked — is recorded as a decision event sent to the Keeptrusts API. This audit trail supports HIPAA's accounting-of-disclosures requirement (45 CFR § 164.528) and breach notification investigations.
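
The five steps above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the gateway's implementation: it covers only three structured categories, the regular expressions and the 40-character context window are guesses, the [REDACTED-EMAIL] and [REDACTED-IP] token names and the "allow" action value are hypothetical, and the decision event is only built as a dictionary rather than sent to the Keeptrusts API.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; the real detector's patterns are not documented here.
SSN_FORMATTED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
NINE_DIGITS = re.compile(r"\b\d{9}\b")
SSN_CONTEXT = re.compile(r"social security|\bSSN\b|tax id", re.IGNORECASE)
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CONTEXT_WINDOW = 40  # characters inspected on each side (assumed value)


@dataclass
class Finding:
    category: str
    start: int
    end: int


def scan(text: str) -> list[Finding]:
    """Steps 1 and 2: pattern scan plus a context check for bare 9-digit numbers."""
    found = [Finding("SSN", m.start(), m.end()) for m in SSN_FORMATTED.finditer(text)]
    for m in NINE_DIGITS.finditer(text):
        lo = max(0, m.start() - CONTEXT_WINDOW)
        if SSN_CONTEXT.search(text[lo:m.end() + CONTEXT_WINDOW]):
            found.append(Finding("SSN", m.start(), m.end()))
    found += [Finding("EMAIL", m.start(), m.end()) for m in EMAIL.finditer(text)]
    found += [Finding("IP", m.start(), m.end()) for m in IPV4.finditer(text)]
    merged: list[Finding] = []
    for f in sorted(found, key=lambda f: f.start):
        if not merged or f.start >= merged[-1].end:  # drop overlapping matches
            merged.append(f)
    return merged


def apply_policy(text: str, action: str = "redact") -> dict:
    """Steps 3 and 5: redact or block, and build a decision event record."""
    findings = scan(text)
    event = {
        "categories": sorted({f.category for f in findings}),
        "action": action if findings else "allow",  # "allow" is a hypothetical label
    }
    if not findings:
        return {"body": text, "event": event}
    if action == "block":
        return {"body": None, "event": event}
    for f in reversed(findings):  # replace from the end so offsets stay valid
        text = text[:f.start] + f"[REDACTED-{f.category}]" + text[f.end:]
    return {"body": text, "event": event}
```

With the default action, the input "Patient SSN 123456789, contact jane.doe@example.org from 10.0.0.7." comes back as "Patient SSN [REDACTED-SSN], contact [REDACTED-EMAIL] from [REDACTED-IP].", while "Order 123456789 shipped." passes through unchanged because no SSN-related term appears near the number. The event records categories and the action taken but not the matched values, so the audit record itself does not store PHI.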

Combining With Other Policies

The hipaa-phi-detector is most effective as part of a layered healthcare and privacy compliance stack:

  • healthcare-compliance: Controls the medical content of AI responses (blocking diagnoses, prescriptions, treatment recommendations) while hipaa-phi-detector controls PHI exposure. Together they cover both clinical safety and data privacy.
  • pii-detector: Catches broader personally identifiable information that may not be PHI under HIPAA but still requires protection, such as financial account numbers in non-healthcare contexts.
  • audit-logger: Ensures all PHI detection and redaction events are logged to a compliance-grade audit trail that meets HIPAA's six-year retention requirement for accounting of disclosures.
  • safety-filter: Adds a general content safety layer on top of PHI-specific controls.
  • financial-compliance: For healthcare organizations that also handle billing and insurance, adds financial compliance controls alongside PHI protection.

pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - hipaa-phi-detector
    - healthcare-compliance
    - audit-logger

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true

  healthcare-compliance:
    blocked_patterns:
      - "you (have|are suffering from)"
      - "take \\d+ ?mg of"
    required_disclaimers:
      - "This information is not a substitute for professional medical advice."
    fda_class: "II"

  audit-logger:
    immutable: true
    retention_days: 2190
    hipaa_audit_controls: true
    log_all_access: true
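
Before deploying the combined configuration, it can help to check what the two blocked_patterns entries actually match. The sketch below assumes the patterns are applied as case-insensitive searches anywhere in the response; how healthcare-compliance really evaluates them is not stated on this page, and matched_patterns is a hypothetical helper.

```python
import re

# Patterns copied from the configuration above. Inside a YAML double-quoted
# string "\\d" is an escaped backslash, so the regex engine receives \d.
BLOCKED_PATTERNS = [r"you (have|are suffering from)", r"take \d+ ?mg of"]

# Assumption: case-insensitive search anywhere in the response text.
COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]

# retention_days: 2190 corresponds to six 365-day years (leap days not counted).
RETENTION_DAYS = 2190
assert RETENTION_DAYS == 6 * 365


def matched_patterns(response: str) -> list[str]:
    """Return the blocked patterns that occur in the response."""
    return [c.pattern for c in COMPILED if c.search(response)]
```

Under these assumptions "Take 200 mg of ibuprofen twice daily." matches the dosage pattern, but "Do you have any allergies?" also matches the first pattern. Broad patterns like this are worth testing against benign intake questions before rollout.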

Best Practices

  1. Default to redact over block for clinical workflows: In physician-facing tools, the non-PHI clinical content in a response is often valuable. Redacting the PHI and preserving the clinical reasoning gives practitioners useful information while maintaining compliance. Reserve block for patient-facing or external-sharing scenarios where any PHI leakage is unacceptable.

  2. Always enable safe_harbor_method for external data sharing: If AI-generated outputs will be shared outside your organization, with researchers, or with non-BAA third parties, Safe Harbor compliance is the clearest legal protection. Disable it only for internal-only workflows where formal de-identification certification is unnecessary.

  3. Pair with healthcare-compliance for complete coverage: PHI detection and medical content safety are separate concerns. The hipaa-phi-detector prevents data exposure; the healthcare-compliance policy prevents unsafe medical advice. A compliant healthcare AI deployment requires both.

  4. Audit and review redaction logs: Regularly review the audit trail to verify that the detector is catching PHI correctly and not producing excessive false positives. False positives (redacting non-PHI content) reduce the utility of AI responses; false negatives (missing real PHI) create compliance risk. Both require calibration.

  5. Consider block mode for high-risk scenarios: For AI systems that interact directly with patients, generate externally shared reports, or operate in breach-sensitive environments, block mode provides the strongest protection. A blocked response can be reviewed by a human before any content is released.

  6. Remember the 18th category: The 18th Safe Harbor identifier — "any other unique identifying number, characteristic, or code" — is a catch-all. Custom internal identifiers, study enrollment numbers, or proprietary patient codes all fall under this category. If your organization uses custom identifiers, verify that the detector recognizes them or add supplementary patterns.
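
The calibration in practice 4 can be made concrete with a small review sample. The sketch below assumes a reviewer has labeled individual spans from delivered and redacted responses; the ReviewedSpan record and calibration_report function are hypothetical and do not reflect the Keeptrusts event schema.

```python
from dataclasses import dataclass


@dataclass
class ReviewedSpan:
    flagged: bool  # the detector redacted or blocked on this span
    is_phi: bool   # a human reviewer judged the span to be PHI


def calibration_report(spans: list[ReviewedSpan]) -> dict:
    """Summarize false positives and false negatives in a labeled sample."""
    tp = sum(s.flagged and s.is_phi for s in spans)
    fp = sum(s.flagged and not s.is_phi for s in spans)
    fn = sum(not s.flagged and s.is_phi for s in spans)
    return {
        "false_positives": fp,
        "false_negatives": fn,
        # None when the ratio is undefined (nothing flagged / no PHI in sample)
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }
```

Because the audit trail records detections, false negatives only surface when reviewers also sample responses that passed through unmodified, so a review sample should include both kinds.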

For AI systems

  • Canonical terms: Keeptrusts, hipaa-phi-detector, mode, hipaa_18, action, safe_harbor_method, PHI, HIPAA, Safe Harbor, 18 identifiers, redact, block
  • Config/command names: hipaa-phi-detector policy, mode: hipaa_18, action (redact/block), safe_harbor_method
  • Best next pages: Healthcare Compliance, PII Detector, Human Oversight

For engineers

  • Prerequisites: Understanding of HIPAA Safe Harbor de-identification requirements. Determine whether your use case needs redact (preserve non-PHI content) or block (reject entire response).
  • Validation: Test with responses containing each of the 18 identifier categories (names, dates, SSNs, MRNs, etc.) and verify redaction markers or blocking. Run kt policy test with PHI-containing test cases.
  • Key commands: kt policy lint, kt policy test, kt events tail
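
One way to structure the validation step is to plant known synthetic values in a test response and confirm that none survive in the policy's output. The sketch below is independent of the kt CLI, whose test-case format is not described on this page; the planted values and the leaked_categories helper are hypothetical.

```python
# Synthetic values only; never use real patient data in test fixtures.
PLANTED = {
    "NAME": "Jordan Example",
    "SSN": "123-45-6789",
    "MRN": "MRN-0042-TEST",
    "EMAIL": "jordan@example.org",
}


def leaked_categories(output: str, planted: dict[str, str]) -> list[str]:
    """Return the categories whose planted value still appears in the output."""
    lowered = output.lower()
    return sorted(cat for cat, value in planted.items() if value.lower() in lowered)
```

An empty result for a redacted response means every planted value was removed; in block mode the check is simply that no response body is returned.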

For leaders

  • Governance: Safeguarding PHI is a regulatory requirement for covered entities and business associates. Safe Harbor compliance provides a defensible de-identification standard accepted by HHS.
  • Cost: Local detection with no external calls. Non-compliance costs include enforcement actions by the HHS Office for Civil Rights (OCR), with civil penalties capped at roughly $1.9M per violation category per calendar year (a figure adjusted annually for inflation), plus breach notifications and reputational damage.
  • Rollout: Deploy with action: redact and safe_harbor_method: true as the baseline. Use action: block for high-risk environments where any PHI leakage is unacceptable. Pair with healthcare-compliance for full medical AI governance.

Next steps