HIPAA PHI Detector
The hipaa-phi-detector policy detects HIPAA-regulated Protected Health Information (PHI) in AI-generated responses and enforces de-identification controls aligned with the HIPAA Privacy Rule's Safe Harbor method (45 CFR § 164.514(b)(2)). It scans responses for all 18 Safe Harbor identifier categories, from patient names and Social Security numbers to biometric identifiers and full-face photographs, and either redacts the detected PHI in place or blocks the entire response, depending on the configured action. This policy is a foundational control for any AI deployment that processes, generates, or transmits health information in HIPAA-covered entities or their business associates.
Use this page when
- You are deploying AI in a HIPAA-covered entity or business associate and must enforce de-identification controls.
- You need Safe Harbor method compliance (45 CFR § 164.514(b)(2)) covering all 18 identifier categories.
- You want to redact or block PHI in AI-generated responses before they reach end users.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Configuration
pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - hipaa-phi-detector
policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
Fields
| Field | Type | Description | Default |
|---|---|---|---|
| mode | "hipaa_18" | Detection mode covering all 18 HIPAA Safe Harbor identifier categories as defined in 45 CFR § 164.514(b)(2). This is the standard and currently the only supported mode. It detects names, geographic data smaller than a state, dates (except year), phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers and serial numbers, device identifiers and serial numbers, web URLs, IP addresses, biometric identifiers, full-face photographs and comparable images, and any other unique identifying number, characteristic, or code. | "hipaa_18" |
| action | "redact" \| "block" | Action taken when PHI is detected. "redact" replaces detected PHI with placeholder tokens (e.g., [REDACTED-NAME], [REDACTED-SSN]) while preserving the rest of the response. "block" rejects the entire response and returns a policy-violation error, preventing any part of the PHI-containing response from reaching the end user. Use "redact" when the non-PHI content is valuable and safe to deliver; use "block" when any PHI leakage is unacceptable. | "redact" |
| safe_harbor_method | boolean | Enables the HIPAA Safe Harbor de-identification method as defined in 45 CFR § 164.514(b)(2). When true, the detector applies the full 18-category identifier scan, the standard compliance approach accepted by HHS as sufficient for de-identification without requiring a qualified statistical expert. When false, the detector still scans for PHI but does not certify Safe Harbor compliance, which may be appropriate for internal-only workflows where formal de-identification certification is not required. | true |
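The field constraints in the table can be expressed as a small validation sketch. This is illustrative only: `PhiDetectorConfig` and `load_config` are hypothetical names, not part of the Keeptrusts API, and the sketch assumes the settings arrive as a plain mapping taken from under `policy` → `hipaa-phi-detector`.

```python
from dataclasses import dataclass

# Allowed values, taken from the Fields table above.
ALLOWED_MODES = {"hipaa_18"}
ALLOWED_ACTIONS = {"redact", "block"}


@dataclass(frozen=True)
class PhiDetectorConfig:
    """Hypothetical in-memory view of the hipaa-phi-detector settings."""
    mode: str = "hipaa_18"
    action: str = "redact"
    safe_harbor_method: bool = True

    def __post_init__(self) -> None:
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"unsupported mode: {self.mode!r}")
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {self.action!r}")
        if not isinstance(self.safe_harbor_method, bool):
            raise ValueError("safe_harbor_method must be a boolean")


def load_config(raw: dict) -> PhiDetectorConfig:
    """Build a config from a mapping; omitted fields fall back to the defaults."""
    return PhiDetectorConfig(**raw)
```

Calling `load_config({"action": "block"})` keeps the documented defaults for `mode` and `safe_harbor_method`, while a value such as `"mask"` for `action` is rejected before it reaches the gateway.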
The 18 HIPAA Safe Harbor Identifiers
The hipaa_18 mode detects and acts on all 18 identifier categories specified in the HIPAA Safe Harbor de-identification standard:
| # | Identifier Category | Examples |
|---|---|---|
| 1 | Names | Full names, first/last names, maiden names |
| 2 | Geographic data (smaller than state) | Street addresses, city names, ZIP codes (the first 3 digits may be retained only if the area covered by that 3-digit prefix has a population > 20,000) |
| 3 | Dates (except year) related to an individual | Birth dates, admission dates, discharge dates, date of death, ages over 89 |
| 4 | Phone numbers | All telephone numbers including mobile, home, work |
| 5 | Fax numbers | All facsimile numbers |
| 6 | Email addresses | Personal and work email addresses |
| 7 | Social Security numbers | Full or partial SSNs |
| 8 | Medical record numbers | MRN identifiers from EHR/EMR systems |
| 9 | Health plan beneficiary numbers | Insurance member IDs, plan numbers |
| 10 | Account numbers | Financial account numbers associated with healthcare billing |
| 11 | Certificate/license numbers | Professional license numbers, DEA numbers, driver's license numbers |
| 12 | Vehicle identifiers | VINs, license plate numbers |
| 13 | Device identifiers | UDI (Unique Device Identifiers), serial numbers for medical devices |
| 14 | Web URLs | Personal websites, patient portal URLs |
| 15 | IP addresses | IPv4 and IPv6 addresses |
| 16 | Biometric identifiers | Fingerprints, voiceprints, retinal scans |
| 17 | Full-face photographs | Photographs and comparable images where the individual is identifiable |
| 18 | Any other unique identifying number | Any characteristic or code not covered above that could identify an individual |
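Two of the categories above carry generalization rules rather than outright removal: ZIP codes (category 2) and ages over 89 (category 3). The sketch below illustrates what those Safe Harbor rules mean in practice. It is not a description of how the detector handles them internally, and `LOW_POPULATION_ZIP3` is an illustrative subset that a real deployment would replace with a list derived from current Census data.

```python
import re

# Illustrative subset of 3-digit ZIP prefixes covering 20,000 or fewer people.
# Assumption: a real deployment would load the full, current list.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def generalize_zip(zip_code: str) -> str:
    """Keep only the first three ZIP digits, or 000 for low-population prefixes."""
    match = re.fullmatch(r"(\d{3})\d{2}(?:-\d{4})?", zip_code.strip())
    if match is None:
        raise ValueError(f"not a US ZIP code: {zip_code!r}")
    prefix = match.group(1)
    return "000" if prefix in LOW_POPULATION_ZIP3 else prefix


def generalize_age(age: int) -> str:
    """Report ages over 89 as a single '90+' bucket; keep other ages as-is."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return "90+" if age > 89 else str(age)
```

For example, `generalize_zip("90210")` keeps the prefix `902`, while a ZIP code under one of the listed low-population prefixes collapses to `000`.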
Use Cases
Hospital EHR AI Assistant
A hospital deploys an AI assistant for physicians to query patient records and get clinical summaries. The assistant must never expose raw PHI in its responses, even when querying PHI-containing source data.
pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - hipaa-phi-detector
policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
Telemedicine Chatbot
A telemedicine platform uses an AI chatbot for patient intake and triage. The chatbot interacts directly with patients who may volunteer PHI in their questions, and the AI responses must not reflect that PHI back or expose other patients' information.
pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - hipaa-phi-detector
policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: block
    safe_harbor_method: true
Clinical Research Data Sharing
A research institution uses AI to generate summaries of clinical trial data for publication and sharing with collaborators. All outputs must be fully de-identified under Safe Harbor to comply with the Common Rule and HIPAA research provisions.
pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - hipaa-phi-detector
policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
Insurance Claims Processing
A health insurance company uses AI to process and summarize claims documentation. Member information must be redacted from AI-generated summaries shared with third-party processors who are not covered under the same Business Associate Agreement (BAA).
pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - hipaa-phi-detector
policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
How It Works
The hipaa-phi-detector policy operates as a response-phase filter in the Keeptrusts gateway pipeline:
- Identifier scanning: After the upstream LLM generates a response, the policy scans the full response text for all 18 HIPAA Safe Harbor identifier categories. Each category uses specialized detection patterns: regex for structured identifiers like SSNs and phone numbers, named-entity recognition heuristics for names and geographic data, and format matching for dates, URLs, and IP addresses.
- Context-aware detection: The detector considers surrounding context to reduce false positives. For example, a 9-digit number is flagged as a potential SSN only when it appears in a pattern consistent with XXX-XX-XXXX or in proximity to terms like "social security," "SSN," or "tax ID." Similarly, dates are flagged only when they appear to relate to an individual (near terms like "born," "admitted," "discharged") rather than general calendar references.
- Action execution:
  - Redact mode: Each detected PHI instance is replaced with a category-specific placeholder token (e.g., [REDACTED-NAME], [REDACTED-SSN], [REDACTED-DOB], [REDACTED-MRN]). The rest of the response is preserved and delivered to the end user, so the non-PHI clinical content remains useful.
  - Block mode: If any PHI is detected, the entire response is rejected. The gateway returns a policy-violation response indicating PHI was detected. No part of the original response reaches the end user.
- Safe Harbor certification: When safe_harbor_method is true, the policy enforces the full rigor of 45 CFR § 164.514(b)(2). A response that passes through the detector with safe_harbor_method: true and action: redact can be treated as de-identified under the Safe Harbor standard, provided your organization also has no actual knowledge that the remaining information could identify an individual (the second Safe Harbor condition, § 164.514(b)(2)(ii)). No Expert Determination (the alternative de-identification method under 45 CFR § 164.514(b)(1)) is required.
- Audit trail: Every detection event, including the category of PHI detected, the action taken, and whether the response was redacted or blocked, is recorded as a decision event sent to the Keeptrusts API. This audit trail supports HIPAA's accounting-of-disclosures requirement (45 CFR § 164.528) and breach notification investigations.
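The context-aware detection and action steps above can be sketched for a single category (SSNs). This is a minimal illustration under stated assumptions, not the gateway's implementation: the 40-character context window, the `Decision` structure, and the block message are all hypothetical, and the real detector covers all 18 categories.

```python
import re
from dataclasses import dataclass, field

# Formatted SSNs are flagged on their own; bare 9-digit runs need nearby context.
SSN_FORMATTED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_BARE = re.compile(r"\b\d{9}\b")
SSN_CONTEXT = re.compile(r"social security|\bssn\b|tax id", re.IGNORECASE)
CONTEXT_WINDOW = 40  # characters inspected on each side (assumed value)


@dataclass
class Decision:
    """Hypothetical result record: action taken, delivered text, categories hit."""
    action: str  # "allow", "redact", or "block"
    text: str
    categories: list = field(default_factory=list)


def find_ssn_spans(text: str) -> list:
    """Return sorted (start, end) spans that look like SSNs."""
    spans = [m.span() for m in SSN_FORMATTED.finditer(text)]
    for m in SSN_BARE.finditer(text):
        window = text[max(0, m.start() - CONTEXT_WINDOW): m.end() + CONTEXT_WINDOW]
        if SSN_CONTEXT.search(window):
            spans.append(m.span())
    return sorted(spans)


def apply_policy(text: str, action: str = "redact") -> Decision:
    """Allow clean text, block the whole response, or redact each detected span."""
    spans = find_ssn_spans(text)
    if not spans:
        return Decision("allow", text)
    if action == "block":
        return Decision("block", "Response blocked: PHI detected.", ["SSN"])
    redacted = text
    # Replace from the end so earlier offsets stay valid.
    for start, end in reversed(spans):
        redacted = redacted[:start] + "[REDACTED-SSN]" + redacted[end:]
    return Decision("redact", redacted, ["SSN"])
```

With this sketch, "Order 987654321 has shipped." passes through unchanged because no SSN-related term appears near the number, while "The patient's SSN is 123456789." is redacted. The `categories` and `action` fields mirror the kind of information the audit-trail step records.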
Combining With Other Policies
The hipaa-phi-detector is most effective as part of a layered healthcare and privacy compliance stack:
- healthcare-compliance: Controls the medical content of AI responses (blocking diagnoses, prescriptions, treatment recommendations) while hipaa-phi-detector controls PHI exposure. Together they cover both clinical safety and data privacy.
- pii-detector: Catches broader personally identifiable information that may not be PHI under HIPAA but still requires protection, such as financial account numbers in non-healthcare contexts.
- audit-logger: Ensures all PHI detection and redaction events are logged to a compliance-grade audit trail that meets HIPAA's six-year retention requirement for accounting of disclosures.
- safety-filter: Adds a general content safety layer on top of PHI-specific controls.
- financial-compliance: For healthcare organizations that also handle billing and insurance, adds financial compliance controls alongside PHI protection.
pack:
  name: hipaa-phi-detector
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - hipaa-phi-detector
    - healthcare-compliance
    - audit-logger
policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
  healthcare-compliance:
    blocked_patterns:
      - "you (have|are suffering from)"
      - "take \\d+ ?mg of"
    required_disclaimers:
      - "This information is not a substitute for professional medical advice."
    fda_class: "II"
  audit-logger:
    immutable: true
    retention_days: 2190
    hipaa_audit_controls: true
    log_all_access: true
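Before deploying the combined configuration, it can help to check what the two blocked_patterns entries actually match. The sketch below assumes the gateway's regex dialect behaves like Python's `re` module for these patterns and that matching is case-insensitive; neither is confirmed on this page. Note that the YAML string "take \\d+ ?mg of" unescapes to the regex `take \d+ ?mg of`.

```python
import re

# Patterns as they read after YAML unescapes the double-quoted strings.
BLOCKED_PATTERNS = [
    r"you (have|are suffering from)",
    r"take \d+ ?mg of",
]
# Assumption: case-insensitive matching; the gateway's actual flags may differ.
COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]

# retention_days: 2190 corresponds to six 365-day years (leap days ignored).
SIX_YEARS_IN_DAYS = 6 * 365


def matched_patterns(response: str) -> list:
    """Return the source patterns that match anywhere in the response."""
    return [c.pattern for c in COMPILED if c.search(response)]
```

The first pattern is broad: it matches diagnostic phrasing such as "you have diabetes" but also benign text such as "if you have questions", so it is worth testing against representative responses before relying on it.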
Best Practices
- Default to redact over block for clinical workflows: In physician-facing tools, the non-PHI clinical content in a response is often valuable. Redacting the PHI and preserving the clinical reasoning gives practitioners useful information while maintaining compliance. Reserve block for patient-facing or external-sharing scenarios where any PHI leakage is unacceptable.
- Always enable safe_harbor_method for external data sharing: If AI-generated outputs will be shared outside your organization, with researchers, or with non-BAA third parties, Safe Harbor compliance is the clearest legal protection. Disable it only for internal-only workflows where formal de-identification certification is unnecessary.
- Pair with healthcare-compliance for complete coverage: PHI detection and medical content safety are separate concerns. The hipaa-phi-detector prevents data exposure; the healthcare-compliance policy prevents unsafe medical advice. A compliant healthcare AI deployment requires both.
- Audit and review redaction logs: Regularly review the audit trail to verify that the detector is catching PHI correctly and not producing excessive false positives. False positives (redacting non-PHI content) reduce the utility of AI responses; false negatives (missing real PHI) create compliance risk. Both require calibration.
- Consider block mode for high-risk scenarios: For AI systems that interact directly with patients, generate externally shared reports, or operate in breach-sensitive environments, block mode provides the strongest protection. A blocked response can be reviewed by a human before any content is released.
- Remember the 18th category: The 18th Safe Harbor identifier ("any other unique identifying number, characteristic, or code") is a catch-all. Custom internal identifiers, study enrollment numbers, or proprietary patient codes all fall under this category. If your organization uses custom identifiers, verify that the detector recognizes them or add supplementary patterns.
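For the 18th category, a supplementary pattern might look like the following sketch. Both identifier formats are invented for illustration, and this page does not document how supplementary patterns are registered with the detector, so the code runs as a standalone post-processing step rather than as detector configuration.

```python
import re

# Hypothetical internal identifier formats; replace with your organization's own.
CUSTOM_PATTERNS = {
    "STUDY-ID": re.compile(r"\bSTUDY-\d{4}-\d{5}\b"),
    "PATIENT-CODE": re.compile(r"\bPX[A-Z]{2}\d{6}\b"),
}


def redact_custom_identifiers(text: str) -> str:
    """Replace each custom identifier with a [REDACTED-<label>] token."""
    for label, pattern in CUSTOM_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text
```

Anchoring each pattern with word boundaries and a fixed digit count keeps it from matching ordinary numbers, which limits the false positives discussed in the audit-review practice above.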
For AI systems
- Canonical terms: Keeptrusts, hipaa-phi-detector, mode, hipaa_18, action, safe_harbor_method, PHI, HIPAA, Safe Harbor, 18 identifiers, redact, block
- Config/command names: hipaa-phi-detector policy, mode: hipaa_18, action (redact/block), safe_harbor_method
- Best next pages: Healthcare Compliance, PII Detector, Human Oversight
For engineers
- Prerequisites: Understanding of HIPAA Safe Harbor de-identification requirements. Determine whether your use case needs redact (preserve non-PHI content) or block (reject entire response).
- Validation: Test with responses containing each of the 18 identifier categories (names, dates, SSNs, MRNs, etc.) and verify redaction markers or blocking. Run kt policy test with PHI-containing test cases.
- Key commands: kt policy lint, kt policy test, kt events tail
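The validation step can be complemented by a standalone check on captured responses. The sketch below assumes you plant known synthetic PHI values in test prompts and capture the gateway's output as a string; `check_redaction` is a hypothetical helper, and the test-case format expected by `kt policy test` is not shown on this page.

```python
import re

# Matches placeholder tokens of the form [REDACTED-NAME], [REDACTED-SSN], etc.
PLACEHOLDER = re.compile(r"\[REDACTED-[A-Z-]+\]")


def check_redaction(output: str, planted_phi: list) -> dict:
    """Report planted PHI values that survived and the placeholders found."""
    leaked = [value for value in planted_phi if value in output]
    return {
        "leaked": leaked,
        "placeholders": PLACEHOLDER.findall(output),
        "passed": not leaked,
    }
```

A check like this catches false negatives only for the values you planted, so it works best with synthetic samples covering each of the 18 categories.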
For leaders
- Governance: HIPAA PHI detection is a regulatory requirement for covered entities and business associates. Safe Harbor compliance provides a defensible de-identification standard accepted by HHS.
- Cost: Local detection with no external calls. Non-compliance costs include OCR enforcement actions (civil penalties capped at roughly $1.9M per violation category per year, with the cap adjusted annually for inflation), breach notifications, and reputational damage.
- Rollout: Deploy with action: redact and safe_harbor_method: true as the baseline. Use action: block for high-risk environments where any PHI leakage is unacceptable. Pair with healthcare-compliance for full medical AI governance.
Next steps
- Healthcare Compliance — Medical content controls and FDA classification
- PII Detector — General PII detection with healthcare mode
- Human Oversight — Escalate clinical content for review
- RBAC — HIPAA minimum-necessary access controls