PII Detector
The pii-detector policy detects and optionally redacts personally identifiable information (PII) in requests before they reach the AI provider. It detects standard PII categories out of the box (SSN, phone numbers, email addresses, IP addresses), plus credit card data because PCI-DSS detection is enabled by default. It can be extended with healthcare-specific identifiers and custom regex rules.
Use this page when
- You need to detect and redact personally identifiable information before requests reach AI providers.
- You are configuring PII detection with healthcare mode, PCI-DSS mode, or custom regex patterns.
- You want to control redaction format (labels, asterisks, partial masking) and per-type custom markers.
Keeptrusts applies PII detection inline: when action is "redact", the original request is never forwarded with unredacted data.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Configuration
```yaml
pack:
  name: "pii-protection"
  version: "0.1.0"
  enabled: true

policies:
  chain:
    - pii-detector

policy:
  pii-detector:
    action: "redact"
    healthcare_mode: false
    pci_mode: true
    detect_patterns: []
    redaction:
      marker_format: "label"
      include_metadata: true
      preserve_length: false
      custom_markers: {}
```
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| action | enum | "redact" | Action on PII detection. "redact" replaces detected PII with markers and forwards the sanitized request. "block" rejects the entire request. |
| healthcare_mode | boolean | false | Enable healthcare-specific PII identifiers including Medical Record Numbers (MRN), insurance IDs, and National Provider Identifiers (NPI). For full HIPAA compliance, combine with the hipaa-phi-detector policy. |
| pci_mode | boolean | true | Enable PCI-DSS credit card, CVV, and cardholder name detection. When enabled, card numbers (16 digits for Visa, Mastercard, and Discover; 15 digits for Amex), 3–4 digit CVVs, and cardholder name patterns adjacent to card data are detected. |
| detect_patterns | string[] | [] | Custom regex patterns for additional PII categories. Each pattern is evaluated case-insensitively against the input. Use named capture groups for descriptive audit metadata (e.g., (?P<employee_id>EMP-\d{6})). |
| redaction | object | (see below) | Controls the format and metadata of redacted output. |
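To make the field defaults and allowed values concrete, here is a minimal Python sketch of the pii-detector block as typed data. The class and method names (PiiDetectorConfig, RedactionConfig, validate) are hypothetical; the defaults come from the tables on this page, and this is an illustration, not the Keeptrusts implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mirror of policy.pii-detector; defaults follow the Fields tables.
ACTIONS = {"redact", "block"}
MARKER_FORMATS = {"label", "asterisk", "partial"}


@dataclass
class RedactionConfig:
    marker_format: str = "label"
    include_metadata: bool = True
    preserve_length: bool = False
    custom_markers: dict[str, str] = field(default_factory=dict)


@dataclass
class PiiDetectorConfig:
    action: str = "redact"
    healthcare_mode: bool = False
    pci_mode: bool = True
    detect_patterns: list[str] = field(default_factory=list)
    redaction: RedactionConfig = field(default_factory=RedactionConfig)

    def validate(self) -> None:
        # Reject enum values the tables do not list.
        if self.action not in ACTIONS:
            raise ValueError(f"unsupported action: {self.action!r}")
        if self.redaction.marker_format not in MARKER_FORMATS:
            raise ValueError(
                f"unsupported marker_format: {self.redaction.marker_format!r}"
            )
        # Every custom pattern must compile; re.error signals a malformed regex.
        for pattern in self.detect_patterns:
            re.compile(pattern)
```

For example, PiiDetectorConfig(detect_patterns=[r"(?P<employee_id>EMP-\d{6})"]).validate() passes, while PiiDetectorConfig(action="drop").validate() raises ValueError.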
redaction sub-object
Controls how detected PII is replaced in the forwarded request.
| Field | Type | Default | Description |
|---|---|---|---|
| redaction.marker_format | enum | "label" | Replacement format for detected PII. "label" replaces with the PII type in brackets (e.g., [SSN], [EMAIL]). "asterisk" replaces with asterisks (***). "partial" masks the middle of each word while preserving its first and last characters (e.g., J***n D*e). |
| redaction.include_metadata | boolean | true | Include PII type and character offset metadata in the decision event sent to the Keeptrusts API. Enables audit trail reconstruction of what was redacted and where. |
| redaction.preserve_length | boolean | false | Ensure the replacement string matches the character length of the original PII value. Useful when downstream systems validate fixed-width fields. Only applies to "asterisk" and "partial" marker formats. |
| redaction.custom_markers | object | {} | Per-PII-type custom replacement strings. Keys are PII type labels (e.g., SSN, EMAIL, PHONE), values are the literal replacement text. Overrides marker_format for the specified types. Example: { "SSN": "[REDACTED-SSN]", "EMAIL": "[EMAIL REMOVED]" }. |
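The following Python sketch shows one way to read the precedence and format rules in this table: custom_markers wins over marker_format, and preserve_length changes the "asterisk" output (the "partial" mask in this sketch keeps the original length anyway). The function name render_marker and the choice to leave words of two characters or fewer unmasked are assumptions for illustration.

```python
from typing import Optional


def render_marker(
    pii_type: str,
    value: str,
    marker_format: str = "label",
    preserve_length: bool = False,
    custom_markers: Optional[dict] = None,
) -> str:
    """Hypothetical replacement-string logic for one detected PII span."""
    # A per-type custom marker overrides marker_format entirely.
    if custom_markers and pii_type in custom_markers:
        return custom_markers[pii_type]
    if marker_format == "label":
        return f"[{pii_type}]"
    if marker_format == "asterisk":
        # Fixed-width mask unless the original length must be kept.
        return "*" * len(value) if preserve_length else "***"
    if marker_format == "partial":
        # Keep the first and last character of each word and mask the middle;
        # words of two characters or fewer are left as-is (an assumption).
        masked = [
            w if len(w) <= 2 else w[0] + "*" * (len(w) - 2) + w[-1]
            for w in value.split(" ")
        ]
        return " ".join(masked)
    raise ValueError(f"unknown marker_format: {marker_format!r}")
```

With these rules, render_marker("SSN", "123-45-6789", "asterisk", preserve_length=True) returns eleven asterisks, the same shape as use case 4 below.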
Built-in Detection Categories
The following PII categories are always active regardless of configuration:
| Category | Examples | PII Label |
|---|---|---|
| Email addresses | john@example.com | EMAIL |
| Phone numbers | +1-555-123-4567, (555) 123-4567 | PHONE |
| Social Security numbers | 123-45-6789 | SSN |
| IP addresses | 192.168.1.1, 2001:db8::1 | IP_ADDRESS |
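As an illustration of how the always-active categories can be matched, here is a Python sketch with deliberately simplified regular expressions. The patterns and the detect_builtin name are assumptions, not the expressions Keeptrusts ships, and IPv6 is omitted for brevity. Each finding records the label, character offset, and matched value, as described for the built-in matching step under How It Works.

```python
import re

# Simplified, hypothetical patterns for the always-active categories.
BUILTIN_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"(?:\+1[-. ]?)?(?:\(\d{3}\)\s?|\b\d{3}[-. ])\d{3}[-. ]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only
}


def detect_builtin(text: str) -> list:
    # Each finding is (label, character offset, matched value), ordered by offset.
    findings = [
        (label, m.start(), m.group())
        for label, pattern in BUILTIN_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f[1])
```

Simplified patterns like these are exactly where false positives come from (the IPv4 pattern, for instance, accepts 999.999.999.999), which is why testing against representative traffic matters.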
Healthcare Mode Categories
Enabled when healthcare_mode: true:
| Category | Examples | PII Label |
|---|---|---|
| Medical Record Numbers | MRN-12345678, MRN 87654321 | MRN |
| Insurance IDs | INS-ABC-123456 | INSURANCE_ID |
| National Provider Identifiers | NPI: 1234567890 (10-digit) | NPI |
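A minimal Python sketch of the healthcare categories, gated on healthcare_mode, could look like the following. The regexes are hypothetical and modeled only on the example values in the table; real MRN and insurance ID formats vary by institution and payer.

```python
import re

# Hypothetical patterns modeled on the table's examples only.
HEALTHCARE_PATTERNS = {
    "MRN": re.compile(r"\bMRN[-\s]?\d{8}\b", re.IGNORECASE),
    "INSURANCE_ID": re.compile(r"\bINS-[A-Z]{3}-\d{6}\b", re.IGNORECASE),
    "NPI": re.compile(r"\bNPI:?\s*\d{10}\b", re.IGNORECASE),
}


def detect_healthcare(text: str, healthcare_mode: bool) -> list:
    # These categories are only evaluated when healthcare_mode is true.
    if not healthcare_mode:
        return []
    findings = [
        (label, m.start(), m.group())
        for label, pattern in HEALTHCARE_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f[1])
```

Matching the "NPI:" prefix together with the digits means the whole span is replaced, which is consistent with the healthcare use case below.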
PCI Mode Categories
Enabled when pci_mode: true:
| Category | Examples | PII Label |
|---|---|---|
| Credit card numbers | 4111-1111-1111-1111, 5500 0000 0000 0004 | CREDIT_CARD |
| CVV/CVC codes | 123, 4567 (3–4 digits near card context) | CVV |
| Cardholder names | Name patterns adjacent to card number context | CARDHOLDER |
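How It Works notes that card numbers are validated with the Luhn algorithm to reduce false positives. The Python sketch below shows that idea: a hypothetical candidate regex finds 13 to 19 digit runs (optionally separated by spaces or hyphens), and only Luhn-valid candidates are reported. CVV and cardholder name detection depend on surrounding card context and are left out.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally space/hyphen separated.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    total = 0
    # Walk right to left, doubling every second digit and subtracting 9 if > 9.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text: str) -> list:
    findings = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        # Discard digit runs that fail the checksum (likely not card numbers).
        if luhn_valid(digits):
            findings.append(("CREDIT_CARD", m.start(), m.group()))
    return findings
```

Both example numbers in the table (4111-1111-1111-1111 and 5500 0000 0000 0004) pass the checksum, while changing any single digit makes it fail.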
Use Cases
1. Default PII redaction (SSN, phone, email, credit card)
Standard PII protection with label-based redaction markers:
```yaml
policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}

pack:
  name: pii-detector-example-2
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
```
Input: My SSN is 123-45-6789 and email is john@example.com
Output: My SSN is [SSN] and email is [EMAIL]
2. Healthcare mode with MRN/NPI detection
Enable healthcare identifiers for medical applications:
```yaml
policy:
  pii-detector:
    action: redact
    healthcare_mode: true   # required: MRN/NPI detection is off by default
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers:
        MRN: "[MEDICAL-RECORD-REDACTED]"
        NPI: "[PROVIDER-ID-REDACTED]"

pack:
  name: pii-detector-example-3
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
```
Input: Patient MRN-12345678 was seen by NPI: 1234567890
Output: Patient [MEDICAL-RECORD-REDACTED] was seen by [PROVIDER-ID-REDACTED]
3. PCI-DSS strict mode for payment processing
Block requests entirely if credit card data (or any other PII) is detected, with no redaction and no forwarding:
```yaml
policy:
  pii-detector:
    action: block
    pci_mode: true   # the default, stated explicitly for clarity
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}

pack:
  name: pii-detector-example-4
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
```
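Any request containing a credit card number, CVV, or cardholder name pattern is rejected with an error before reaching the AI provider. Because action applies to every detected category, requests containing always-active categories such as email addresses or SSNs are blocked as well.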
4. Asterisk redaction format for user-facing applications
Use asterisk masking when redacted output is shown to end users:
```yaml
policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: asterisk
      include_metadata: true
      preserve_length: true
      custom_markers: {}

pack:
  name: pii-detector-example-5
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
```
Input: My SSN is 123-45-6789
Output: My SSN is *********** (length preserved)
5. Custom markers for regulatory audit trails
Define specific replacement strings per PII type to satisfy audit and compliance requirements:
```yaml
policy:
  pii-detector:
    action: redact
    detect_patterns:
      - '(?P<employee_id>EMP-\d{6})'
      - '(?P<account_number>ACCT-\d{8,12})'
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers:
        SSN: "[REDACTED-SSN-FOIA]"
        EMAIL: "[REDACTED-EMAIL-FOIA]"
        CREDIT_CARD: "[REDACTED-PAN-PCI]"
        MRN: "[REDACTED-MRN-HIPAA]"
        employee_id: "[REDACTED-EMP-ID]"
        account_number: "[REDACTED-ACCT]"

pack:
  name: pii-detector-example-6
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
```
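This use case relies on named capture groups to label custom matches. The Python sketch below shows that mechanism under stated assumptions: each pattern is compiled case-insensitively, the name of the group that matched becomes the label, and custom_markers supplies the replacement, falling back to a "[label]" marker. The function names and the "CUSTOM" fallback label are hypothetical, and applying patterns one after another (instead of collecting all spans first) is a simplification.

```python
import re

DETECT_PATTERNS = [r"(?P<employee_id>EMP-\d{6})", r"(?P<account_number>ACCT-\d{8,12})"]
CUSTOM_MARKERS = {"employee_id": "[REDACTED-EMP-ID]", "account_number": "[REDACTED-ACCT]"}


def match_label(match: re.Match) -> str:
    # The first named group that took part in the match names the PII type.
    for name, value in match.groupdict().items():
        if value is not None:
            return name
    return "CUSTOM"  # hypothetical fallback for patterns without named groups


def redact_custom(text: str, patterns: list, markers: dict) -> str:
    for raw in patterns:
        pattern = re.compile(raw, re.IGNORECASE)
        # Look up a custom marker for the label, else use "[label]".
        text = pattern.sub(
            lambda m: markers.get(match_label(m), f"[{match_label(m)}]"), text
        )
    return text
```

Because matching is case-insensitive, a lowercase "emp-004512" is caught by the EMP-\d{6} pattern too.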
6. Combined with hipaa-phi-detector for full HIPAA compliance
Layer PII detection with the dedicated HIPAA PHI detector for comprehensive protected health information coverage:
```yaml
policies:
  chain:
    - prompt-injection
    - pii-detector
    - hipaa-phi-detector

policy:
  pii-detector:
    action: "redact"
    healthcare_mode: true
    pci_mode: false
    detect_patterns: []
    redaction:
      marker_format: "label"
      include_metadata: true
      preserve_length: false
      custom_markers:
        MRN: "[PHI-MRN]"
        NPI: "[PHI-NPI]"
        INSURANCE_ID: "[PHI-INSURANCE]"
  hipaa-phi-detector:
    action: "redact"
    redaction:
      marker_format: "label"
```
The pii-detector handles standard PII and healthcare identifiers while hipaa-phi-detector covers the full set of 18 HIPAA identifier categories including dates of service, geographic subdivisions, and biometric data.
How It Works
The PII detector processes each incoming request through the following steps:
1. Input extraction: The user message content is extracted from the request body. For multi-turn conversations, only the latest user message is scanned (previous turns are assumed to have been scanned on their original request).
2. Built-in pattern matching: The input is tested against the always-active built-in patterns for SSN, email, phone numbers, and IP addresses. Each match records the PII type, character offset, and matched value.
3. Healthcare pattern matching: If healthcare_mode is enabled, the input is additionally tested against MRN, insurance ID, and NPI patterns.
4. PCI pattern matching: If pci_mode is enabled, the input is tested against credit card number, CVV, and cardholder name patterns. Card numbers are validated using the Luhn algorithm to reduce false positives.
5. Custom pattern matching: Each regex in detect_patterns is evaluated against the input. Named capture groups determine the PII label used in redaction markers and metadata.
6. Action execution: Based on the action setting:
   - "redact": Each detected PII span is replaced according to the redaction configuration. If custom_markers defines a replacement for the PII type, that string is used; otherwise, marker_format determines the replacement style. The sanitized request is forwarded to the AI provider.
   - "block": The request is rejected immediately. No data is forwarded.
7. Event emission: A decision event is sent to the Keeptrusts API containing the PII types detected, their character offsets (if include_metadata is enabled), the action taken, and the policy version. This event powers the audit trail in the console.
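The last two steps can be sketched in Python as follows. The Finding type, the apply_action name, and every event field name are hypothetical (this is not the Keeptrusts event schema); the sketch shows blocking only when something was detected, replacing spans from the end of the text so earlier offsets stay valid, and attaching offsets only when include_metadata is set.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    pii_type: str
    start: int
    value: str


def apply_action(
    text: str,
    findings: list,
    action: str,
    marker_for: Callable[[Finding], str],
    policy_version: str,
    include_metadata: bool = True,
) -> tuple[Optional[str], dict]:
    """Hypothetical sketch: execute the action, then build a decision event."""
    # Illustrative event fields only.
    event = {
        "policy": "pii-detector",
        "policy_version": policy_version,
        "action": action,
        "pii_types": sorted({f.pii_type for f in findings}),
    }
    if include_metadata:
        event["offsets"] = [(f.pii_type, f.start, f.start + len(f.value)) for f in findings]
    if action == "block" and findings:
        return None, event  # nothing is forwarded
    # Replace from the end of the text so earlier offsets stay valid.
    for f in sorted(findings, key=lambda x: x.start, reverse=True):
        text = text[: f.start] + marker_for(f) + text[f.start + len(f.value):]
    return text, event
```

Passing a marker_for that returns f"[{f.pii_type}]" reproduces the label-format output from use case 1.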
Combining With Other Policies
PII detection should run after prompt injection detection (to avoid processing adversarial input) and before content filtering or disclaimer policies:
```yaml
policies:
  chain:
    - prompt-injection   # Block attacks first
    - pii-detector       # Redact PII from clean input
    - content-filter     # Apply content rules on sanitized text
    - disclaimer         # Append compliance disclaimers
```
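The ordering rationale can be modeled with a small Python sketch: each policy receives whatever the previous one forwarded, and a blocking policy stops the chain. The run_chain function and the toy policies are hypothetical stand-ins, not the real prompt-injection, pii-detector, or content-filter logic, and the response-side disclaimer step is omitted.

```python
import re
from typing import Callable, Optional

# A policy returns the text to pass on, or None to block the request.
Policy = Callable[[str], Optional[str]]


def prompt_injection(text: str) -> Optional[str]:
    # Toy stand-in: block one well-known injection phrase.
    return None if "ignore previous instructions" in text.lower() else text


def pii_detector(text: str) -> Optional[str]:
    # Toy stand-in: redact SSNs only.
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)


def content_filter(text: str) -> Optional[str]:
    # Runs after redaction, so it only ever sees sanitized text.
    return text


CHAIN = [
    ("prompt-injection", prompt_injection),
    ("pii-detector", pii_detector),
    ("content-filter", content_filter),
]


def run_chain(text: str, chain: list[tuple[str, Policy]]) -> tuple[Optional[str], Optional[str]]:
    """Return (forwarded_text, None), or (None, name_of_blocking_policy)."""
    for name, policy in chain:
        result = policy(text)
        if result is None:
            return None, name
        text = result
    return text, None
```

Running the injection check first means adversarial input is rejected before the PII detector spends any work on it.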
Common combinations:
| Combination | Purpose |
|---|---|
| prompt-injection → pii-detector | Block injection attacks, then redact PII from legitimate requests |
| pii-detector → hipaa-phi-detector | Standard PII catch-all followed by comprehensive HIPAA PHI coverage |
| pii-detector → content-filter | Redact PII, then apply topic or keyword restrictions |
| pii-detector → disclaimer | Redact PII, then append regulatory disclaimers to the response |
| prompt-injection → pii-detector → hipaa-phi-detector → disclaimer | Full healthcare compliance pipeline |
Best Practices
- Use "redact" instead of "block" when possible. Blocking disrupts the user experience entirely, while redaction lets the request proceed with sensitive data removed. Reserve "block" for strict compliance requirements, such as PCI-DSS, where no PII should reach the model.
- Keep pci_mode enabled (the default) in any application that handles payment data. Credit card numbers, CVVs, and cardholder names should never reach an AI provider. Even if you trust the provider, PCI-DSS compliance requires that card data is not transmitted to unnecessary third parties.
- Use healthcare_mode together with hipaa-phi-detector for HIPAA. The PII detector's healthcare mode covers MRN, insurance ID, and NPI, but HIPAA defines 18 identifier categories. The dedicated hipaa-phi-detector policy covers the full set.
- Add custom patterns for internal identifiers. Employee IDs, internal account numbers, project codes, and other organization-specific identifiers are not covered by built-in patterns. Use detect_patterns with named capture groups for clear audit metadata.
- Set include_metadata: true in production. Metadata enables audit trail reconstruction, so you can see exactly what PII was detected and where. Disabling it saves little overhead but removes critical compliance evidence.
- Use custom_markers for regulatory traceability. When audit reports need to distinguish between PII types (e.g., [REDACTED-SSN-FOIA] vs [REDACTED-PAN-PCI]), custom markers provide clear categorization without requiring log analysis.
- Test with representative data before deployment. Run the policy against sample inputs that reflect your actual traffic patterns to identify false positives (e.g., phone number patterns matching non-phone numeric sequences) and tune detect_patterns accordingly.
- Keep preserve_length disabled unless required. Length preservation can leak information about the original value's format. Only enable it when downstream systems require fixed-width field validation.
For AI systems
- Canonical terms: Keeptrusts, pii-detector, action, redact, block, healthcare_mode, pci_mode, detect_patterns, redaction, marker_format, SSN, EMAIL, PHONE, MRN
- Config/command names: policy.pii-detector, action (redact/block), healthcare_mode, pci_mode, detect_patterns, redaction.marker_format (label/asterisk/partial), redaction.custom_markers
- Best next pages: HIPAA PHI Detector, DLP Filter, Healthcare Compliance
For engineers
- Prerequisites: Determine your PII categories: standard (email, phone, SSN, IP), healthcare (MRN, insurance ID, NPI), PCI (credit cards, CVV). Add custom patterns for organization-specific identifiers.
- Validation: Send requests containing known PII (test SSNs, example emails) and verify redaction markers in the forwarded request. Check event metadata for PII type and offset information.
- Key commands: kt policy lint, kt policy test, kt events tail
For leaders
- Governance: PII detection prevents personal data from reaching third-party AI providers — a core GDPR, CCPA, and data protection requirement. Redaction happens before the request leaves your network.
- Cost: Runs locally with no external calls and adds only marginal CPU overhead per request. The cost of a data breach (a global average of $4.45M in IBM's Cost of a Data Breach Report 2023) vastly exceeds the cost of this prevention layer.
- Rollout: Start with action: redact to sanitize traffic without disrupting users. Keep pci_mode enabled (the default) if any user might paste payment card data. Add healthcare_mode for health-adjacent deployments.
Next steps
- HIPAA PHI Detector — Full HIPAA Safe Harbor compliance
- DLP Filter — Secret and pattern-based data loss prevention
- Healthcare Compliance — Medical content controls
- Data Routing Policy — Route by data retention guarantees