Skip to main content
Browse docs

PII Detector

The pii-detector policy detects and optionally redacts personally identifiable information (PII) in requests before they reach the AI provider. It supports standard PII categories out of the box (SSN, phone numbers, email addresses, credit cards) and can be extended with healthcare-specific identifiers, PCI-DSS patterns, and custom regex rules.

Use this page when

  • You need to detect and redact personally identifiable information before requests reach AI providers.
  • You are configuring PII detection with healthcare mode, PCI-DSS mode, or custom regex patterns.
  • You want to control redaction format (labels, asterisks, partial masking) and per-type custom markers.

Keeptrusts applies PII detection inline — the original request is never forwarded with unredacted data when redaction is enabled.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Configuration

pack:
name: "pii-protection"
version: "0.1.0"
enabled: true

policies:
chain:
- pii-detector

policy:
pii-detector:
action: "redact"
healthcare_mode: false
pci_mode: true
detect_patterns: []
redaction:
marker_format: "label"
include_metadata: true
preserve_length: false
custom_markers: {}

Fields

FieldTypeDefaultDescription
actionenum"redact"Action on PII detection. "redact" replaces detected PII with markers and forwards the sanitized request. "block" rejects the entire request.
healthcare_modebooleanfalseEnable healthcare-specific PII identifiers including Medical Record Numbers (MRN), insurance IDs, and National Provider Identifiers (NPI). For full HIPAA compliance, combine with the hipaa-phi-detector policy.
pci_modebooleantrueEnable PCI-DSS credit card, CVV, and cardholder name detection. When enabled, 16-digit card numbers (Visa, Mastercard, Amex, Discover), 3–4 digit CVVs, and cardholder name patterns adjacent to card data are detected.
detect_patternsstring[][]Custom regex patterns for additional PII categories. Each pattern is evaluated case-insensitively against the input. Use named capture groups for descriptive audit metadata (e.g., (?P<employee_id>EMP-\d{6})).
redactionobject(see below)Controls the format and metadata of redacted output.

redaction sub-object

Controls how detected PII is replaced in the forwarded request.

FieldTypeDefaultDescription
redaction.marker_formatenum"label"Replacement format for detected PII. "label" replaces with the PII type in brackets (e.g., [SSN], [EMAIL]). "asterisk" replaces with asterisks (***). "partial" masks the middle of the value while preserving the first and last characters (e.g., J***n D*e).
redaction.include_metadatabooleantrueInclude PII type and character offset metadata in the decision event sent to the Keeptrusts API. Enables audit trail reconstruction of what was redacted and where.
redaction.preserve_lengthbooleanfalseEnsure the replacement string matches the character length of the original PII value. Useful when downstream systems validate fixed-width fields. Only applies to "asterisk" and "partial" marker formats.
redaction.custom_markersobject{}Per-PII-type custom replacement strings. Keys are PII type labels (e.g., SSN, EMAIL, PHONE), values are the literal replacement text. Overrides marker_format for the specified types. Example: { "SSN": "[REDACTED-SSN]", "EMAIL": "[EMAIL REMOVED]" }.

Built-in Detection Categories

The following PII categories are always active regardless of configuration:

CategoryExamplesPII Label
Email addressesjohn@example.comEMAIL
Phone numbers+1-555-123-4567, (555) 123-4567PHONE
Social Security numbers123-45-6789SSN
IP addresses192.168.1.1, 2001:db8::1IP_ADDRESS

Healthcare Mode Categories

Enabled when healthcare_mode: true:

CategoryExamplesPII Label
Medical Record NumbersMRN-12345678, MRN 87654321MRN
Insurance IDsINS-ABC-123456INSURANCE_ID
National Provider IdentifiersNPI: 1234567890 (10-digit)NPI

PCI Mode Categories

Enabled when pci_mode: true:

CategoryExamplesPII Label
Credit card numbers4111-1111-1111-1111, 5500 0000 0000 0004CREDIT_CARD
CVV/CVC codes123, 4567 (3–4 digits near card context)CVV
Cardholder namesName patterns adjacent to card number contextCARDHOLDER

Use Cases

1. Default PII redaction (SSN, phone, email, credit card)

Standard PII protection with label-based redaction markers:

policy:
pii-detector:
action: redact
redaction:
marker_format: label
include_metadata: true
preserve_length: false
custom_markers: {}
pack:
name: pii-detector-example-2
version: 1.0.0
enabled: true
policies:
chain:
- pii-detector

Input: My SSN is 123-45-6789 and email is john@example.com Output: My SSN is [SSN] and email is [EMAIL]

2. Healthcare mode with MRN/NPI detection

Enable healthcare identifiers for medical applications:

policy:
pii-detector:
action: redact
redaction:
marker_format: label
include_metadata: true
preserve_length: false
custom_markers:
MRN: "[MEDICAL-RECORD-REDACTED]"
NPI: "[PROVIDER-ID-REDACTED]"
pack:
name: pii-detector-example-3
version: 1.0.0
enabled: true
policies:
chain:
- pii-detector

Input: Patient MRN-12345678 was seen by NPI: 1234567890 Output: Patient [MEDICAL-RECORD-REDACTED] was seen by [PROVIDER-ID-REDACTED]

3. PCI-DSS strict mode for payment processing

Block requests entirely if credit card data is detected — no redaction, no forwarding:

policy:
pii-detector:
action: block
redaction:
marker_format: label
include_metadata: true
preserve_length: false
custom_markers: {}
pack:
name: pii-detector-example-4
version: 1.0.0
enabled: true
policies:
chain:
- pii-detector

Any request containing a credit card number, CVV, or cardholder name pattern is rejected with an error before reaching the AI provider.

4. Asterisk redaction format for user-facing applications

Use asterisk masking when redacted output is shown to end users:

policy:
pii-detector:
action: redact
redaction:
marker_format: asterisk
include_metadata: true
preserve_length: true
custom_markers: {}
pack:
name: pii-detector-example-5
version: 1.0.0
enabled: true
policies:
chain:
- pii-detector

Input: My SSN is 123-45-6789 Output: My SSN is *********** (length preserved)

5. Custom markers for regulatory audit trails

Define specific replacement strings per PII type to satisfy audit and compliance requirements:

policy:
pii-detector:
action: redact
detect_patterns:
- '(?P<employee_id>EMP-\d{6})'
- '(?P<account_number>ACCT-\d{8,12})'
redaction:
marker_format: label
include_metadata: true
preserve_length: false
custom_markers:
SSN: "[REDACTED-SSN-FOIA]"
EMAIL: "[REDACTED-EMAIL-FOIA]"
CREDIT_CARD: "[REDACTED-PAN-PCI]"
MRN: "[REDACTED-MRN-HIPAA]"
employee_id: "[REDACTED-EMP-ID]"
account_number: "[REDACTED-ACCT]"
pack:
name: pii-detector-example-6
version: 1.0.0
enabled: true
policies:
chain:
- pii-detector

6. Combined with hipaa-phi-detector for full HIPAA compliance

Layer PII detection with the dedicated HIPAA PHI detector for comprehensive protected health information coverage:

policies:
chain:
- prompt-injection
- pii-detector
- hipaa-phi-detector

policy:
pii-detector:
action: "redact"
healthcare_mode: true
pci_mode: false
detect_patterns: []
redaction:
marker_format: "label"
include_metadata: true
preserve_length: false
custom_markers:
MRN: "[PHI-MRN]"
NPI: "[PHI-NPI]"
INSURANCE_ID: "[PHI-INSURANCE]"

hipaa-phi-detector:
action: "redact"
redaction:
marker_format: "label"

The pii-detector handles standard PII and healthcare identifiers while hipaa-phi-detector covers the full set of 18 HIPAA identifier categories including dates of service, geographic subdivisions, and biometric data.

How It Works

The PII detector processes each incoming request through the following steps:

  1. Input extraction — The user message content is extracted from the request body. For multi-turn conversations, only the latest user message is scanned (previous turns are assumed to have been scanned on their original request).

  2. Built-in pattern matching — The input is tested against the always-active built-in patterns for SSN, email, phone numbers, and IP addresses. Each match records the PII type, character offset, and matched value.

  3. Healthcare pattern matching — If healthcare_mode is enabled, the input is additionally tested against MRN, insurance ID, and NPI patterns.

  4. PCI pattern matching — If pci_mode is enabled, the input is tested against credit card number, CVV, and cardholder name patterns. Card numbers are validated using the Luhn algorithm to reduce false positives.

  5. Custom pattern matching — Each regex in detect_patterns is evaluated against the input. Named capture groups determine the PII label used in redaction markers and metadata.

  6. Action execution — Based on the action setting:

    • "redact": Each detected PII span is replaced according to the redaction configuration. If custom_markers defines a replacement for the PII type, that string is used. Otherwise, the marker_format determines the replacement style. The sanitized request is forwarded to the AI provider.
    • "block": The request is rejected immediately. No data is forwarded.
  7. Event emission — A decision event is sent to the Keeptrusts API containing the PII types detected, their character offsets (if include_metadata is enabled), the action taken, and the policy version. This event powers the audit trail in the console.

Combining With Other Policies

PII detection should run after prompt injection detection (to avoid processing adversarial input) and before content filtering or disclaimer policies:

policies:
chain:
- prompt-injection # Block attacks first
- pii-detector # Redact PII from clean input
- content-filter # Apply content rules on sanitized text
- disclaimer # Append compliance disclaimers

Common combinations:

CombinationPurpose
prompt-injectionpii-detectorBlock injection attacks, then redact PII from legitimate requests
pii-detectorhipaa-phi-detectorStandard PII catch-all followed by comprehensive HIPAA PHI coverage
pii-detectorcontent-filterRedact PII then apply topic or keyword restrictions
pii-detectordisclaimerRedact PII then append regulatory disclaimers to the response
prompt-injectionpii-detectorhipaa-phi-detectordisclaimerFull healthcare compliance pipeline

Best Practices

  • Use "redact" instead of "block" when possible. Blocking disrupts the user experience entirely. Redaction allows the request to proceed while protecting sensitive data. Reserve "block" for strict compliance requirements like PCI-DSS where no PII should reach the model.
  • Enable pci_mode in any application that handles payment data. Credit card numbers, CVVs, and cardholder names should never reach an AI provider. Even if you trust the provider, PCI-DSS compliance requires that card data is not transmitted to unnecessary third parties.
  • Use healthcare_mode together with hipaa-phi-detector for HIPAA. The PII detector's healthcare mode covers MRN, insurance ID, and NPI, but HIPAA defines 18 identifier categories. The dedicated hipaa-phi-detector policy covers the full set.
  • Add custom patterns for internal identifiers. Employee IDs, internal account numbers, project codes, and other organization-specific identifiers are not covered by built-in patterns. Use detect_patterns with named capture groups for clear audit metadata.
  • Set include_metadata: true in production. Metadata enables audit trail reconstruction — you can see exactly what PII was detected and where. Disabling it saves minimal overhead and removes critical compliance evidence.
  • Use custom_markers for regulatory traceability. When audit reports need to distinguish between PII types (e.g., [REDACTED-SSN-FOIA] vs [REDACTED-PAN-PCI]), custom markers provide clear categorization without requiring log analysis.
  • Test with representative data before deployment. Run the policy against sample inputs that reflect your actual traffic patterns to identify false positives (e.g., phone number patterns matching non-phone numeric sequences) and tune detect_patterns accordingly.
  • Keep preserve_length disabled unless required. Length preservation can leak information about the original value's format. Only enable it when downstream systems require fixed-width field validation.

For AI systems

  • Canonical terms: Keeptrusts, pii-detector, action, redact, block, healthcare_mode, pci_mode, detect_patterns, redaction, marker_format, SSN, EMAIL, PHONE, MRN
  • Config/command names: policy.pii-detector, action (redact/block), healthcare_mode, pci_mode, detect_patterns, redaction.marker_format (label/asterisk/partial), redaction.custom_markers
  • Best next pages: HIPAA PHI Detector, DLP Filter, Healthcare Compliance

For engineers

  • Prerequisites: Determine your PII categories: standard (email, phone, SSN, IP), healthcare (MRN, insurance ID, NPI), PCI (credit cards, CVV). Add custom patterns for organization-specific identifiers.
  • Validation: Send requests containing known PII (test SSNs, example emails) and verify redaction markers in the forwarded request. Check event metadata for PII type and offset information.
  • Key commands: kt policy lint, kt policy test, kt events tail

For leaders

  • Governance: PII detection prevents personal data from reaching third-party AI providers — a core GDPR, CCPA, and data protection requirement. Redaction happens before the request leaves your network.
  • Cost: Runs locally with no external calls. Marginal CPU overhead per request. The cost of a data breach (average $4.5M per IBM 2023) vastly exceeds prevention infrastructure.
  • Rollout: Start with action: redact to sanitize traffic without disrupting users. Enable pci_mode if any user might paste payment card data. Add healthcare_mode for health-adjacent deployments.

Next steps