PII Detector

The pii-detector policy detects and optionally redacts personally identifiable information (PII) in requests before they reach the AI provider. It supports standard PII categories out of the box (SSN, phone numbers, email addresses, credit cards) and can be extended with healthcare-specific identifiers, PCI-DSS patterns, and custom regex rules.

Use this page when

You need to detect and redact personally identifiable information before requests reach AI providers.
You are configuring PII detection with healthcare mode, PCI-DSS mode, or custom regex patterns.
You want to control redaction format (labels, asterisks, partial masking) and per-type custom markers.

Keeptrusts applies PII detection inline — the original request is never forwarded with unredacted data when redaction is enabled.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Configuration

pack:
  name: "pii-protection"
  version: "0.1.0"
  enabled: true

policies:
  chain:
    - pii-detector

policy:
  pii-detector:
    action: "redact"
    healthcare_mode: false
    pci_mode: true
    detect_patterns: []
    redaction:
      marker_format: "label"
      include_metadata: true
      preserve_length: false
      custom_markers: {}

Fields

Field	Type	Default	Description
`action`	enum	`"redact"`	Action on PII detection. `"redact"` replaces detected PII with markers and forwards the sanitized request. `"block"` rejects the entire request.
`healthcare_mode`	boolean	`false`	Enable healthcare-specific PII identifiers including Medical Record Numbers (MRN), insurance IDs, and National Provider Identifiers (NPI). For full HIPAA compliance, combine with the `hipaa-phi-detector` policy.
`pci_mode`	boolean	`true`	Enable PCI-DSS credit card, CVV, and cardholder name detection. When enabled, 16-digit card numbers (Visa, Mastercard, Amex, Discover), 3–4 digit CVVs, and cardholder name patterns adjacent to card data are detected.
`detect_patterns`	string[]	`[]`	Custom regex patterns for additional PII categories. Each pattern is evaluated case-insensitively against the input. Use named capture groups for descriptive audit metadata (e.g., `(?P<employee_id>EMP-\d{6})`).
`redaction`	object	(see below)	Controls the format and metadata of redacted output.

`redaction` sub-object

Controls how detected PII is replaced in the forwarded request.

Field	Type	Default	Description
`redaction.marker_format`	enum	`"label"`	Replacement format for detected PII. `"label"` replaces with the PII type in brackets (e.g., `[SSN]`, `[EMAIL]`). `"asterisk"` replaces with asterisks (`*`). `"partial"` masks the middle of the value while preserving the first and last characters (e.g., `Jn De`).
`redaction.include_metadata`	boolean	`true`	Include PII type and character offset metadata in the decision event sent to the Keeptrusts API. Enables audit trail reconstruction of what was redacted and where.
`redaction.preserve_length`	boolean	`false`	Ensure the replacement string matches the character length of the original PII value. Useful when downstream systems validate fixed-width fields. Only applies to `"asterisk"` and `"partial"` marker formats.
`redaction.custom_markers`	object	`{}`	Per-PII-type custom replacement strings. Keys are PII type labels (e.g., `SSN`, `EMAIL`, `PHONE`), values are the literal replacement text. Overrides `marker_format` for the specified types. Example: `{ "SSN": "[REDACTED-SSN]", "EMAIL": "[EMAIL REMOVED]" }`.

Built-in Detection Categories

The following PII categories are always active regardless of configuration:

Category	Examples	PII Label
Email addresses	`john@example.com`	`EMAIL`
Phone numbers	`+1-555-123-4567`, `(555) 123-4567`	`PHONE`
Social Security numbers	`123-45-6789`	`SSN`
IP addresses	`192.168.1.1`, `2001:db8::1`	`IP_ADDRESS`

Healthcare Mode Categories

Enabled when healthcare_mode: true:

Category	Examples	PII Label
Medical Record Numbers	`MRN-12345678`, `MRN 87654321`	`MRN`
Insurance IDs	`INS-ABC-123456`	`INSURANCE_ID`
National Provider Identifiers	`NPI: 1234567890` (10-digit)	`NPI`

PCI Mode Categories

Enabled when pci_mode: true:

Category	Examples	PII Label
Credit card numbers	`4111-1111-1111-1111`, `5500 0000 0000 0004`	`CREDIT_CARD`
CVV/CVC codes	`123`, `4567` (3–4 digits near card context)	`CVV`
Cardholder names	Name patterns adjacent to card number context	`CARDHOLDER`

Use Cases

1. Default PII redaction (SSN, phone, email, credit card)

Standard PII protection with label-based redaction markers:

policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}
pack:
  name: pii-detector-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
  - pii-detector

Input: My SSN is 123-45-6789 and email is john@example.com Output: My SSN is [SSN] and email is [EMAIL]

2. Healthcare mode with MRN/NPI detection

Enable healthcare identifiers for medical applications:

policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers:
        MRN: "[MEDICAL-RECORD-REDACTED]"
        NPI: "[PROVIDER-ID-REDACTED]"
pack:
  name: pii-detector-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
  - pii-detector

Input: Patient MRN-12345678 was seen by NPI: 1234567890 Output: Patient [MEDICAL-RECORD-REDACTED] was seen by [PROVIDER-ID-REDACTED]

3. PCI-DSS strict mode for payment processing

Block requests entirely if credit card data is detected — no redaction, no forwarding:

policy:
  pii-detector:
    action: block
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}
pack:
  name: pii-detector-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
  - pii-detector

Any request containing a credit card number, CVV, or cardholder name pattern is rejected with an error before reaching the AI provider.

4. Asterisk redaction format for user-facing applications

Use asterisk masking when redacted output is shown to end users:

policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: asterisk
      include_metadata: true
      preserve_length: true
      custom_markers: {}
pack:
  name: pii-detector-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
  - pii-detector

Input: My SSN is 123-45-6789 Output: My SSN is *********** (length preserved)

5. Custom markers for regulatory audit trails

Define specific replacement strings per PII type to satisfy audit and compliance requirements:

policy:
  pii-detector:
    action: redact
    detect_patterns:
    - '(?P<employee_id>EMP-\d{6})'
    - '(?P<account_number>ACCT-\d{8,12})'
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers:
        SSN: "[REDACTED-SSN-FOIA]"
        EMAIL: "[REDACTED-EMAIL-FOIA]"
        CREDIT_CARD: "[REDACTED-PAN-PCI]"
        MRN: "[REDACTED-MRN-HIPAA]"
        employee_id: "[REDACTED-EMP-ID]"
        account_number: "[REDACTED-ACCT]"
pack:
  name: pii-detector-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
  - pii-detector

6. Combined with hipaa-phi-detector for full HIPAA compliance

Layer PII detection with the dedicated HIPAA PHI detector for comprehensive protected health information coverage:

policies:
  chain:
    - prompt-injection
    - pii-detector
    - hipaa-phi-detector

policy:
  pii-detector:
    action: "redact"
    healthcare_mode: true
    pci_mode: false
    detect_patterns: []
    redaction:
      marker_format: "label"
      include_metadata: true
      preserve_length: false
      custom_markers:
        MRN: "[PHI-MRN]"
        NPI: "[PHI-NPI]"
        INSURANCE_ID: "[PHI-INSURANCE]"

  hipaa-phi-detector:
    action: "redact"
    redaction:
      marker_format: "label"

The pii-detector handles standard PII and healthcare identifiers while hipaa-phi-detector covers the full set of 18 HIPAA identifier categories including dates of service, geographic subdivisions, and biometric data.

How It Works

The PII detector processes each incoming request through the following steps:

Input extraction — The user message content is extracted from the request body. For multi-turn conversations, only the latest user message is scanned (previous turns are assumed to have been scanned on their original request).
Built-in pattern matching — The input is tested against the always-active built-in patterns for SSN, email, phone numbers, and IP addresses. Each match records the PII type, character offset, and matched value.
Healthcare pattern matching — If healthcare_mode is enabled, the input is additionally tested against MRN, insurance ID, and NPI patterns.
PCI pattern matching — If pci_mode is enabled, the input is tested against credit card number, CVV, and cardholder name patterns. Card numbers are validated using the Luhn algorithm to reduce false positives.
Custom pattern matching — Each regex in detect_patterns is evaluated against the input. Named capture groups determine the PII label used in redaction markers and metadata.
Action execution — Based on the action setting:
- "redact": Each detected PII span is replaced according to the redaction configuration. If custom_markers defines a replacement for the PII type, that string is used. Otherwise, the marker_format determines the replacement style. The sanitized request is forwarded to the AI provider.
- "block": The request is rejected immediately. No data is forwarded.
Event emission — A decision event is sent to the Keeptrusts API containing the PII types detected, their character offsets (if include_metadata is enabled), the action taken, and the policy version. This event powers the audit trail in the console.

Combining With Other Policies

PII detection should run after prompt injection detection (to avoid processing adversarial input) and before content filtering or disclaimer policies:

policies:
  chain:
    - prompt-injection      # Block attacks first
    - pii-detector          # Redact PII from clean input
    - content-filter        # Apply content rules on sanitized text
    - disclaimer            # Append compliance disclaimers

Common combinations:

Combination	Purpose
`prompt-injection` → `pii-detector`	Block injection attacks, then redact PII from legitimate requests
`pii-detector` → `hipaa-phi-detector`	Standard PII catch-all followed by comprehensive HIPAA PHI coverage
`pii-detector` → `content-filter`	Redact PII then apply topic or keyword restrictions
`pii-detector` → `disclaimer`	Redact PII then append regulatory disclaimers to the response
`prompt-injection` → `pii-detector` → `hipaa-phi-detector` → `disclaimer`	Full healthcare compliance pipeline

Best Practices

Use "redact" instead of "block" when possible. Blocking disrupts the user experience entirely. Redaction allows the request to proceed while protecting sensitive data. Reserve "block" for strict compliance requirements like PCI-DSS where no PII should reach the model.
Enable pci_mode in any application that handles payment data. Credit card numbers, CVVs, and cardholder names should never reach an AI provider. Even if you trust the provider, PCI-DSS compliance requires that card data is not transmitted to unnecessary third parties.
Use healthcare_mode together with hipaa-phi-detector for HIPAA. The PII detector's healthcare mode covers MRN, insurance ID, and NPI, but HIPAA defines 18 identifier categories. The dedicated hipaa-phi-detector policy covers the full set.
Add custom patterns for internal identifiers. Employee IDs, internal account numbers, project codes, and other organization-specific identifiers are not covered by built-in patterns. Use detect_patterns with named capture groups for clear audit metadata.
Set include_metadata: true in production. Metadata enables audit trail reconstruction — you can see exactly what PII was detected and where. Disabling it saves minimal overhead and removes critical compliance evidence.
Use custom_markers for regulatory traceability. When audit reports need to distinguish between PII types (e.g., [REDACTED-SSN-FOIA] vs [REDACTED-PAN-PCI]), custom markers provide clear categorization without requiring log analysis.
Test with representative data before deployment. Run the policy against sample inputs that reflect your actual traffic patterns to identify false positives (e.g., phone number patterns matching non-phone numeric sequences) and tune detect_patterns accordingly.
Keep preserve_length disabled unless required. Length preservation can leak information about the original value's format. Only enable it when downstream systems require fixed-width field validation.

For AI systems

Canonical terms: Keeptrusts, pii-detector, action, redact, block, healthcare_mode, pci_mode, detect_patterns, redaction, marker_format, SSN, EMAIL, PHONE, MRN
Config/command names: policy.pii-detector, action (redact/block), healthcare_mode, pci_mode, detect_patterns, redaction.marker_format (label/asterisk/partial), redaction.custom_markers
Best next pages: HIPAA PHI Detector, DLP Filter, Healthcare Compliance

For engineers

Prerequisites: Determine your PII categories: standard (email, phone, SSN, IP), healthcare (MRN, insurance ID, NPI), PCI (credit cards, CVV). Add custom patterns for organization-specific identifiers.
Validation: Send requests containing known PII (test SSNs, example emails) and verify redaction markers in the forwarded request. Check event metadata for PII type and offset information.
Key commands: kt policy lint, kt policy test, kt events tail

For leaders

Governance: PII detection prevents personal data from reaching third-party AI providers — a core GDPR, CCPA, and data protection requirement. Redaction happens before the request leaves your network.
Cost: Runs locally with no external calls. Marginal CPU overhead per request. The cost of a data breach (average $4.5M per IBM 2023) vastly exceeds prevention infrastructure.
Rollout: Start with action: redact to sanitize traffic without disrupting users. Enable pci_mode if any user might paste payment card data. Add healthcare_mode for health-adjacent deployments.

Next steps

HIPAA PHI Detector — Full HIPAA Safe Harbor compliance
DLP Filter — Secret and pattern-based data loss prevention
Healthcare Compliance — Medical content controls
Data Routing Policy — Route by data retention guarantees

Use this page when​

Primary audience​

Configuration​

Fields​

redaction sub-object​

Built-in Detection Categories​

Healthcare Mode Categories​

PCI Mode Categories​

Use Cases​

1. Default PII redaction (SSN, phone, email, credit card)​

2. Healthcare mode with MRN/NPI detection​

3. PCI-DSS strict mode for payment processing​

4. Asterisk redaction format for user-facing applications​

5. Custom markers for regulatory audit trails​

6. Combined with hipaa-phi-detector for full HIPAA compliance​

How It Works​

Combining With Other Policies​

Best Practices​

For AI systems​

For engineers​

For leaders​

Next steps​