Prevent Sensitive Data Leaks in AI Requests

Every AI request that leaves your network is a potential data leak. Customer names, medical records, financial identifiers, and proprietary code can all end up in prompts — and once they reach a provider, you lose control. Keeptrusts intercepts and sanitizes requests before they ever leave your gateway.

Use this page when

  • You need to prevent PII, PHI, or confidential data from reaching LLM providers in AI requests.
  • You are configuring layered data protection: PII detection, PHI redaction, DLP patterns, and zero-retention routing.
  • You want to understand the action options (redact, block, escalate) and choose the right one for your risk tolerance.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What you'll achieve

  • Automatic PII detection and redaction across all AI requests in real time
  • PHI-specific protection with HIPAA Safe Harbor compliance
  • DLP pattern enforcement for custom sensitive data patterns (API keys, internal IDs, proprietary terms)
  • Zero-retention routing that ensures no data is stored or used for training by providers
  • Complete audit trail of every redaction decision for compliance evidence

Layer 1: PII detection and redaction

The pii-detector policy scans every request for personally identifiable information and redacts it before the request reaches the upstream provider.

policies:
  chain:
    - pii-detector
    - audit-logger

policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: label
      include_metadata: true
    categories:
      - email
      - phone
      - ssn
      - credit_card
      - address
      - date_of_birth
      - drivers_license
      - passport
- passport

What happens at runtime:

  • Input: "Send the invoice to john.smith@acme.com at 555-0123"
  • After redaction: "Send the invoice to [EMAIL_REDACTED] at [PHONE_REDACTED]"
  • The upstream provider never sees the original values
  • The redaction event is logged with the original category and position metadata
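The label-style redaction above can be approximated in a few lines. This is an illustrative sketch only, not the gateway's actual detector: production PII detection combines many more patterns with contextual models, while this uses two simple regexes.

```python
import re

# Illustrative patterns only; the real pii-detector covers far more
# formats (SSNs, credit cards, addresses, ...) and uses contextual
# detection, not just regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a labeled marker like [EMAIL_REDACTED]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

print(redact("Send the invoice to john.smith@acme.com at 555-0123"))
# → Send the invoice to [EMAIL_REDACTED] at [PHONE_REDACTED]
```

With `marker_format: label`, the placeholder names the detected category, so downstream prompts stay readable while the original values never leave the gateway.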

Choosing an action

| Action | Behavior | Use when |
| --- | --- | --- |
| redact | Replace detected PII with labeled placeholders | Default — balances safety with usability |
| block | Reject the entire request with a 409 response | Strict environments where any PII is unacceptable |
| escalate | Forward the request but flag it for human review | Monitoring phase before enforcing hard blocks |
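The semantics of the three actions can be sketched as a simple dispatch. Function and field names here are assumptions made for illustration, not the gateway's internal API.

```python
# Illustrative sketch of how the three actions differ; findings is a
# list of (category_label, matched_value) pairs from the detector.
def apply_action(action: str, text: str, findings: list) -> dict:
    if not findings:
        return {"forward": True, "body": text}
    if action == "block":
        # Reject the whole request; nothing reaches the provider.
        return {"forward": False, "status": 409, "error": "pii_detected"}
    if action == "escalate":
        # Forward unchanged, but flag the request for human review.
        return {"forward": True, "body": text, "flagged": True}
    # Default "redact": replace each finding with a labeled marker.
    for label, value in findings:
        text = text.replace(value, f"[{label}_REDACTED]")
    return {"forward": True, "body": text}

result = apply_action("redact", "Email a@b.com", [("EMAIL", "a@b.com")])
print(result["body"])
# → Email [EMAIL_REDACTED]
```

Starting with `escalate` and moving to `redact` or `block` once you have reviewed a few weeks of flagged traffic is a common rollout path.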

Layer 2: PHI detection for healthcare

The hipaa-phi-detector policy extends PII detection with the 18 HIPAA Safe Harbor identifiers.

policies:
  chain:
    - hipaa-phi-detector
    - pii-detector
    - healthcare-compliance
    - audit-logger

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
  pii-detector:
    action: redact
    healthcare_mode: true
  healthcare-compliance: {}
  audit-logger:
    immutable: true
    retention_days: 2555
    hipaa_audit_controls: true

This configuration catches all 18 HIPAA identifier categories including names, dates, geographic data, medical record numbers, and biometric identifiers.


Layer 3: DLP filters for custom patterns

The dlp-filter policy lets you define custom patterns for data that standard PII detectors won't catch.

policies:
  chain:
    - dlp-filter
    - pii-detector
    - audit-logger

policy:
  dlp-filter:
    patterns:
      - name: internal_project_code
        regex: 'PROJECT-[A-Z]{3}-\d{4}'
        action: redact
      - name: api_key_leak
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})"
        action: block
      - name: internal_ip
        regex: '10\.\d{1,3}\.\d{1,3}\.\d{1,3}'
        action: redact

Common DLP patterns:

  • API keys and secrets (OpenAI sk-, AWS AKIA, GitHub ghp_)
  • Internal hostnames and IP ranges
  • Project codenames and internal identifiers
  • Customer account numbers
  • Proprietary algorithm names or trade secrets
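Before deploying custom patterns, it is worth sanity-checking each regex against a sample string offline. A minimal sketch, reusing the three regexes from the config above (the test strings are made-up samples, not real credentials):

```python
import re

# The same regexes as in the dlp-filter config above.
patterns = {
    "internal_project_code": r"PROJECT-[A-Z]{3}-\d{4}",
    "api_key_leak": r"(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})",
    "internal_ip": r"10\.\d{1,3}\.\d{1,3}\.\d{1,3}",
}

# Fabricated samples that each pattern should catch.
samples = {
    "internal_project_code": "see PROJECT-ABC-1234 for details",
    "api_key_leak": "key=AKIA" + "A" * 16,
    "internal_ip": "host at 10.0.12.34",
}

for name, regex in patterns.items():
    assert re.search(regex, samples[name]), f"{name} failed to match"
print("all DLP patterns match their samples")
```

A false negative here means the pattern would silently miss leaks in production, so catching it in a unit test is much cheaper.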

Layer 4: Zero-retention routing

Even with redaction, you may want to ensure providers cannot store or train on any request data. The data-routing-policy enforces this at the routing layer.

policies:
  chain:
    - data-routing-policy
    - pii-detector
    - audit-logger

policy:
  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    on_no_compliant_provider: block
    log_provider_selection: true

providers:
  targets:
    - id: azure-openai-zdr
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: openai-standard
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY

With require_zero_data_retention: true, only azure-openai-zdr will receive traffic. The standard OpenAI endpoint is automatically excluded.
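The selection logic amounts to filtering targets by their retention attributes. A hedged sketch of that behavior; the field names (`zero_data_retention`, `no_training`) are assumptions for illustration, not the gateway's actual schema:

```python
# Fabricated target metadata mirroring the two providers configured above.
targets = [
    {"id": "azure-openai-zdr", "zero_data_retention": True, "no_training": True},
    {"id": "openai-standard", "zero_data_retention": False, "no_training": False},
]

def eligible(targets, require_zdr=True, require_no_training=True):
    """Return only targets satisfying the retention requirements."""
    out = [
        t for t in targets
        if (not require_zdr or t["zero_data_retention"])
        and (not require_no_training or t["no_training"])
    ]
    if not out:
        # Mirrors on_no_compliant_provider: block
        raise RuntimeError("no compliant provider available")
    return out

print([t["id"] for t in eligible(targets)])
# → ['azure-openai-zdr']
```

If no target satisfies the requirements, the request fails closed rather than silently falling back to a non-compliant provider.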


Layer 5: Content extraction controls

The content-extractor policy prevents sensitive documents from being included wholesale in prompts.

Use it when applications attach PDFs, spreadsheets, or other files to AI requests:

policies:
  chain:
    - content-extractor
    - pii-detector
    - audit-logger

policy:
  content-extractor:
    max_document_size_bytes: 1048576
    allowed_mime_types:
      - text/plain
      - application/pdf
    strip_metadata: true
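The admission check this config implies is a simple gate on MIME type and size. A minimal sketch under assumed field names (`mime_type`, `size_bytes` are illustrative, not the gateway's schema):

```python
# Values mirror the content-extractor config above.
MAX_BYTES = 1_048_576            # max_document_size_bytes (1 MiB)
ALLOWED = {"text/plain", "application/pdf"}  # allowed_mime_types

def admit(attachment: dict) -> bool:
    """Admit an attachment only if its type is allowed and it fits the size cap."""
    return (
        attachment["mime_type"] in ALLOWED
        and attachment["size_bytes"] <= MAX_BYTES
    )

print(admit({"mime_type": "application/pdf", "size_bytes": 400_000}))  # → True
print(admit({"mime_type": "application/zip", "size_bytes": 100}))      # → False
```

Oversized or disallowed attachments never reach the extraction step, so they cannot be flattened into the prompt in the first place.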

Quick wins

  1. Deploy pii-detector with action: redact — immediate protection with zero application changes
  2. Add a dlp-filter for API key patterns — catch the most dangerous leaks first
  3. Enable data-routing-policy on your most sensitive workloads — guarantee zero retention
  4. Review redaction events in the console Events page — understand what's being caught

Verifying protection is working

After deploying data leak prevention policies:

  1. Send a test request with known PII through the gateway
  2. Check the Events page — confirm the PII was detected and redacted
  3. Inspect the upstream request — verify the provider received only redacted content
  4. Export evidence — use Export Evidence to generate a compliance report

# Test PII redaction
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Contact john.doe@example.com at 555-0123 about SSN 123-45-6789"}
    ]
  }'

The gateway should redact all three PII elements before forwarding to the upstream provider.


For AI systems

  • Canonical terms: pii-detector, hipaa-phi-detector, dlp-filter, data-routing-policy, zero-retention, redaction.
  • Config keys: policy.pii-detector.action, policy.pii-detector.categories, policy.hipaa-phi-detector.mode, policy.data-routing-policy.require_zero_data_retention.
  • Redaction marker format: [EMAIL_REDACTED], [PHONE_REDACTED], [SSN_REDACTED], etc.
  • CLI commands: kt gateway run, kt events list --filter "policy_type=pii-detector".
  • Best next pages: Secure Healthcare AI, Protect Financial Data, Zero-Trust AI.

For engineers

  • Prerequisites: gateway running with pii-detector in the policy chain.
  • Choose action: redact (default, replaces PII with labeled placeholders), block (rejects entire request), or escalate (flag for review).
  • Enable hipaa-phi-detector with mode: hipaa_18 for healthcare workloads requiring HIPAA Safe Harbor compliance.
  • Add custom DLP patterns for internal identifiers, API keys, or proprietary terms.
  • Validate: send a request containing test PII (e.g., a fake email) and confirm redaction in the event log.

For leaders

  • Data leaks via AI prompts represent a novel breach vector not covered by traditional DLP tools.
  • Redaction happens at the gateway before data leaves your network — the provider never sees sensitive values.
  • Zero-retention routing ensures provider agreements against training on your data are enforced technically, not just contractually.
  • Every redaction decision is logged, providing evidence for breach notification assessments (no breach if data never left).

Next steps