Prevent Sensitive Data Leaks in AI Requests

Every AI request that leaves your network is a potential data leak. Customer names, medical records, financial identifiers, and proprietary code can all end up in prompts — and once they reach a provider, you lose control. Keeptrusts intercepts and sanitizes requests before they ever leave your gateway.

Use this page when

  • You need to prevent PII, PHI, or confidential data from reaching LLM providers in AI requests.
  • You are configuring layered data protection: PII detection, PHI redaction, DLP patterns, and zero-retention routing.
  • You want to understand the action options (redact, block, escalate) and choose the right one for your risk tolerance.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What you'll achieve

  • Automatic PII detection and redaction across all AI requests in real time
  • PHI-specific protection with HIPAA Safe Harbor compliance
  • DLP pattern enforcement for custom sensitive data patterns (API keys, internal IDs, proprietary terms)
  • Zero-retention routing that ensures no data is stored or used for training by providers
  • Complete audit trail of every redaction decision for compliance evidence

Layer 1: PII detection and redaction

The pii-detector policy scans every request for personally identifiable information and redacts it before the request reaches the upstream provider.

policies:
  chain:
    - pii-detector
    - audit-logger

policy:
  pii-detector:
    action: redact
    redaction:
      marker_format: label
      include_metadata: true
    categories:
      - email
      - phone
      - ssn
      - credit_card
      - address
      - date_of_birth
      - drivers_license
      - passport
- passport

What happens at runtime:

  • Input: "Send the invoice to john.smith@acme.com at 555-0123"
  • After redaction: "Send the invoice to [EMAIL_REDACTED] at [PHONE_REDACTED]"
  • The upstream provider never sees the original values
  • The redaction event is logged with the original category and position metadata
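The label-style redaction above can be approximated in a few lines. This is an illustrative sketch only, not the gateway's actual detector: production PII detection combines many more patterns with contextual models, while this uses two simple regexes.

```python
import re

# Illustrative patterns only; the real pii-detector covers far more
# formats (SSNs, credit cards, addresses, ...) and uses contextual
# detection, not just regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a labeled marker like [EMAIL_REDACTED]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

print(redact("Send the invoice to john.smith@acme.com at 555-0123"))
# → Send the invoice to [EMAIL_REDACTED] at [PHONE_REDACTED]
```

With `marker_format: label`, the placeholder names the detected category, so downstream prompts stay readable while the original values never leave the gateway.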

Choosing an action

| Action | Behavior | Use when |
| --- | --- | --- |
| redact | Replace detected PII with labeled placeholders | Default — balances safety with usability |
| block | Reject the entire request with a 409 response | Strict environments where any PII is unacceptable |
| escalate | Forward the request but flag it for human review | Monitoring phase before enforcing hard blocks |
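The semantics of the three actions can be sketched as a simple dispatch. Function and field names here are assumptions made for illustration, not the gateway's internal API.

```python
# Illustrative sketch of how the three actions differ; findings is a
# list of (category_label, matched_value) pairs from the detector.
def apply_action(action: str, text: str, findings: list) -> dict:
    if not findings:
        return {"forward": True, "body": text}
    if action == "block":
        # Reject the whole request; nothing reaches the provider.
        return {"forward": False, "status": 409, "error": "pii_detected"}
    if action == "escalate":
        # Forward unchanged, but flag the request for human review.
        return {"forward": True, "body": text, "flagged": True}
    # Default "redact": replace each finding with a labeled marker.
    for label, value in findings:
        text = text.replace(value, f"[{label}_REDACTED]")
    return {"forward": True, "body": text}

result = apply_action("redact", "Email a@b.com", [("EMAIL", "a@b.com")])
print(result["body"])
# → Email [EMAIL_REDACTED]
```

Starting with `escalate` and moving to `redact` or `block` once you have reviewed a few weeks of flagged traffic is a common rollout path.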

Layer 2: PHI detection for healthcare

The hipaa-phi-detector policy extends PII detection with the 18 HIPAA Safe Harbor identifiers.

policies:
  chain:
    - hipaa-phi-detector
    - pii-detector
    - healthcare-compliance
    - audit-logger

policy:
  hipaa-phi-detector:
    mode: hipaa_18
    action: redact
    safe_harbor_method: true
  pii-detector:
    action: redact
    healthcare_mode: true
  healthcare-compliance: {}
  audit-logger:
    immutable: true
    retention_days: 2555
    hipaa_audit_controls: true

This configuration catches all 18 HIPAA identifier categories including names, dates, geographic data, medical record numbers, and biometric identifiers.


Layer 3: DLP filters for custom patterns

The dlp-filter policy lets you define custom patterns for data that standard PII detectors won't catch.

policies:
  chain:
    - dlp-filter
    - pii-detector
    - audit-logger

policy:
  dlp-filter:
    patterns:
      - name: internal_project_code
        regex: 'PROJECT-[A-Z]{3}-\d{4}'
        action: redact
      - name: api_key_leak
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})"
        action: block
      - name: internal_ip
        regex: '10\.\d{1,3}\.\d{1,3}\.\d{1,3}'
        action: redact

Common DLP patterns:

  • API keys and secrets (OpenAI sk-, AWS AKIA, GitHub ghp_)
  • Internal hostnames and IP ranges
  • Project codenames and internal identifiers
  • Customer account numbers
  • Proprietary algorithm names or trade secrets
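Before deploying custom patterns, it is worth sanity-checking each regex against a sample string offline. A minimal sketch, reusing the three regexes from the config above (the test strings are made-up samples, not real credentials):

```python
import re

# The same regexes as in the dlp-filter config above.
patterns = {
    "internal_project_code": r"PROJECT-[A-Z]{3}-\d{4}",
    "api_key_leak": r"(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})",
    "internal_ip": r"10\.\d{1,3}\.\d{1,3}\.\d{1,3}",
}

# Fabricated samples that each pattern should catch.
samples = {
    "internal_project_code": "see PROJECT-ABC-1234 for details",
    "api_key_leak": "key=AKIA" + "A" * 16,
    "internal_ip": "host at 10.0.12.34",
}

for name, regex in patterns.items():
    assert re.search(regex, samples[name]), f"{name} failed to match"
print("all DLP patterns match their samples")
```

A false negative here means the pattern would silently miss leaks in production, so catching it in a unit test is much cheaper.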

Layer 4: Zero-retention routing

Even with redaction, you may want to ensure providers cannot store or train on any request data. The data-routing-policy enforces this at the routing layer.

policies:
  chain:
    - data-routing-policy
    - pii-detector
    - audit-logger

policy:
  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    on_no_compliant_provider: block
    log_provider_selection: true

providers:
  targets:
    - id: azure-openai-zdr
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: openai-standard
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY

With require_zero_data_retention: true, only azure-openai-zdr will receive traffic. The standard OpenAI endpoint is automatically excluded.
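The selection logic amounts to filtering targets by their retention attributes. A hedged sketch of that behavior; the field names (`zero_data_retention`, `no_training`) are assumptions for illustration, not the gateway's actual schema:

```python
# Fabricated target metadata mirroring the two providers configured above.
targets = [
    {"id": "azure-openai-zdr", "zero_data_retention": True, "no_training": True},
    {"id": "openai-standard", "zero_data_retention": False, "no_training": False},
]

def eligible(targets, require_zdr=True, require_no_training=True):
    """Return only targets satisfying the retention requirements."""
    out = [
        t for t in targets
        if (not require_zdr or t["zero_data_retention"])
        and (not require_no_training or t["no_training"])
    ]
    if not out:
        # Mirrors on_no_compliant_provider: block
        raise RuntimeError("no compliant provider available")
    return out

print([t["id"] for t in eligible(targets)])
# → ['azure-openai-zdr']
```

If no target satisfies the requirements, the request fails closed rather than silently falling back to a non-compliant provider.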


Layer 5: Content extraction controls

The content-extractor policy prevents sensitive documents from being included wholesale in prompts.

Use it when applications attach PDFs, spreadsheets, or other files to AI requests:

policies:
  chain:
    - content-extractor
    - pii-detector
    - audit-logger

policy:
  content-extractor:
    max_document_size_bytes: 1048576
    allowed_mime_types:
      - text/plain
      - application/pdf
    strip_metadata: true
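The admission check this config implies is a simple gate on MIME type and size. A minimal sketch under assumed field names (`mime_type`, `size_bytes` are illustrative, not the gateway's schema):

```python
# Values mirror the content-extractor config above.
MAX_BYTES = 1_048_576            # max_document_size_bytes (1 MiB)
ALLOWED = {"text/plain", "application/pdf"}  # allowed_mime_types

def admit(attachment: dict) -> bool:
    """Admit an attachment only if its type is allowed and it fits the size cap."""
    return (
        attachment["mime_type"] in ALLOWED
        and attachment["size_bytes"] <= MAX_BYTES
    )

print(admit({"mime_type": "application/pdf", "size_bytes": 400_000}))  # → True
print(admit({"mime_type": "application/zip", "size_bytes": 100}))      # → False
```

Oversized or disallowed attachments never reach the extraction step, so they cannot be flattened into the prompt in the first place.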

Quick wins

  1. Deploy pii-detector with action: redact — immediate protection with zero application changes
  2. Add a dlp-filter for API key patterns — catch the most dangerous leaks first
  3. Enable data-routing-policy on your most sensitive workloads — guarantee zero retention
  4. Review redaction events in the console Events page — understand what's being caught

Verifying protection is working

After deploying data leak prevention policies:

  1. Send a test request with known PII through the gateway
  2. Check the Events page — confirm the PII was detected and redacted
  3. Inspect the upstream request — verify the provider received only redacted content
  4. Export evidence — use Export Evidence to generate a compliance report

# Test PII redaction
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Contact john.doe@example.com at 555-0123 about SSN 123-45-6789"}
    ]
  }'

The gateway should redact all three PII elements before forwarding to the upstream provider.


For AI systems

  • Canonical terms: pii-detector, hipaa-phi-detector, dlp-filter, data-routing-policy, zero-retention, redaction.
  • Config keys: policy.pii-detector.action, policy.pii-detector.categories, policy.hipaa-phi-detector.mode, policy.data-routing-policy.require_zero_data_retention.
  • Redaction marker format: [EMAIL_REDACTED], [PHONE_REDACTED], [SSN_REDACTED], etc.
  • CLI commands: kt gateway run, kt events list --filter "policy_type=pii-detector".
  • Best next pages: Secure Healthcare AI, Protect Financial Data, Zero-Trust AI.

For engineers

  • Prerequisites: gateway running with pii-detector in the policy chain.
  • Choose action: redact (default, replaces PII with labeled placeholders), block (rejects entire request), or escalate (flag for review).
  • Enable hipaa-phi-detector with mode: hipaa_18 for healthcare workloads requiring HIPAA Safe Harbor compliance.
  • Add custom DLP patterns for internal identifiers, API keys, or proprietary terms.
  • Validate: send a request containing test PII (e.g., a fake email) and confirm redaction in the event log.

For leaders

  • Data leaks via AI prompts represent a novel breach vector not covered by traditional DLP tools.
  • Redaction happens at the gateway before data leaves your network — the provider never sees sensitive values.
  • Zero-retention routing ensures provider agreements against training on your data are enforced technically, not just contractually.
  • Every redaction decision is logged, providing evidence for breach notification assessments (no breach if data never left).

Next steps