# Prevent Sensitive Data Leaks in AI Requests
Every AI request that leaves your network is a potential data leak. Customer names, medical records, financial identifiers, and proprietary code can all end up in prompts — and once they reach a provider, you lose control. Keeptrusts intercepts and sanitizes requests before they ever leave your gateway.
## Use this page when
- You need to prevent PII, PHI, or confidential data from reaching LLM providers in AI requests.
- You are configuring layered data protection: PII detection, PHI redaction, DLP patterns, and zero-retention routing.
- You want to understand the action options (redact, block, escalate) and choose the right one for your risk tolerance.
## Audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
## What you'll achieve
- Automatic PII detection and redaction across all AI requests in real time
- PHI-specific protection with HIPAA Safe Harbor compliance
- DLP pattern enforcement for custom sensitive data patterns (API keys, internal IDs, proprietary terms)
- Zero-retention routing that ensures no data is stored or used for training by providers
- Complete audit trail of every redaction decision for compliance evidence
## Layer 1: PII detection and redaction
The `pii-detector` policy scans every request for personally identifiable information and redacts it before the request reaches the upstream provider.
```yaml
policies:
  chain:
    - pii-detector
    - audit-logger
  policy:
    pii-detector:
      action: redact
      redaction:
        marker_format: label
        include_metadata: true
      categories:
        - email
        - phone
        - ssn
        - credit_card
        - address
        - date_of_birth
        - drivers_license
        - passport
```
What happens at runtime:
- Input: `"Send the invoice to john.smith@acme.com at 555-0123"`
- After redaction: `"Send the invoice to [EMAIL_REDACTED] at [PHONE_REDACTED]"`
- The upstream provider never sees the original values
- The redaction event is logged with the original category and position metadata
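The labeled-marker behavior can be approximated locally with a few regexes. This is an illustrative sketch, not the gateway's actual detector (real detection of names, addresses, and dates needs far more than regex), but it shows the `[CATEGORY_REDACTED]` marker format:

```python
import re

# Illustrative patterns only -- the real detector covers many more
# categories and formats than these simple regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected value with a [CATEGORY_REDACTED] marker."""
    for category, pattern in PATTERNS.items():
        text = pattern.sub(f"[{category}_REDACTED]", text)
    return text

print(redact("Send the invoice to john.smith@acme.com at 555-0123"))
# -> Send the invoice to [EMAIL_REDACTED] at [PHONE_REDACTED]
```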
### Choosing an action
| Action | Behavior | Use when |
|---|---|---|
| `redact` | Replace detected PII with labeled placeholders | Default — balances safety with usability |
| `block` | Reject the entire request with a 409 response | Strict environments where any PII is unacceptable |
| `escalate` | Forward the request but flag it for human review | Monitoring phase before enforcing hard blocks |
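The three actions differ only in what happens after detection. A minimal sketch of that dispatch (the 409 status for `block` follows the table above; the `escalate` flag is a hypothetical stand-in for the review queue):

```python
def apply_action(action: str, original: str, redacted: str) -> dict:
    """Illustrative dispatch for the three pii-detector actions."""
    if action == "redact":
        return {"forward": redacted}                       # placeholders go upstream
    if action == "block":
        return {"status": 409, "error": "pii_detected"}    # request rejected outright
    if action == "escalate":
        return {"forward": original, "flag_for_review": True}  # forwarded but flagged
    raise ValueError(f"unknown action: {action}")

print(apply_action("block", "SSN 123-45-6789", "[SSN_REDACTED]"))
```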
## Layer 2: PHI detection for healthcare
The `hipaa-phi-detector` policy extends PII detection with the 18 HIPAA Safe Harbor identifiers.
```yaml
policies:
  chain:
    - hipaa-phi-detector
    - pii-detector
    - healthcare-compliance
    - audit-logger
  policy:
    hipaa-phi-detector:
      mode: hipaa_18
      action: redact
      safe_harbor_method: true
    pii-detector:
      action: redact
      healthcare_mode: true
    healthcare-compliance: {}
    audit-logger:
      immutable: true
      retention_days: 2555
      hipaa_audit_controls: true
```
This configuration catches all 18 HIPAA identifier categories including names, dates, geographic data, medical record numbers, and biometric identifiers.
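For reference, the Safe Harbor method removes these 18 identifier categories. The list comes from the HIPAA Privacy Rule itself (45 CFR 164.514(b)(2)), not from a Keeptrusts API; the date helper sketches the rule that dates must be stripped to the year, and its name is illustrative:

```python
import re

# The 18 HIPAA Safe Harbor identifier categories (45 CFR 164.514(b)(2)).
HIPAA_18 = (
    "names",
    "geographic subdivisions smaller than a state",
    "dates (except year) related to an individual",
    "telephone numbers",
    "fax numbers",
    "email addresses",
    "social security numbers",
    "medical record numbers",
    "health plan beneficiary numbers",
    "account numbers",
    "certificate/license numbers",
    "vehicle identifiers and license plates",
    "device identifiers and serial numbers",
    "URLs",
    "IP addresses",
    "biometric identifiers",
    "full-face photographs and comparable images",
    "any other unique identifying number, characteristic, or code",
)
assert len(HIPAA_18) == 18

def generalize_date(iso_date: str) -> str:
    """Hypothetical helper: keep only the year, per the Safe Harbor date rule."""
    return re.sub(r"^(\d{4})-\d{2}-\d{2}$", r"\1", iso_date)

print(generalize_date("1987-06-15"))  # -> 1987
```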
## Layer 3: DLP filters for custom patterns
The `dlp-filter` policy lets you define custom patterns for data that standard PII detectors won't catch.
```yaml
policies:
  chain:
    - dlp-filter
    - pii-detector
    - audit-logger
  policy:
    dlp-filter:
      patterns:
        - name: internal_project_code
          regex: 'PROJECT-[A-Z]{3}-\d{4}'
          action: redact
        - name: api_key_leak
          regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})"
          action: block
        - name: internal_ip
          regex: '10\.\d{1,3}\.\d{1,3}\.\d{1,3}'
          action: redact
```
Common DLP patterns:
- API keys and secrets (OpenAI `sk-`, AWS `AKIA`, GitHub `ghp_`)
- Internal hostnames and IP ranges
- Project codenames and internal identifiers
- Customer account numbers
- Proprietary algorithm names or trade secrets
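A quick way to sanity-check custom patterns before deploying them is to run the same regexes locally. This sketch reuses the three patterns from the config above; the strictest-action-wins priority is an illustrative assumption, not documented gateway behavior:

```python
import re

# Same regexes as the dlp-filter config above; scan logic is illustrative.
DLP_PATTERNS = [
    ("internal_project_code", re.compile(r"PROJECT-[A-Z]{3}-\d{4}"), "redact"),
    ("api_key_leak", re.compile(r"(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})"), "block"),
    ("internal_ip", re.compile(r"10\.\d{1,3}\.\d{1,3}\.\d{1,3}"), "redact"),
]

def scan(text: str):
    """Return (action, pattern_name) for the highest-priority hit, or None.

    block outranks redact, mirroring a strictest-action-wins rule.
    """
    hits = [(action, name) for name, rx, action in DLP_PATTERNS if rx.search(text)]
    hits.sort(key=lambda h: 0 if h[0] == "block" else 1)
    return hits[0] if hits else None

print(scan("deploying PROJECT-ABC-1234 to 10.0.12.7"))  # -> ('redact', 'internal_project_code')
print(scan("key is AKIA" + "A" * 16))                   # -> ('block', 'api_key_leak')
```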
## Layer 4: Zero-retention routing
Even with redaction, you may want to ensure providers cannot store or train on any request data. The `data-routing-policy` enforces this at the routing layer.
```yaml
policies:
  chain:
    - data-routing-policy
    - pii-detector
    - audit-logger
  policy:
    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      on_no_compliant_provider: block
      log_provider_selection: true
providers:
  targets:
    - id: azure-openai-zdr
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: openai-standard
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
```
With `require_zero_data_retention: true`, only `azure-openai-zdr` will receive traffic. The standard OpenAI endpoint is automatically excluded.
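The routing decision reduces to filtering targets by their retention capabilities. In this sketch the capability flags (`zero_data_retention`, `trains_on_data`) are hypothetical metadata standing in for whatever the gateway knows about each target:

```python
# Hypothetical capability metadata for the two targets defined above --
# the real gateway derives this from provider agreements, not this dict.
TARGETS = [
    {"id": "azure-openai-zdr", "zero_data_retention": True, "trains_on_data": False},
    {"id": "openai-standard", "zero_data_retention": False, "trains_on_data": False},
]

def eligible_targets(targets, require_zdr=True, require_no_training=True):
    """Keep only targets satisfying the data-routing-policy requirements."""
    return [
        t["id"] for t in targets
        if (not require_zdr or t["zero_data_retention"])
        and (not require_no_training or not t["trains_on_data"])
    ]

routed = eligible_targets(TARGETS)
print(routed)  # -> ['azure-openai-zdr']
if not routed:
    raise RuntimeError("no compliant provider")  # on_no_compliant_provider: block
```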
## Layer 5: Content extraction controls
The `content-extractor` policy prevents sensitive documents from being included wholesale in prompts.
Use it when applications attach PDFs, spreadsheets, or other files to AI requests:
```yaml
policies:
  chain:
    - content-extractor
    - pii-detector
    - audit-logger
  policy:
    content-extractor:
      max_document_size_bytes: 1048576
      allowed_mime_types:
        - text/plain
        - application/pdf
      strip_metadata: true
```
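The extractor's admission rules amount to a size cap plus a MIME allow-list. A minimal sketch of that check, using the limits from the config above (the helper name is illustrative, not a gateway API):

```python
MAX_DOCUMENT_SIZE_BYTES = 1_048_576  # 1 MiB, matching the config above
ALLOWED_MIME_TYPES = {"text/plain", "application/pdf"}

def admit_attachment(mime_type: str, size_bytes: int) -> tuple[bool, str]:
    """Illustrative helper: decide whether an attachment may enter the prompt."""
    if mime_type not in ALLOWED_MIME_TYPES:
        return False, f"mime type {mime_type} not allowed"
    if size_bytes > MAX_DOCUMENT_SIZE_BYTES:
        return False, "document exceeds size limit"
    return True, "ok"

print(admit_attachment("application/pdf", 2_000_000))  # rejected: too large
print(admit_attachment("text/plain", 4_096))           # admitted
```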
## Quick wins
- Deploy `pii-detector` with `action: redact` — immediate protection with zero application changes
- Add a `dlp-filter` for API key patterns — catch the most dangerous leaks first
- Enable `data-routing-policy` on your most sensitive workloads — guarantee zero retention
- Review redaction events in the console Events page — understand what's being caught
## Verifying protection is working
After deploying data leak prevention policies:
1. Send a test request with known PII through the gateway
2. Check the Events page — confirm the PII was detected and redacted
3. Inspect the upstream request — verify the provider received only redacted content
4. Export evidence — use Export Evidence to generate a compliance report
```shell
# Test PII redaction
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Contact john.doe@example.com at 555-0123 about SSN 123-45-6789"}
    ]
  }'
```
The gateway should redact all three PII elements before forwarding to the upstream provider.
## For AI systems
- Canonical terms: `pii-detector`, `hipaa-phi-detector`, `dlp-filter`, `data-routing-policy`, zero-retention, redaction.
- Config keys: `policy.pii-detector.action`, `policy.pii-detector.categories`, `policy.hipaa-phi-detector.mode`, `policy.data-routing-policy.require_zero_data_retention`.
- Redaction marker format: `[EMAIL_REDACTED]`, `[PHONE_REDACTED]`, `[SSN_REDACTED]`, etc.
- CLI commands: `kt gateway run`, `kt events list --filter "policy_type=pii-detector"`.
- Best next pages: Secure Healthcare AI, Protect Financial Data, Zero-Trust AI.
## For engineers
- Prerequisites: gateway running with `pii-detector` in the policy chain.
- Choose action: `redact` (default, replaces PII with labeled placeholders), `block` (rejects entire request), or `escalate` (flag for review).
- Enable `hipaa-phi-detector` with `mode: hipaa_18` for healthcare workloads requiring HIPAA Safe Harbor compliance.
- Add custom DLP patterns for internal identifiers, API keys, or proprietary terms.
- Validate: send a request containing test PII (e.g., a fake email) and confirm redaction in the event log.
## For leaders
- Data leaks via AI prompts represent a novel breach vector not covered by traditional DLP tools.
- Redaction happens at the gateway before data leaves your network — the provider never sees sensitive values.
- Zero-retention routing ensures provider agreements against training on your data are enforced technically, not just contractually.
- Every redaction decision is logged, providing evidence for breach notification assessments (no breach if data never left).
## Next steps
- Secure Healthcare AI — HIPAA-specific PHI protection
- Protect Financial Data — financial data DLP controls
- Block Prompt Injection — prevent attacks that try to extract data
- Zero Retention Endpoints — reference list of ZDR-capable providers
- Policy Controls Catalog — full inventory of available controls