Tutorial: Setting Up PII Redaction
This tutorial shows you how to configure the Keeptrusts gateway to automatically detect and redact personally identifiable information (PII) before it reaches your LLM provider.
Use this page when
- You are configuring the gateway to detect and redact PII (email, phone, SSN, credit card) from LLM traffic.
- You want to control redaction format and audit metadata using the current pii-detector fields.
- You need to extend built-in detection with healthcare, PCI, or custom regex patterns.
- You are verifying that redaction markers replace real data before it reaches the provider.
Primary audience
- Primary: Platform engineers and privacy teams implementing data protection at the AI gateway
- Secondary: Compliance officers verifying PII handling; developers whose apps send user data through LLMs
Prerequisites
- kt CLI installed (first-run tutorial)
- An OpenAI-compatible API key exported as OPENAI_API_KEY
- curl and jq installed
Step 1: Create the Policy Configuration
Create policy-config.yaml with a pii-detector policy set to redact mode:
pack:
  name: pii-redaction
  version: 0.1.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      base_url: https://api.openai.com
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - pii-detector
    - audit-logger
policy:
  pii-detector:
    action: redact
    healthcare_mode: false
    pci_mode: true
    detect_patterns: []
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}
  audit-logger:
    retention_days: 30
This configuration redacts built-in PII categories such as email addresses, phone numbers, SSNs, and IP addresses. Because pci_mode is enabled, it also redacts credit card data and related PCI markers.
Step 2: Validate and Start the Gateway
kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
Expected startup output:
INFO keeptrusts::gateway Loaded declarative config pii-redaction@0.1.0
INFO keeptrusts::gateway Gateway ready
Step 3: Test with PII-Containing Input
Open a new terminal and send a request that contains PII:
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Draft an email to John Smith at john.smith@example.com about his account 4111-1111-1111-1111. His phone is 555-867-5309 and SSN is 123-45-6789."
      }
    ]
  }' | jq '.choices[0].message.content'
Before (what the user sent)
Draft an email to John Smith at john.smith@example.com about his account
4111-1111-1111-1111. His phone is 555-867-5309 and SSN is 123-45-6789.
After (what the LLM provider received)
Draft an email to John Smith at [EMAIL] about his account
[CREDIT_CARD]. His phone is [PHONE] and SSN is [SSN].
The gateway redacted the detected PII spans before forwarding the request upstream. The same redaction engine also sanitizes matching output content before it reaches the caller.
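To make the before/after transformation concrete, here is a minimal Python sketch of label-style redaction. It is not the gateway's redaction engine: the regular expressions and the redact helper are hypothetical and are kept deliberately simple, so the real detectors are likely stricter and cover more formats.

```python
import re

# Hypothetical patterns for illustration only; they are NOT the gateway's detectors.
# Order matters: more specific shapes (card, SSN) run before the phone pattern.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("CREDIT_CARD", re.compile(r"\b\d{4}(?:[- ]?\d{4}){3}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace every detected span with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


message = (
    "Draft an email to John Smith at john.smith@example.com about his account "
    "4111-1111-1111-1111. His phone is 555-867-5309 and SSN is 123-45-6789."
)
print(redact(message))
```

Run against the tutorial's sample message, this sketch produces the same text as the "After" example above. The name John Smith is left in place, which matches the tutorial's output.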
Step 4: Inspect the Active Redaction Policy
Inspect the running config:
curl -s http://localhost:41002/keeptrusts/config | jq .
Look for:
- pack.name: pii-redaction
- a pii-detector policy block
- the audit-logger policy in the chain
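The three checks above can also be scripted. The sketch below assumes the /keeptrusts/config endpoint returns JSON that mirrors the layout of policy-config.yaml (pack, policies.chain, policy); that shape is an assumption, and missing_config_items is a hypothetical helper, not part of the kt CLI.

```python
def missing_config_items(cfg: dict) -> list:
    """Return a description of each expected item that is absent from cfg.

    Assumes cfg mirrors the YAML layout used in this tutorial.
    """
    problems = []
    if cfg.get("pack", {}).get("name") != "pii-redaction":
        problems.append("pack.name is not pii-redaction")
    if "pii-detector" not in cfg.get("policy", {}):
        problems.append("no pii-detector policy block")
    if "audit-logger" not in cfg.get("policies", {}).get("chain", []):
        problems.append("audit-logger missing from policies.chain")
    return problems


# Example input shaped like the Step 1 configuration.
sample = {
    "pack": {"name": "pii-redaction", "version": "0.1.0", "enabled": True},
    "policies": {"chain": ["pii-detector", "audit-logger"]},
    "policy": {"pii-detector": {"action": "redact"}},
}
print(missing_config_items(sample))
```

To use it against a live gateway, save the curl output to a file, load it with json.load, and pass the resulting dictionary to the function. An empty list means all three items were found.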
Step 5: Tune Detection Scope and Marker Style
The current pii-detector schema lets you tune behavior with explicit fields instead of legacy entities, apply_to, or sensitivity knobs.
Common tuning fields
| Field | What it changes | Typical use |
|---|---|---|
| pci_mode | Enables credit card, CVV, and cardholder detection | Payment and checkout traffic |
| healthcare_mode | Enables MRN, insurance ID, and NPI detection | Clinical and healthcare workloads |
| detect_patterns | Adds custom regex-based identifiers | Employee IDs, customer codes, case numbers |
| redaction.marker_format | Changes replacement style | label, asterisk, or partial |
Example: extend detection for healthcare and internal employee IDs.
policy:
  pii-detector:
    action: redact
    healthcare_mode: true
    pci_mode: true
    detect_patterns:
      - '(?P<employee_id>EMP-\d{6})'
    redaction:
      marker_format: partial
      include_metadata: true
      preserve_length: true
      custom_markers:
        MRN: "[MEDICAL-RECORD-REDACTED]"
        employee_id: "[EMPLOYEE-ID]"
pack:
  name: pii-redaction-setup-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
    - pii-detector
With that configuration:
- built-in healthcare identifiers are redacted inline
- PCI data is still protected
- EMP-123456 style identifiers are treated as redaction targets
- partial masking preserves more visual context when appropriate
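The following Python sketch shows one way a named capture group could tie a custom pattern to its marker. It assumes the group name (employee_id) is the key looked up in custom_markers, which is how the example configuration reads but is not confirmed on this page. The redact_custom helper and the fallback label are hypothetical.

```python
import re

# Values copied from the example configuration above.
DETECT_PATTERNS = [r"(?P<employee_id>EMP-\d{6})"]
CUSTOM_MARKERS = {"employee_id": "[EMPLOYEE-ID]"}


def redact_custom(text, patterns=DETECT_PATTERNS, markers=CUSTOM_MARKERS):
    """Replace each custom match with the marker registered for its group name."""

    def replace(match):
        # lastgroup is the name of the last matched named group, or None.
        name = match.lastgroup or "custom"
        # Assumed fallback: an upper-cased label when no custom marker exists.
        return markers.get(name, f"[{name.upper()}]")

    for raw in patterns:
        text = re.compile(raw).sub(replace, text)
    return text


print(redact_custom("Ticket opened by EMP-123456 today"))
```

Python's re module and the configuration share the (?P<name>...) syntax, which is why the pattern can be reused verbatim here. Test new patterns against sample text like this before adding them to detect_patterns, since an overly broad expression will redact benign content.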
Step 6: Switch to Block Mode for Hard-Stop Traffic
If some traffic must never be forwarded when PII is present, switch the policy to action: block.
policy:
  pii-detector:
    action: block
    redaction:
      marker_format: label
      include_metadata: true
pack:
  name: pii-redaction-setup-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
    - pii-detector
When this mode is active, the gateway rejects the request with a policy-violation error instead of sanitizing and forwarding it.
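The difference between the two actions can be sketched in a few lines of Python. This is an illustration of the behavior described above, not gateway code: the single SSN pattern, the PolicyViolation exception, and apply_action are all hypothetical names.

```python
import re


class PolicyViolation(Exception):
    """Raised when a request is rejected instead of being forwarded."""


# One illustrative detector; the gateway covers many more categories.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def apply_action(text: str, action: str) -> str:
    """Return the text to forward upstream, or raise if the request is blocked."""
    if action == "redact":
        # Sanitize and continue: the request still reaches the provider.
        return SSN.sub("[SSN]", text)
    if action == "block":
        # Hard stop: nothing is forwarded when PII is present.
        if SSN.search(text):
            raise PolicyViolation("pii-detector: SSN detected")
        return text
    raise ValueError(f"unknown action: {action}")


print(apply_action("SSN is 123-45-6789", "redact"))
```

In block mode, callers need an error-handling path for rejected requests, whereas redact mode is transparent to them. That trade-off is the main factor when choosing between the two.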
Step 7: Optionally Verify Redaction in Decision Events
If your gateway reports into a Keeptrusts control plane, tail recent decision events:
kt events tail --json --limit 5 --event-type decision
Look for decision data that confirms the request was modified by pii-detector and that redaction metadata was captured when include_metadata: true is enabled.
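If you want to filter that output programmatically, a schema-agnostic approach avoids guessing at field names. The sketch below assumes the --json flag emits one JSON object per line, which this page does not confirm. The mentions_policy helper is hypothetical and only checks whether the policy name appears anywhere in the event.

```python
import json


def mentions_policy(line: str, policy: str = "pii-detector") -> bool:
    """Return True if the line is valid JSON and mentions the policy anywhere."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        # Skip banners, blank lines, or anything else that is not JSON.
        return False
    # Re-serialize so that both keys and values are searched.
    return policy in json.dumps(event)
```

One way to use it is to pipe the kt events tail output into a short script that reads sys.stdin line by line and prints only the lines for which mentions_policy returns True.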
Step 8: Test Output-Phase Redaction
The LLM might generate PII in its response. Verify output redaction works:
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Generate a sample customer support email with realistic contact details."}
    ]
  }' | jq '.choices[0].message.content'
Any PII the model generates in its response will be replaced with redaction tokens before reaching the caller.
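Because model output varies between runs, an automated check is more reliable than reading the response by eye. The sketch below scans response text for values that still look like raw PII. The patterns and the find_leaks helper are hypothetical and intentionally loose; they are not the gateway's detectors.

```python
import re

# Loose, illustrative patterns for spotting unredacted values in a response.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_leaks(text: str) -> dict:
    """Map each category to the raw values found; an empty dict means no leaks."""
    return {
        name: pattern.findall(text)
        for name, pattern in LEAK_PATTERNS.items()
        if pattern.search(text)
    }


print(find_leaks("Contact [EMAIL] or call [PHONE] for help."))
```

An empty result for the response content suggests that redaction markers replaced the generated contact details. A non-empty result shows which category slipped through and is a starting point for tuning detection.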
For AI systems
- Canonical terms: Keeptrusts gateway, pii-detector, redaction, healthcare_mode, pci_mode, detect_patterns, redaction.marker_format, custom_markers.
- Config fields: policies.chain[], policy.pii-detector.action, policy.pii-detector.healthcare_mode, policy.pii-detector.pci_mode, policy.pii-detector.detect_patterns[], policy.pii-detector.redaction.*.
- CLI commands: kt gateway run, kt policy lint --file policy-config.yaml, kt events tail --json --limit 5 --event-type decision.
- Best next pages: Prompt Injection Defense, DLP & Data Classification, Custom Policy Chains.
For engineers
- Prerequisites: kt CLI, OPENAI_API_KEY exported, curl and jq.
- Validate: kt policy lint --file policy-config.yaml before starting the gateway.
- Test: send a request with known PII and verify the content is replaced with labels such as [EMAIL], [PHONE], or [SSN].
- Scope control: use pci_mode, healthcare_mode, and detect_patterns instead of legacy entity and sensitivity lists.
- Redaction output: choose label, asterisk, or partial with the redaction block.
For leaders
- PII redaction prevents personal data from being sent to third-party LLM providers, supporting GDPR and privacy-by-design requirements.
- Redaction (not blocking) keeps the workflow functional while removing sensitive tokens.
- Event logs record which entities were detected and redacted, which is useful for compliance audits.
- Can be combined with DLP policies for layered data protection across multiple data categories.
Next steps
- Block prompt injection attacks alongside PII redaction
- Set up cost tracking to monitor usage with PII policies enabled
- Tail events in real time to audit redaction decisions
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Credit card data not detected | pci_mode disabled | Set pci_mode: true |
| MRNs or NPIs not detected | healthcare_mode disabled | Set healthcare_mode: true |
| Internal IDs not redacted | Missing custom regex | Add a named pattern to detect_patterns |
| Marker style is too noisy or too opaque | redaction.marker_format not tuned | Switch between label, asterisk, or partial |
| Gateway returns 409 unexpectedly | Block policy catching benign input | Check which policy triggered via kt events tail |