# DLP Filter
The dlp-filter policy applies data loss prevention pattern matching to prevent sensitive data from being sent to AI providers. It combines built-in detection rules with user-defined regex patterns and blocked terms, and supports fuzzy matching to catch evasion attempts such as misspellings and character substitution.
## Use this page when
- You need to prevent secrets, API keys, private keys, or internal codenames from reaching AI providers.
- You are configuring data loss prevention with custom regex patterns, blocked terms, or fuzzy matching.
- You want tiered sensitivity levels (standard, high, restricted) for different deployment environments.
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Configuration

```yaml
policy:
  dlp-filter:
    detect_patterns:
      - AKIA[0-9A-Z]{16}
      - "-----BEGIN (RSA |EC )?PRIVATE KEY-----"
      - ghp_[0-9a-zA-Z]{36}
    action: redact

pack:
  name: dlp-filter-example-1
  version: 1.0.0
  enabled: true
  policies:
    chain:
      - dlp-filter
```
## Fields

| Field | Type | Description | Default |
|---|---|---|---|
| `detect_patterns` | string[] | Custom regex patterns for DLP detection beyond built-in rules. Each entry is a regular expression evaluated against request and response content. | `[]` |
| `blocked_terms` | string[] | Exact terms to block. Matching is case-insensitive. Use this for codenames, internal hostnames, or any literal string that must never appear in AI traffic. | `[]` |
| `action` | enum | Action taken when a match is detected. `"redact"` replaces matched content with a placeholder and forwards the request. `"block"` rejects the entire request immediately. | `"redact"` |
| `fuzzy_matching` | boolean | Enable Levenshtein-distance fuzzy matching to catch misspellings, character substitution, and other evasion attempts against `blocked_terms`. | `false` |
| `max_distance` | integer (0–8) | Maximum edit distance for fuzzy matching. 1 catches single-character typos; higher values catch more variations but increase false positives. Only applies when `fuzzy_matching` is `true`. | `1` |
| `sensitivity_level` | enum | Detection sensitivity tier. `"standard"` catches common patterns (API keys, tokens, private keys). `"high"` adds financial and medical patterns (credit cards, SSNs, IBANs, MRNs). `"restricted"` adds all known sensitive data patterns including government identifiers, classified markings, and biometric data formats. | `"standard"` |
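As a mental model, the interaction between `detect_patterns` and `action` can be sketched in Python. This is a hypothetical helper (`apply_dlp` is not part of the product), illustrating only the redact-vs-block semantics described above:

```python
import re


def apply_dlp(content: str, detect_patterns: list[str], action: str = "redact") -> str:
    """Sketch of DLP enforcement: redact matched spans, or block on any match."""
    compiled = [re.compile(p) for p in detect_patterns]  # compiled once at policy load
    matched = [p.pattern for p in compiled if p.search(content)]
    if not matched:
        return content                       # no match: forward unchanged
    if action == "block":
        raise ValueError(f"POLICY_VIOLATION: matched {matched}")
    # action == "redact": replace each matched span with a placeholder and forward
    for p in compiled:
        content = p.sub("[REDACTED]", content)
    return content
```

With `action: "redact"` the (modified) request is still forwarded; with `action: "block"` the caller receives a policy violation instead of a response.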
## Use Cases

### Secret Leakage Prevention
Block AWS credentials, GitHub tokens, and private keys from reaching AI providers.
```yaml
pack:
  name: "secret-leak-guard"
  version: "1.0.0"
  enabled: true
  policies:
    chain:
      - dlp-filter
      - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - "AKIA[0-9A-Z]{16}"
      - "[0-9a-zA-Z/+]{40}"
      - "-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----"
      - "ghp_[0-9a-zA-Z]{36}"
      - "sk-[a-zA-Z0-9]{48}"
      - "xoxb-[0-9]{10,13}-[0-9a-zA-Z]{24}"
    action: "block"
    sensitivity_level: "standard"
```
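Before deploying a pack like this, you can sanity-check the regexes against known-good samples with plain Python `re`, independent of the policy engine (the token values below are synthetic or documentation examples, not real credentials):

```python
import re

# Patterns from the pack above
patterns = {
    "aws_access_key": r"AKIA[0-9A-Z]{16}",
    "github_token": r"ghp_[0-9a-zA-Z]{36}",
    "openai_style_key": r"sk-[a-zA-Z0-9]{48}",
}

samples = {
    "aws_access_key": "AKIAIOSFODNN7EXAMPLE",  # AWS's well-known documentation example
    "github_token": "ghp_" + "a" * 36,          # synthetic token of the correct length
    "openai_style_key": "sk-" + "x" * 48,       # synthetic key of the correct length
}

for name, pattern in patterns.items():
    assert re.search(pattern, samples[name]), f"{name} pattern missed its sample"
```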
### Corporate Data Protection
Redact internal project codenames, hostnames, and employee identifiers with fuzzy matching to catch deliberate misspellings.
```yaml
pack:
  name: "corporate-data-protection"
  version: "1.0.0"
  enabled: true
  policies:
    chain:
      - dlp-filter
      - prompt-injection
      - audit-logger

policy:
  dlp-filter:
    blocked_terms:
      - "Project Titan"
      - "Project Aurora"
      - "internal.acme.corp"
      - "staging.acme.corp"
      - "jira.acme.corp"
    action: "redact"
    fuzzy_matching: true
    max_distance: 2
    sensitivity_level: "standard"
```
### Financial Data Protection
Enable high sensitivity to automatically detect credit card numbers, bank account numbers, and other financial identifiers alongside custom patterns.
```yaml
pack:
  name: "financial-dlp"
  version: "1.0.0"
  enabled: true
  policies:
    chain:
      - dlp-filter
      - pii-filter
      - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - '\b[0-9]{9,18}\b'
      - '\bIBAN\s?[A-Z]{2}[0-9]{2}\b'
    blocked_terms:
      - "SWIFT"
      - "wire transfer auth code"
    action: "block"
    sensitivity_level: "high"
```
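The `high` tier's credit card detection is described later as Luhn-validated. A Luhn checksum works roughly as sketched below; this is the standard public algorithm, not the product's actual code:

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right,
    subtract 9 from any doubled digit above 9, and require sum % 10 == 0."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Luhn validation is why a `high`-tier filter can flag `4111111111111111` (a classic test card number) while ignoring most random 16-digit strings, keeping false positives down.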
### Restricted Mode for Classified Environments
Use the restricted sensitivity level for maximum detection coverage, combined with fuzzy matching at a higher edit distance for evasion prevention.
```yaml
pack:
  name: "classified-dlp"
  version: "1.0.0"
  enabled: true
  policies:
    chain:
      - dlp-filter
      - entity-list-filter
      - itar-ear-filter
      - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - "TS//SCI"
      - "NOFORN"
      - "REL TO"
      - '\bSAP\s+[A-Z]{2,}\b'
    blocked_terms:
      - "COSMIC TOP SECRET"
      - "NATO RESTRICTED"
    action: "block"
    fuzzy_matching: true
    max_distance: 3
    sensitivity_level: "restricted"
```
## How It Works

- Content extraction — The filter reads the full request body (prompt, system message, and any tool-call arguments) and, when applied on the response path, the model output.
- Built-in detection — Based on `sensitivity_level`, a set of pre-compiled regex patterns is activated. `standard` covers API keys, tokens, and private keys. `high` adds credit card numbers (Luhn-validated), SSNs, IBANs, and medical record numbers. `restricted` adds government classification markings, biometric data formats, and additional national identifiers.
- Custom pattern matching — Each regex in `detect_patterns` is evaluated against the content. Patterns are compiled once at policy load time and applied per-request.
- Blocked term matching — Each entry in `blocked_terms` is matched case-insensitively as a substring. When `fuzzy_matching` is enabled, the filter also computes the Levenshtein distance between each token in the content and each blocked term; any token within `max_distance` edits is treated as a match.
- Action enforcement — If any match is found, the configured `action` is applied. `redact` replaces matched spans with `[REDACTED]` and forwards the modified content. `block` rejects the request with a `POLICY_VIOLATION` error and returns the list of matched pattern names.
- Event emission — Every match (redacted or blocked) emits a structured decision event to the control-plane API with the matched pattern identifiers, sensitivity level, and action taken.
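The blocked-term fuzzy matching described above can be sketched with a textbook dynamic-programming Levenshtein distance. This is an illustrative simplification (single-token comparison only; multi-word terms like "Project Titan" would need n-gram comparison, and the real engine may differ):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_term_match(content: str, blocked_terms: list[str],
                     max_distance: int = 1) -> list[str]:
    """Return blocked terms within max_distance edits of some content token."""
    tokens = content.lower().split()
    return [term for term in blocked_terms
            if any(edit_distance(tok, term.lower()) <= max_distance
                   for tok in tokens)]
```

For example, `fuzzy_term_match("connect to stagging.acme.corp now", ["staging.acme.corp"], 1)` flags the misspelled hostname, since it is a single edit away from the blocked term.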
## Combining With Other Policies

| Combination | Effect |
|---|---|
| `dlp-filter` → `pii-filter` | DLP catches secrets and corporate data; PII catches personal identifiers (names, emails, phone numbers). Use both for comprehensive data protection. |
| `dlp-filter` → `prompt-injection` | DLP runs first to strip sensitive data, then prompt-injection detection runs on the sanitized content. |
| `dlp-filter` → `itar-ear-filter` | DLP catches data patterns; ITAR/EAR catches export-controlled technical data by topic classification. Layer both in defense and aerospace deployments. |
| `dlp-filter` → `audit-logger` | Always place `audit-logger` last in the chain to capture the final verdict and any redactions applied. |
## Best Practices

- Start with `redact`, move to `block` — Begin with `action: "redact"` in monitoring mode to understand match frequency and false positive rates before switching to `block`.
- Use `sensitivity_level` before custom patterns — The built-in tiers cover common sensitive data formats. Only add `detect_patterns` for organization-specific secrets and identifiers.
- Keep `max_distance` low — A `max_distance` of 1–2 catches common typos. Values above 3 significantly increase false positives, especially for short blocked terms.
- Avoid overly broad regex — Patterns like `[0-9]+` will match any number. Be specific: anchor patterns with word boundaries (`\b`) or fixed prefixes.
- Combine `blocked_terms` with `fuzzy_matching` for codenames — Users may intentionally misspell project names to bypass filters. Fuzzy matching at distance 2 catches "Projct Titan" and "Project Titen" without excessive noise.
- Test patterns against sample traffic — Use `kt policy test` with representative prompts to verify detection coverage before deploying to production.
- Layer sensitivity levels with the policy chain — Use `restricted` only in environments that handle classified data. For general enterprise use, `high` provides strong coverage without excessive blocking.
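The difference between a broad and an anchored pattern is easy to demonstrate with plain `re` (illustrative only):

```python
import re

text = "ticket 42 references account 123456789"

# Overly broad: matches every number, including the harmless ticket id
broad = re.findall(r"[0-9]+", text)

# Anchored: only 9-18 digit runs, the shape of an account number
anchored = re.findall(r"\b[0-9]{9,18}\b", text)

print(broad)     # ['42', '123456789']
print(anchored)  # ['123456789']
```

The anchored pattern would redact only the account number, while the broad one would mangle ordinary numeric text and inflate false positives.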
## For AI systems

- Canonical terms: Keeptrusts, `dlp-filter`, `detect_patterns`, `blocked_terms`, `action`, `fuzzy_matching`, `max_distance`, `sensitivity_level`, `redact`, `block`
- Config/command names: `policy.dlp-filter`, `detect_patterns` (regex), `blocked_terms`, `action` (redact/block), `fuzzy_matching`, `max_distance`, `sensitivity_level` (standard/high/restricted)
- Best next pages: PII Detector, ITAR/EAR Filter, Embedding Detector, Prompt Injection Detection
## For engineers

- Prerequisites: Know the specific patterns, secrets, or terms you need to catch. Test regex patterns against sample data before deploying.
- Validation: Run `kt policy test` with cases that include known sensitive patterns. Verify redaction markers appear in forwarded requests. Check `kt events tail` for DLP match metadata.
- Key commands: `kt policy lint`, `kt policy test`, `kt events tail`
## For leaders

- Governance: DLP prevents accidental data exfiltration through AI tools. The sensitivity tiers map to data classification levels — `restricted` for classified environments, `standard` for general enterprise use.
- Cost: DLP runs locally with no external calls. Fuzzy matching adds marginal CPU overhead proportional to the number of `blocked_terms` and the `max_distance` setting.
- Rollout: Start with `action: redact` and `sensitivity_level: standard`. Monitor match rates in Events. Escalate to `high` or `restricted` and `action: block` based on observed true-positive rates.
## Next steps
- PII Detector — Personal data detection and redaction
- ITAR/EAR Filter — Export control term blocking
- Embedding Detector — Semantic similarity detection
- Safety Filter — Unsafe content blocking