
DLP Filter

The dlp-filter policy applies data loss prevention pattern matching to prevent sensitive data from being sent to AI providers. It combines built-in detection rules with user-defined regex patterns and blocked terms, and supports fuzzy matching to catch evasion attempts such as misspellings and character substitution.

Use this page when

  • You need to prevent secrets, API keys, private keys, or internal codenames from reaching AI providers.
  • You are configuring data loss prevention with custom regex patterns, blocked terms, or fuzzy matching.
  • You want tiered sensitivity levels (standard, high, restricted) for different deployment environments.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Configuration

policy:
  dlp-filter:
    detect_patterns:
      - "AKIA[0-9A-Z]{16}"
      - "-----BEGIN (RSA |EC )?PRIVATE KEY-----"
      - "ghp_[0-9a-zA-Z]{36}"
    action: redact

pack:
  name: dlp-filter-example-1
  version: 1.0.0
  enabled: true

policies:
  chain:
    - dlp-filter

Fields

Field | Type | Description | Default
detect_patterns | string[] | Custom regex patterns for DLP detection beyond built-in rules. Each entry is a regular expression evaluated against request and response content. | []
blocked_terms | string[] | Exact terms to block. Matching is case-insensitive. Use this for codenames, internal hostnames, or any literal string that must never appear in AI traffic. | []
action | enum | Action taken when a match is detected. "redact" replaces matched content with a placeholder and forwards the request. "block" rejects the entire request immediately. | "redact"
fuzzy_matching | boolean | Enable Levenshtein-distance fuzzy matching to catch misspellings, character substitution, and other evasion attempts against blocked_terms. | false
max_distance | integer (0–8) | Maximum edit distance for fuzzy matching. 1 catches single-character typos; higher values catch more variations but increase false positives. Only applies when fuzzy_matching is true. | 1
sensitivity_level | enum | Detection sensitivity tier. "standard" catches common patterns (API keys, tokens, private keys). "high" adds financial and medical patterns (credit cards, SSNs, IBANs, MRNs). "restricted" adds all known sensitive data patterns including government identifiers, classified markings, and biometric data formats. | "standard"
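Before deploying custom detect_patterns, it helps to sanity-check each regex against sample strings with any standard regex engine. The sketch below does this in Python using patterns from the examples on this page; the sample values are synthetic (the AWS key ID is the well-known documentation example), not real credentials:

```python
import re

# Candidate detect_patterns entries taken from the examples on this page.
patterns = {
    "aws_access_key": r"AKIA[0-9A-Z]{16}",
    "github_token": r"ghp_[0-9a-zA-Z]{36}",
    "private_key": r"-----BEGIN (RSA |EC )?PRIVATE KEY-----",
}

# Synthetic sample strings -- not real secrets.
samples = [
    "my key is AKIAIOSFODNN7EXAMPLE",   # AWS's documented example key ID
    "token ghp_" + "a" * 36,            # GitHub-token-shaped string
    "-----BEGIN RSA PRIVATE KEY-----",
]

for name, pattern in patterns.items():
    hits = sum(1 for s in samples if re.search(pattern, s))
    print(f"{name}: {hits} sample(s) matched")
```

Each pattern should match exactly one sample; a pattern matching zero samples (or matching samples it shouldn't) is a sign it needs tightening before it goes into the policy.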

Use Cases

Secret Leakage Prevention

Block AWS credentials, GitHub tokens, and private keys from reaching AI providers.

pack:
  name: "secret-leak-guard"
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - dlp-filter
    - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - "AKIA[0-9A-Z]{16}"
      - "[0-9a-zA-Z/+]{40}"
      - "-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----"
      - "ghp_[0-9a-zA-Z]{36}"
      - "sk-[a-zA-Z0-9]{48}"
      - "xoxb-[0-9]{10,13}-[0-9a-zA-Z]{24}"
    action: "block"
    sensitivity_level: "standard"

Corporate Data Protection

Redact internal project codenames, hostnames, and employee identifiers with fuzzy matching to catch deliberate misspellings.

pack:
  name: "corporate-data-protection"
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - dlp-filter
    - prompt-injection
    - audit-logger

policy:
  dlp-filter:
    blocked_terms:
      - "Project Titan"
      - "Project Aurora"
      - "internal.acme.corp"
      - "staging.acme.corp"
      - "jira.acme.corp"
    action: "redact"
    fuzzy_matching: true
    max_distance: 2
    sensitivity_level: "standard"
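Fuzzy matching works by computing the Levenshtein edit distance between content tokens and each blocked term. The sketch below (an illustration, not the filter's actual implementation) shows why max_distance: 2 catches common evasions of "Project Titan":

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

term = "project titan"  # blocked term, lowercased for case-insensitive comparison
for attempt in ["projct titan", "project titen", "pr0ject t1tan"]:
    d = levenshtein(term, attempt)
    print(f"{attempt!r}: distance {d} -> {'match' if d <= 2 else 'no match'}")
```

"projct titan" (one deletion) and "project titen" (one substitution) land at distance 1; "pr0ject t1tan" (two substitutions) lands at distance 2. All three fall within max_distance: 2 and would be redacted.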

Financial Data Protection

Enable high sensitivity to automatically detect credit card numbers, bank account numbers, and other financial identifiers alongside custom patterns.

pack:
  name: "financial-dlp"
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - dlp-filter
    - pii-filter
    - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - '\b[0-9]{9,18}\b'
      - '\bIBAN\s?[A-Z]{2}[0-9]{2}\b'
    blocked_terms:
      - "SWIFT"
      - "wire transfer auth code"
    action: "block"
    sensitivity_level: "high"
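As noted under How It Works, the high tier Luhn-validates credit card candidates rather than flagging every 16-digit number. A minimal Luhn checksum (illustrative only, not the engine's code) shows why this cuts false positives:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 from results over 9, and check the sum mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # well-known Visa test number -> True
print(luhn_valid("4111111111111112"))  # one digit off -> False
```

A random 16-digit string fails the checksum about 90% of the time, so Luhn validation lets the high tier flag real card numbers while ignoring order IDs and timestamps of similar length.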

Restricted Mode for Classified Environments

Use the restricted sensitivity level for maximum detection coverage, combined with fuzzy matching at a higher edit distance to catch evasion attempts.

pack:
  name: "classified-dlp"
  version: "1.0.0"
  enabled: true

policies:
  chain:
    - dlp-filter
    - entity-list-filter
    - itar-ear-filter
    - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - "TS//SCI"
      - "NOFORN"
      - "REL TO"
      - '\bSAP\s+[A-Z]{2,}\b'
    blocked_terms:
      - "COSMIC TOP SECRET"
      - "NATO RESTRICTED"
    action: "block"
    fuzzy_matching: true
    max_distance: 3
    sensitivity_level: "restricted"

How It Works

  1. Content extraction — The filter reads the full request body (prompt, system message, and any tool-call arguments) and, when applied on the response path, the model output.
  2. Built-in detection — Based on sensitivity_level, a set of pre-compiled regex patterns is activated. standard covers API keys, tokens, and private keys. high adds credit card numbers (Luhn-validated), SSNs, IBANs, and medical record numbers. restricted adds government classification markings, biometric data formats, and additional national identifiers.
  3. Custom pattern matching — Each regex in detect_patterns is evaluated against the content. Patterns are compiled once at policy load time and applied per-request.
  4. Blocked term matching — Each entry in blocked_terms is matched case-insensitively as a substring. When fuzzy_matching is enabled, the filter also computes the Levenshtein distance between each token in the content and each blocked term; any token within max_distance edits is treated as a match.
  5. Action enforcement — If any match is found, the configured action is applied. redact replaces matched spans with [REDACTED] and forwards the modified content. block rejects the request with a POLICY_VIOLATION error and returns the list of matched pattern names.
  6. Event emission — Every match (redacted or blocked) emits a structured decision event to the control-plane API with the matched pattern identifiers, sensitivity level, and action taken.
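Steps 3–5 above can be sketched in a few lines of Python. This is a simplified illustration of the matching and enforcement flow, not the actual implementation; the function and exception names are hypothetical:

```python
import re


class PolicyViolation(Exception):
    """Stand-in for the POLICY_VIOLATION error returned on block."""


def apply_dlp(content: str, detect_patterns: list[str],
              blocked_terms: list[str], action: str = "redact") -> str:
    """Match custom patterns and blocked terms, then enforce the action."""
    matched = []
    for pattern in detect_patterns:              # step 3: custom regexes
        if re.search(pattern, content):
            matched.append(pattern)
    lowered = content.lower()
    for term in blocked_terms:                   # step 4: case-insensitive terms
        if term.lower() in lowered:
            matched.append(term)
    if not matched:
        return content
    if action == "block":                        # step 5: reject outright
        raise PolicyViolation(f"POLICY_VIOLATION: {matched}")
    for pattern in detect_patterns:              # step 5: redact and forward
        content = re.sub(pattern, "[REDACTED]", content)
    for term in blocked_terms:
        content = re.sub(re.escape(term), "[REDACTED]", content,
                         flags=re.IGNORECASE)
    return content


out = apply_dlp("key AKIAIOSFODNN7EXAMPLE for Project Titan",
                [r"AKIA[0-9A-Z]{16}"], ["project titan"], action="redact")
print(out)  # key [REDACTED] for [REDACTED]
```

With action="block" the same input raises instead of forwarding, mirroring the redact/forward versus block/reject split described above. (Fuzzy matching is omitted here for brevity.)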

Combining With Other Policies

Combination | Effect
dlp-filter + pii-filter | DLP catches secrets and corporate data; PII catches personal identifiers (names, emails, phone numbers). Use both for comprehensive data protection.
dlp-filter + prompt-injection | DLP runs first to strip sensitive data, then prompt-injection detection runs on the sanitized content.
dlp-filter + itar-ear-filter | DLP catches data patterns; ITAR/EAR catches export-controlled technical data by topic classification. Layer both in defense and aerospace deployments.
dlp-filter + audit-logger | Always place audit-logger last in the chain to capture the final verdict and any redactions applied.

Best Practices

  • Start with redact, move to block — Begin with action: "redact" in monitoring mode to understand match frequency and false positive rates before switching to block.
  • Use sensitivity_level before custom patterns — The built-in tiers cover common sensitive data formats. Only add detect_patterns for organization-specific secrets and identifiers.
  • Keep max_distance low — A max_distance of 1–2 catches common typos. Values above 3 significantly increase false positives, especially for short blocked terms.
  • Avoid overly broad regex — Patterns like [0-9]+ will match any number. Be specific: anchor patterns with word boundaries (\b) or fixed prefixes.
  • Combine blocked_terms with fuzzy_matching for codenames — Users may intentionally misspell project names to bypass filters. Fuzzy matching at distance 2 catches "Projct Titan" and "Project Titen" without excessive noise.
  • Test patterns against sample traffic — Use kt policy test with representative prompts to verify detection coverage before deploying to production.
  • Layer sensitivity levels with the policy chain — Use restricted only in environments that handle classified data. For general enterprise use, high provides strong coverage without excessive blocking.
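The "avoid overly broad regex" guidance above is easy to demonstrate: an unanchored digit pattern matches every number in a message, while word boundaries plus a length range narrow matches to account-number-shaped strings (illustrative Python, with a made-up sample message):

```python
import re

text = "invoice 2024 total 199 account 123456789012"

broad = re.compile(r"[0-9]+")               # matches every run of digits
anchored = re.compile(r"\b[0-9]{9,18}\b")   # only long, account-like numbers

print(broad.findall(text))     # ['2024', '199', '123456789012']
print(anchored.findall(text))  # ['123456789012']
```

With action: "block", the broad pattern would reject every request mentioning a year or a price; the anchored pattern only fires on the 12-digit account number.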

For AI systems

  • Canonical terms: Keeptrusts, dlp-filter, detect_patterns, blocked_terms, action, fuzzy_matching, max_distance, sensitivity_level, redact, block
  • Config/command names: policy.dlp-filter, detect_patterns (regex), blocked_terms, action (redact/block), fuzzy_matching, max_distance, sensitivity_level (standard/high/restricted)
  • Best next pages: PII Detector, ITAR/EAR Filter, Embedding Detector, Prompt Injection Detection

For engineers

  • Prerequisites: Know the specific patterns, secrets, or terms you need to catch. Test regex patterns against sample data before deploying.
  • Validation: Run kt policy test with cases that include known sensitive patterns. Verify redaction markers appear in forwarded requests. Check kt events tail for DLP match metadata.
  • Key commands: kt policy lint, kt policy test, kt events tail

For leaders

  • Governance: DLP prevents accidental data exfiltration through AI tools. The sensitivity tiers map to data classification levels — restricted for classified environments, standard for general enterprise use.
  • Cost: DLP runs locally with no external calls. Fuzzy matching adds marginal CPU overhead proportional to the number of blocked_terms and max_distance setting.
  • Rollout: Start with action: redact and sensitivity_level: standard. Monitor match rates in Events. Escalate to high or restricted and action: block based on observed true-positive rates.

Next steps