Tutorial: Data Loss Prevention & Classification

This tutorial shows you how to configure the dlp-filter policy in the Keeptrusts gateway to classify sensitive traffic using built-in sensitivity tiers, custom regexes, and blocked terms, then enforce either redaction or blocking.

Use this page when

You are configuring DLP controls to catch secrets, financial data, healthcare identifiers, legal references, or internal codenames in LLM traffic.
You want to start with redaction, then harden to block mode when the match behavior is understood.
You need to tune detect_patterns, blocked_terms, fuzzy_matching, or sensitivity_level in the current schema.
You are building compliance or internal-control evidence for data handling audits.

Primary audience

Primary: Security and compliance engineers implementing data protection controls
Secondary: Legal and privacy teams requiring DLP evidence; platform engineers integrating DLP into policy chains

Prerequisites

kt CLI installed (first-run tutorial)
An OpenAI-compatible API key exported as OPENAI_API_KEY
curl and jq installed

How the Current DLP Filter Works

The current dlp-filter schema does not use legacy per-category policy blocks. Instead, it combines:

sensitivity_level for built-in tiers of detection
detect_patterns for custom regex rules
blocked_terms for literal sensitive phrases or codenames
action to choose redaction or blocking

Use this pattern when you want one DLP control to cover multiple data classes in a single chain. If some data classes need different treatment, combine dlp-filter with adjacent policies such as pii-detector, hipaa-phi-detector, or entity-list-filter.
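Before handing a config to the linter, it can help to sanity-check the dlp-filter block yourself. The sketch below is a minimal example that assumes the block has already been loaded into a Python dict. The helper name check_dlp_block and the specific checks are hypothetical; they are not part of the kt CLI. It uses Python's re module, which may differ in places from the gateway's regex engine.

```python
import re

# Hypothetical in-memory copy of a dlp-filter block. The field names come
# from this tutorial; the checks below are assumptions, not gateway behavior.
dlp_filter = {
    "sensitivity_level": "high",
    "detect_patterns": [r"(?i)\b\d{4}-CV-\d{5}\b"],
    "blocked_terms": ["internal settlement memo"],
    "action": "redact",
}

ALLOWED_ACTIONS = {"redact", "block"}  # the two actions this tutorial uses


def check_dlp_block(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if cfg.get("action") not in ALLOWED_ACTIONS:
        problems.append(f"unexpected action: {cfg.get('action')!r}")
    for pattern in cfg.get("detect_patterns", []):
        try:
            re.compile(pattern)  # catches syntax errors such as an unclosed group
        except re.error as exc:
            problems.append(f"pattern {pattern!r} does not compile: {exc}")
    for term in cfg.get("blocked_terms", []):
        if not term.strip():
            problems.append("blocked_terms contains an empty entry")
    return problems


print(check_dlp_block(dlp_filter))  # []
```

A check like this complements kt policy lint and does not replace it, since only the linter knows the real schema.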

Step 1: Create a Schema-Backed DLP Configuration

Create policy-config.yaml with a dlp-filter policy in redact mode:

policy-config.yaml
pack:
  name: dlp-governance
  version: 0.1.0
  enabled: true

providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      base_url: https://api.openai.com
      secret_key_ref:
        env: OPENAI_API_KEY

policies:
  chain:
    - dlp-filter
    - audit-logger

policy:
  dlp-filter:
    detect_patterns:
      - '(?i)\bIBAN\s?[A-Z]{2}[0-9]{2}[A-Z0-9]{10,30}\b'
      - '(?i)\bICD-10\s*code\s*[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b'
      - '(?i)\b\d{4}-CV-\d{5}\b'
    blocked_terms:
      - attorney-client privileged
      - wire transfer auth code
      - internal settlement memo
    action: redact
    fuzzy_matching: true
    max_distance: 1
    sensitivity_level: high

  audit-logger:
    retention_days: 30

This configuration gives you:

built-in high-sensitivity detection for secrets plus high-risk financial and medical patterns
custom regexes for IBANs, ICD-10 codes, and case-number formats
blocked literal phrases for legal and internal-sensitive terms
fuzzy matching that catches minor misspellings or evasive variations
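To see what the three custom regexes accept before sending live traffic, you can exercise them offline. The sketch below copies the patterns from the configuration above and runs them with Python's re module. The rule labels (iban, icd10, case_number) are hypothetical names used only for this example, and the gateway's regex engine may not behave identically to Python's.

```python
import re

# Patterns copied from policy-config.yaml; the dictionary keys are
# illustrative labels, not part of the dlp-filter schema.
DETECT_PATTERNS = {
    "iban": r"(?i)\bIBAN\s?[A-Z]{2}[0-9]{2}[A-Z0-9]{10,30}\b",
    "icd10": r"(?i)\bICD-10\s*code\s*[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b",
    "case_number": r"(?i)\b\d{4}-CV-\d{5}\b",
}


def matched_rules(text: str) -> list[str]:
    """Return the labels of every pattern that matches somewhere in text."""
    return [name for name, pattern in DETECT_PATTERNS.items()
            if re.search(pattern, text)]


print(matched_rules("IBAN DE89370400440532013000"))  # ['iban']
print(matched_rules("iban de89370400440532013000"))  # ['iban']
print(matched_rules("ICD-10 code E11.9"))            # ['icd10']
print(matched_rules("case 2024-CV-08832"))           # ['case_number']
print(matched_rules("DE89370400440532013000"))       # []
```

The last line shows a limitation: the IBAN pattern requires the literal prefix IBAN, so a bare account number is not matched by this custom rule alone.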

Step 2: Validate and Start the Gateway

kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

Expected startup output:

INFO  keeptrusts::gateway Loaded declarative config dlp-governance@0.1.0
INFO  keeptrusts::gateway Gateway ready
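If you script the following steps, a small readiness probe avoids sending requests before the listener is up. The sketch below uses only the Python standard library. The function name wait_for_port is hypothetical, and it checks only that the TCP port accepts connections, not that the policy chain has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 10.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not listening yet; retry shortly
    return False

# Example call for the listener used in this tutorial:
#   wait_for_port("localhost", 41002)
```

Treat the "Gateway ready" log line as the authoritative signal; the probe is a convenience for automation.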

Step 3: Test Redaction with Mixed Sensitive Data

Send a request containing financial, legal, and medical signals that should be redacted before the provider sees them:

curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{
      "role": "user",
      "content": "Email the wire transfer auth code with IBAN DE89370400440532013000 and attach internal settlement memo for case 2024-CV-08832. Also note ICD-10 code E11.9 in the summary."
    }]
  }' | jq '.choices[0].message.content'

With action: redact, the gateway sanitizes matched content before forwarding. The jq filter above prints the model's reply; the text the provider actually receives will look similar to:

Email the [REDACTED] with [REDACTED] and attach [REDACTED] for case [REDACTED]. Also note [REDACTED] in the summary.
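The redacted text above can be reproduced offline for the custom rules. The sketch below is an approximation, not the gateway's implementation: it applies the Step 1 regexes and blocked terms with Python's re module, assumes blocked terms are matched case-insensitively, and ignores both fuzzy matching and the built-in sensitivity tier. The function name redact is hypothetical.

```python
import re

# Custom rules copied from the Step 1 configuration.
DETECT_PATTERNS = [
    r"(?i)\bIBAN\s?[A-Z]{2}[0-9]{2}[A-Z0-9]{10,30}\b",
    r"(?i)\bICD-10\s*code\s*[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b",
    r"(?i)\b\d{4}-CV-\d{5}\b",
]
BLOCKED_TERMS = [
    "attorney-client privileged",
    "wire transfer auth code",
    "internal settlement memo",
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every regex match and every literal blocked term with placeholder."""
    for pattern in DETECT_PATTERNS:
        text = re.sub(pattern, placeholder, text)
    for term in BLOCKED_TERMS:
        # Assumption: literal terms are compared without regard to case.
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return text


prompt = (
    "Email the wire transfer auth code with IBAN DE89370400440532013000 "
    "and attach internal settlement memo for case 2024-CV-08832. "
    "Also note ICD-10 code E11.9 in the summary."
)
print(redact(prompt))
```

Running it prints the same sentence shown above, which confirms that the five redactions in this example are explained by the custom rules alone.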

Step 4: Switch to Block Mode for Regulated Traffic

When some traffic must never leave the gateway boundary, change the DLP action to block.

pack:
  name: dlp-data-classification-example-2
  version: 1.0.0
  enabled: true

policies:
  chain:
    - dlp-filter

policy:
  dlp-filter:
    detect_patterns:
      - '(?i)\bIBAN\s?[A-Z]{2}[0-9]{2}[A-Z0-9]{10,30}\b'
      - '(?i)\bICD-10\s*code\s*[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b'
    action: block
After changing the config, lint and reload or restart the gateway. Then test with a blocked request:

curl -s -w "\nHTTP %{http_code}\n" http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{
      "role": "user",
      "content": "Share the wire transfer auth code and the patient ICD-10 code E11.9."
    }]
  }'

What to look for:

the request is rejected with a policy-violation response
the sensitive content is not forwarded upstream
the decision event records the DLP match and action
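When a request is blocked, it is useful to know which rule triggered. The Step 4 fragment as printed lists two custom patterns and no blocked_terms, so the sketch below checks the test prompt against those two patterns only. It is an offline approximation using Python's re module; built-in detection in the gateway may flag more than this shows. The function name matching_patterns is hypothetical.

```python
import re

# The two custom patterns from the Step 4 fragment.
STEP4_PATTERNS = [
    r"(?i)\bIBAN\s?[A-Z]{2}[0-9]{2}[A-Z0-9]{10,30}\b",
    r"(?i)\bICD-10\s*code\s*[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b",
]


def matching_patterns(text: str) -> list[str]:
    """Return the custom patterns that match text; any hit would mean a block."""
    return [p for p in STEP4_PATTERNS if re.search(p, text)]


prompt = "Share the wire transfer auth code and the patient ICD-10 code E11.9."
print(len(matching_patterns(prompt)))                                # 1
print(len(matching_patterns("Share the wire transfer auth code.")))  # 0
```

In this sketch the ICD-10 code is what triggers the match. If "wire transfer auth code" on its own must also be blocked, keep the blocked_terms list from Step 1 when you switch the action.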

Step 5: Use Fuzzy Matching for Codenames and Internal Terms

Fuzzy matching is especially useful for internal project names, hostnames, and codenames that users may misspell intentionally.

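pack:
  name: dlp-data-classification-example-3
  version: 1.0.0
  enabled: true

policies:
  chain:
    - dlp-filter

policy:
  dlp-filter:
    detect_patterns: []
    # Canonical terms inferred from the near matches listed below.
    blocked_terms:
      - Project Titan
      - internal.acme.corp
      - merger-room
    action: redact
    fuzzy_matching: true
    max_distance: 1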

This catches near matches such as:

Projct Titan
internal.acm3.corp
mergr-room

Keep max_distance low unless you have strong test coverage, because higher values increase false positives.
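To build intuition for max_distance, the sketch below assumes that fuzzy matching is based on Levenshtein edit distance, which this page does not confirm, and that the canonical terms are Project Titan, internal.acme.corp, and merger-room. It compares whole strings case-insensitively; the gateway's actual algorithm and matching window may differ. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_hit(candidate: str, term: str, max_distance: int) -> bool:
    """True when candidate is within max_distance edits of term, ignoring case."""
    return levenshtein(candidate.lower(), term.lower()) <= max_distance


print(fuzzy_hit("Projct Titan", "Project Titan", 1))             # True
print(fuzzy_hit("internal.acm3.corp", "internal.acme.corp", 1))  # True
print(fuzzy_hit("mergr-room", "merger-room", 1))                 # True
print(fuzzy_hit("merge-rooms", "merger-room", 1))                # False
print(fuzzy_hit("merge-rooms", "merger-room", 2))                # True
```

The last two lines illustrate the trade-off: a phrase that is two edits away is ignored at max_distance: 1 but matched at 2, which is how larger values pull in unrelated text.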

Step 6: Review DLP Decision Events

If your gateway reports into a Keeptrusts control plane, inspect recent DLP activity:

kt events tail --json --limit 5 --event-type decision

What to look for:

requests modified or blocked by dlp-filter
match metadata tied to the configured sensitivity level
evidence that redact vs. block behavior matches your config
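If the event stream is noisy, a small filter can narrow it to DLP activity. The sketch below assumes that --json emits one JSON object per line, which this page does not state, and it deliberately avoids guessing field names: it keeps any event whose serialized form mentions dlp-filter. The function name events_mentioning is hypothetical.

```python
import json


def events_mentioning(lines, needle: str = "dlp-filter") -> list:
    """Parse JSON lines and keep the events whose content mentions needle."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip banners or other non-JSON output
        # Search the whole event so no particular schema is assumed.
        if needle in json.dumps(event):
            hits.append(event)
    return hits
```

To use it, pipe the output of the kt events tail command above into a script that calls events_mentioning(sys.stdin). Once you know the real event schema, replace the substring search with a check on the specific field.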

Step 7: Combine DLP with Other Policies

A common compliance chain looks like this:

policies:
  chain:
    - prompt-injection
    - dlp-filter
    - pii-detector
    - audit-logger

policy:
  prompt-injection:
    response:
      action: block

  dlp-filter:
    detect_patterns:
      - "AKIA[0-9A-Z]{16}"
      - "-----BEGIN (RSA |EC )?PRIVATE KEY-----"
    blocked_terms:
      - Project Titan
      - internal settlement memo
    action: block
    fuzzy_matching: true
    max_distance: 1
    sensitivity_level: high

  pii-detector:
    action: redact
    pci_mode: true

  audit-logger:
    retention_days: 30

Why this ordering works:

prompt-injection stops adversarial requests before they reach later controls
dlp-filter protects secrets, codenames, and regulated terms
pii-detector cleans up personal identifiers that still need redaction
audit-logger records the final governed outcome
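The two secret patterns in this chain are narrower than they may look. The sketch below runs them offline with Python's re module against AWS's documented example access key ID and a few PEM headers. It is an approximation of the custom rules only; whether the built-in high-sensitivity tier covers the gaps is not something this sketch can show. The function name is_secret is hypothetical.

```python
import re

# Custom patterns copied from the Step 7 configuration.
SECRET_PATTERNS = [
    r"AKIA[0-9A-Z]{16}",
    r"-----BEGIN (RSA |EC )?PRIVATE KEY-----",
]


def is_secret(text: str) -> bool:
    """True when any custom secret pattern matches somewhere in text."""
    return any(re.search(pattern, text) for pattern in SECRET_PATTERNS)


print(is_secret("key=AKIAIOSFODNN7EXAMPLE"))             # True
print(is_secret("-----BEGIN RSA PRIVATE KEY-----"))      # True
print(is_secret("-----BEGIN EC PRIVATE KEY-----"))       # True
print(is_secret("-----BEGIN PRIVATE KEY-----"))          # True
print(is_secret("-----BEGIN OPENSSH PRIVATE KEY-----"))  # False
print(is_secret("key=akiaiosfodnn7example"))             # False
```

The last two results show that these patterns are case-sensitive and list only the RSA, EC, and unlabelled PEM headers. If other key formats matter in your environment, extend detect_patterns accordingly.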

For AI systems

Canonical terms: Keeptrusts gateway, dlp-filter, detect_patterns, blocked_terms, action, fuzzy_matching, max_distance, sensitivity_level.
Config fields: policy.dlp-filter.detect_patterns[], policy.dlp-filter.blocked_terms[], policy.dlp-filter.action, policy.dlp-filter.fuzzy_matching, policy.dlp-filter.max_distance, policy.dlp-filter.sensitivity_level.
CLI commands: kt gateway run, kt policy lint --file policy-config.yaml, kt events tail --json --limit 5 --event-type decision.
Best next pages: PII Redaction, Custom Policy Chains, Escalation Workflows.

For engineers

Validate every change with kt policy lint --file policy-config.yaml.
Start with action: redact to learn what the filter catches before hardening to block.
Use sensitivity_level: high when you need broader built-in coverage for financial and medical content.
Add detect_patterns only for organization-specific identifiers and domain-specific data.
Use blocked_terms plus fuzzy_matching for codenames, hostnames, and privileged phrases.

For leaders

DLP controls prevent sensitive data from leaving your governance boundary through AI tools.
One policy can cover multiple regulated data classes by combining built-in sensitivity tiers with custom patterns.
Redaction mode supports gradual rollout, while block mode enforces hard stops once the false-positive rate is understood.
Decision events provide evidence for legal, privacy, and internal-control reviews.

Next steps

PII Redaction — add dedicated personal-data redaction beside DLP
Custom Policy Chains — build layered governance chains around DLP
Escalation Workflows — route blocked or suspicious events to human review

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| Sensitive phrase is not detected | Missing regex or blocked term | Add it to `detect_patterns` or `blocked_terms` |
| Too many false positives | `sensitivity_level` too high or `max_distance` too large | Lower `sensitivity_level` or reduce `max_distance` |
| Internal codename variations still slip through | Fuzzy matching disabled | Enable `fuzzy_matching: true` and start with `max_distance: 1` |
| Requests are blocked too aggressively | `action: block` enabled too early | Start with `action: redact` while tuning patterns |
